Version 4.0 #212

bockthom · 2021-09-01T13:38:52Z

4.0

Announcement

coronet now has a logo and a website: https://se-sic.github.io/coronet (Logo #167, PR Logo & Website #196)

Added

Add functionality to read and process commit messages in order to merge them into the commit data (see issue Read in and add commit messages #180). Three values are available for the new attribute commit.messages in ProjectConf: none, title and message (PR Add commit message merge functionality #193, 85b1d05, fdc414a, 43e1894)
Add functions cleanup.commit.message.data and cleanup.synchronicity.data to remove commit hashes that are no longer present in the commit data from the commit message data or synchronicity data (PR Add commit message merge functionality #193, 98e83b0)
Add function metrics.is.smallworld to the metrics module in order to unify checks for smallworldness (similar to scalefreeness) (PR Fixes in the metrics module #195, ce1f812)
Add function metrics.vertex.centralities to metrics module in order to simplify getting a data frame containing author names and their respective centrality values (d3cd528, e7182e7)
Add function get.data.sources.from.relations to util-networks.R which extracts the data sources of a network that were used when building it (PR Fixes in the metrics module #195, d1e4413)
Add tests for the get.data.sources.from.relations function (PR Fixes in the metrics module #195, add0c74)
Add logo directory containing several logo variants (PR Logo & Website #196, 82f9971, dc4659e, fdc5e67, 752a9b3)
Add function preprocess.issue.data, which implements common issue data filtering operations. (fcf5cee, a566cae, 5ba6feb)
Add function get.issues.uncached, which gets the issues filtered without poisoning or using the cache. (eb919fa)
Add function get.issues.unfiltered to get the unfiltered issues so that these methods follow the naming scheme known from the respective methods for commits (b9dd94c, e05f344)
Add per-author vertex attributes that count issues, issue creations, issue comments, mails, mail threads, etc. (for example, mail thread count and issue creation count) (PR Mail count & Issue count #194, issue Add further vertex attributes for author vertices #188, 9f9150a, 7260d62, 139f70b, eb4f649, 627873c, 1e1406f, 98e11ab, a566cae)
Add functionality that allows reading any data source at any point in time, even after splitting. In this case, the data read is automatically cut to the corresponding range of the RangeData object (PR Enable to read data at every point in time #201, 7f9394f). Additionally, when changing the configuration parameters concerning additional data sources, the environment of a ProjectData object is no longer reset (PR Enable to read data at every point in time #201, eed45ac)
Add new configuration parameters commits.locked, mails.locked and issues.locked to ProjectConf which, when set to TRUE, prevent the respective getters from triggering the read of the data if it is not present yet (PR Enable to read data at every point in time #201, 3821677)
Add support for classifying developers on the basis of more count-based classification metrics, including mail-count, mail-thread count, issue-count, issue-comment count, issue-commented-in count, and issue-created count (issue Update core-peripheral module #70, PR Add new threshold calculation for network-based classifications and other small fixes #209, d7b2455, 6f737c8)
Add bot filtering mechanism, which allows removing issues/mails/commits made by bots (838855f, dcce82d)
Ignore the "deleted user", as well as the author having an empty name "" (1a08140, 24c222a)
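
The commit message merge and the cleanup of stale hashes described at the top of this list can be pictured with a small sketch. It is written in Python purely for illustration and is not coronet's R implementation: the function names `merge_commit_messages` and `cleanup_message_data`, the record layout, the mode strings, and the handling of commits without a matching message are all assumptions.

```python
def merge_commit_messages(commits, messages, mode="none"):
    """Return copies of the commit records, enriched according to `mode`.

    mode "none" merges nothing, "title" merges only the title,
    "message" merges both the title and the message body.
    """
    if mode not in ("none", "title", "message"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "none":
        return [dict(commit) for commit in commits]

    # Index the message records by commit hash for constant-time lookup.
    by_hash = {record["hash"]: record for record in messages}
    merged = []
    for commit in commits:
        row = dict(commit)
        record = by_hash.get(commit["hash"], {})
        row["title"] = record.get("title")  # None if no message is known
        if mode == "message":
            row["message"] = record.get("message")
        merged.append(row)
    return merged


def cleanup_message_data(commits, messages):
    """Drop message records whose hash is no longer in the commit data."""
    known = {commit["hash"] for commit in commits}
    return [record for record in messages if record["hash"] in known]


# Hypothetical example data.
commits = [{"hash": "a1", "author": "Alice"}, {"hash": "b2", "author": "Bob"}]
messages = [
    {"hash": "a1", "title": "Add parser", "message": "Longer body text."},
    {"hash": "zz", "title": "Orphan", "message": ""},
]
titles_only = merge_commit_messages(commits, messages, mode="title")
cleaned = cleanup_message_data(commits, messages)
```

Merging on the hash rather than on a positional index keeps the result independent of the row order in either data set, which is the same motivation the commit messages in this pull request give for switching the merge column.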

Changed/Improved

Breaking Change: Rename getters for main data sources: Unfiltered data is now acquired using get.<datasource>.unfiltered, filtered data is acquired using get.<datasource> (edf19cf, e05f344)
Add check for empty network in metrics.hub.degree function. In the case of an empty network, a warning is printed and NA is returned (PR Fixes in the metrics module #195, 4b164be)
Adjust the function ProjectData$get.artifacts: Rename its single parameter to data.sources and change the function so that it can extract the artifacts for multiple data sources at once. The default is still that only artifacts from the commit data are extracted. (PR Fixes in the metrics module #195, cf795f2, 70c05ec, 5a46ff4, fd767bb)
Change the internal representation of empty data from NULL to empty data frames and adapt function get.cached.data.sources() of ProjectData which returns a vector of all data sources that are cached (including additional and filtered data sources) (PR Enable to read data at every point in time #201, aec898e, e55d088, 24c222a); additionally, introduce new function is.data.source.cached() in util-data.R that returns a logical vector indicating which of the given data sources are cached (PR Enable to read data at every point in time #201, b49cc5d, 491e70c, 24c222a)
Change the threshold calculation for the classification of developers to use a quantile approach when classifying on the basis of network centrality metrics (issue Apply core threshold correctly using 'quantile' #205, PR Add new threshold calculation for network-based classifications and other small fixes #209, PR Fix bug in author classification #210, 5128252, 0d6a3a1)
Update documentation in util-network-metrics.R and util-conf.R (PR Fixes in the metrics module #195, f929248, de9988c, PR Fix wrong data path issue and emerging bugs #199, 059b286)
Splitting no longer loads all (additional) data sources, but only the ones that have already been cached in the ProjectData (PR Enable to read data at every point in time #201, 52a3014, aec898e, de1bbfe)
Improve the documentation in util-core-peripheral.R by adding roxygen skeleton documentation to undocumented functions (issue Update core-peripheral module #70, PR Add new threshold calculation for network-based classifications and other small fixes #209, a3d5ca7, 6f737c8)
Change the $ notation to the bracket notation in util-core-peripheral.R (issue Update core-peripheral module #70, PR Add new threshold calculation for network-based classifications and other small fixes #209, 6f737c8)
Add .drone.yml to enable running our CI pipelines on drone.io (PR Set up CI pipeline for drone.io #191, 1c5804b)
Run not only the test suite in our CI pipeline, but also the showcase file, using test data (719a4f0, 3eb31d8)
Add R version 4.1 to test suite and adjust missing time-zone attributes on NA vectors or empty POSIXct vectors which are correctly added as of R version 4.1 (PR Necessary adjustments for the recently released R version 4.1 #203, 6b7fb36, 98c5671, 09d11ab)
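
The quantile-based threshold for network-based classification mentioned in this list can be sketched as follows. This is an illustration in Python, not coronet's R code: the function names, the quantile level of 0.8, and the linear-interpolation scheme are assumptions. Only the rule that authors with a centrality value strictly greater than the threshold count as core is taken from the commit messages of this pull request.

```python
def quantile(values, q):
    """Quantile by linear interpolation between neighbouring order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("cannot compute a quantile of no values")
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def classify_authors(centrality, q=0.8):
    """Split authors into core and peripheral using a quantile threshold.

    `q` is an assumed quantile level chosen for this example.
    """
    threshold = quantile(centrality.values(), q)
    # Strictly greater than the threshold means core; everyone else is peripheral.
    core = sorted(a for a, value in centrality.items() if value > threshold)
    peripheral = sorted(a for a, value in centrality.items() if value <= threshold)
    return threshold, core, peripheral


# Hypothetical centrality values (for example, vertex degrees).
centrality = {"Alice": 10, "Bob": 4, "Carol": 3, "Dave": 2, "Eve": 1}
threshold, core, peripheral = classify_authors(centrality)
```

With these example values the threshold falls between the two largest centralities, so only Alice ends up in the core group.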

Fixed

Fix fencing issue timing data so that issue events "happen" after the issue was created. As only commit_added events are affected, the fix is applied only to these events. (issue When to do timestamp extraction for issue data? Before or after event filtering? #185, 627873c, 6ff585d)
Fix the function reset.environment() of both the ProjectData and NetworkBuilder class; they now reset all the data (PR Fix wrong data path issue and emerging bugs #199, de091a5, fc4c086)
Adjust the functions update.commit.message.data(), update.pasta.data(), and update.synchronicity.data(): no warning is printed anymore when they are called by the corresponding cleanup function (PR Fix wrong data path issue and emerging bugs #199, e5c60a5)
Fix issue where the data path on RangeData objects was wrong in special cases. Introduce the (private) flag built.from.range.data.read that is set according to how the object has been created (splitting manually or reading codeface ranges) and calculate the data path accordingly (PR Fix wrong data path issue and emerging bugs #199, cce9527, 917bf64, 169c034). Also add tests for this new behaviour (PR Fix wrong data path issue and emerging bugs #199, ef5bac6, 3aa8e7d, d454e5a, 66ad127)
Make splitting no longer modify the original ProjectConf, instead create a copy (e82d056)
Fix and update outdated examples in the showcase file (473c094, 287fbfa, 0a5cce4, PR Update showcase.R and add minor bug fixes #207)
Fix generation of Codeface range directory names from commit hashes (5c90d1c)
Fix plotting an empty network via plot.network (03f986d)
Fix behavior of construct.ranges when only one range has to be constructed and sliding.window = TRUE (000314b)
Add package reshape2 to the install script: the package has been used in module util-plot-evolution.R for quite a while but had never been added to the list of packages to install (7bb4e7b)
Fix data tests in test-data.R to use deep clones of ProjectData objects (PR Add new threshold calculation for network-based classifications and other small fixes #209, d75373a)
Fix the update.values() function in util-conf.R to delete the value field if the new value is equal to the default value as the comparison of two otherwise equal Conf objects fails without this (PR Add new threshold calculation for network-based classifications and other small fixes #209, d75373a)
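
The last entry is easier to follow with a small model of the problem. The sketch below is Python for illustration only and does not mirror coronet's Conf class: the class layout, the method names, and the structural equality check are assumptions. It shows why an explicitly stored value that equals the default can make two otherwise identical configurations compare as different, and how deleting the value field avoids that.

```python
class Conf:
    """Toy configuration: every attribute has a default and an optional value."""

    def __init__(self, defaults):
        self.attributes = {name: {"default": d} for name, d in defaults.items()}

    def update_values(self, updates):
        for name, new_value in updates.items():
            entry = self.attributes[name]
            if new_value == entry["default"]:
                # Storing the default explicitly would change the structure,
                # so remove the value field instead.
                entry.pop("value", None)
            else:
                entry["value"] = new_value

    def get_value(self, name):
        entry = self.attributes[name]
        return entry.get("value", entry["default"])

    def __eq__(self, other):
        # Structural comparison: a stray "value" field would break equality.
        return isinstance(other, Conf) and self.attributes == other.attributes


fresh = Conf({"commit.messages": "none"})
changed = Conf({"commit.messages": "none"})
changed.update_values({"commit.messages": "title"})
differs_while_changed = fresh != changed
changed.update_values({"commit.messages": "none"})  # back to the default
equal_after_reset = fresh == changed
```

After the second update, `changed` no longer carries a value field, so it compares equal to `fresh` again.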

Add the new attribute "commit.messages" to the project configuration class with options "none", "title" and "message" to make it possible to specify what exactly of the commit message data is to be merged to the commit data. See #180 Signed-off-by: Niklas Schneider <[email protected]>

Add two tests for testing the merge functionality for both full commit messages and titles only. Fix bug that merges message body instead of title when selecting option "title". See #180 Signed-off-by: Niklas Schneider <[email protected]>

As commit.id was the first column of the data frame anyway, merging has not changed the order. But when using the hash column, it is taken as the first column of the resulting data frame. Change the order of the columns in order to not break anything that relies on the order. See #180 Signed-off-by: Niklas Schneider <[email protected]>

Remove some empty lines and indent some lines. Also remove commit.message.data.unprocessed variable and use the commit.message.data variable from the beginning. Add column names beforehand in order to enable access without indices. See #193 Signed-off-by: Niklas Schneider <[email protected]>

Move functions concerning reading commit messages and the constants used by them to a new section in util.read. Replace subset with proper indexing and minor comment fixes. See #193 Signed-off-by: Niklas Schneider <[email protected]>

Add check for the ProjectConf attribute 'commit.messages' before calling 'update.commit.messages'. Also fix a few errors in comments as well as one if condition where the wrong attribute was checked. See #193 Signed-off-by: Niklas Schneider <[email protected]>

With #209 we changed the threshold calculation for network-based classifications. But the use of the new threshold was still the old one. So now change the classification using the new threshold so that all authors with a centrality value greater than the threshold are considered core. This is documented in #205. Signed-off-by: Christian Hechtl <[email protected]>

bockthom · 2021-09-01T14:08:27Z

We are ready for version 4.0 of coronet now. Thanks to all contributors for your additions, improvements, and fixes @nlschn @JoJoDeveloping @hechtlC and also for our new logo @ChristianKaltenecker.

As everything has already been reviewed, I will merge right away.

nlschn added 30 commits January 7, 2021 21:44

Add functions that enable merging commit messages into data

fdc414a

Get the commit message data using the new read function and merge either nothing, the title or message and title into the commit.data of the proj.conf instance. See #180 Signed-off-by: Niklas Schneider <[email protected]>

Replace seq with seq_along and add missing log statement in util-read.R

f80b24b

Signed-off-by: Niklas Schneider <[email protected]>

Add description of changes to unversioned section of NEWS.md

359b12c

Signed-off-by: Niklas Schneider <[email protected]>

Remove unnecessary empty lines from several files

70c8395

Also exchange the merge attribute when merging data frames of commit messages from commit.id to hash. Signed-off-by: Niklas Schneider <[email protected]>

Fix a syntax error in util-read

89a6ea6

Signed-off-by: Niklas Schneider <[email protected]>

Modify README and NEWS

c9c7ff7

Follow the review suggestions of @clhunsen. See #180 Signed-off-by: Niklas Schneider <[email protected]>

Rename "message.body" column to "message" everywhere

0457dd5

Following the review of #193 Signed-off-by: Niklas Schneider <[email protected]>

Put merge functionality into own function

8e28a1f

Create private function update.commit.message.data in util-data.R which handles the merge and change the location where it is called in set.commits. See #193 Signed-off-by: Niklas Schneider <[email protected]>

Fix error when returning a variable that is not defined

703ab3e

Fix an error where the value of a variable that is defined in an if block is returned outside that if block. See #193 Signed-off-by: Niklas Schneider <[email protected]>

Simplify data frame creation in read.commit.messages

7caaa8d

Replaced a loop with a conversion from a list of vectors in a data frame and access its columns directly See #193 Signed-off-by: Niklas Schneider <[email protected]>

Fix comments in and change order in 'set.commits'

eb1cec8

Also adapt 'update.commit.messages' to better match the implementation of similar methods. Add 'set.commit.messages' in order to be able to set the commit messages to NULL. See #193. Signed-off-by: Niklas Schneider <[email protected]>

Add helper function to format 'commit.id' column

d5c8c78

Introduce new function 'format.commit.ids' in along with new section in util-read.R. Also put format "<commit-%s>" into a constant. See #193 Signed-off-by: Niklas Schneider <[email protected]>

Change commit message merge process

43e1894

Take advice by @clhunsen to replace if else cascade for rearranging columns with better merge call. Also modify test-data tests regarding commit messages: Row names are no longer ignored. See #193 Signed-off-by: Niklas Schneider <[email protected]>

Change order of data sources to be alphabetical

70b3cb6

Change order in 'README.md', 'util-conf.R' and 'util-data.R' Also fix table of contents in the readme. See #193 Signed-off-by: Niklas Schneider <[email protected]>

Update 'NEWS.md' with commit hashes

31e0f85

See #193 Signed-off-by: Niklas Schneider <[email protected]>

Add package 'data.table' to coronet and refactor README

a0d5e32

Add the package in 'install.R' and a description in the 'README.md'. Also rearrange the parameter descriptions of 'ProjectConf' to be sorted alphabetically. See #193 Signed-off-by: Niklas Schneider <[email protected]>

Increase performance of commit message read

4c49269

Use the new data.table package to replace do.call with data.table::rbindlist which is faster in processing data.frames. See #193 Signed-off-by: Niklas Schneider <[email protected]>

Update my copyright notices

19655dd

Signed-off-by: Niklas Schneider <[email protected]>

Fix spelling errors in 'README.md' and 'util-conf.R'

a36bde4

See #193 Signed-off-by: Niklas Schneider <[email protected]>

Use new helper function in tests to format commit ids

aab0751

Remove hardcoded string formatting and replace it in tests for creating expected data using the new function 'format.commit.ids'. See #193 Signed-off-by: Niklas Schneider <[email protected]>

Replace for-loop with lapply call in function to read commit messages

0859b9a

Follow @clhunsen's advice to create commit message data with an lapply to avoid having a for-loop and an additional lapply call afterwards See #193 Signed-off-by: Niklas Schneider <[email protected]>

Initialize commit message data on RangeData-objects in 'util-split.R'

686459e

Add the getter call to the 'additional.data' list. See #193 Signed-off-by: Niklas Schneider <[email protected]>

Fix minor spelling errors

613a773

See #193 Signed-off-by: Niklas Schneider <[email protected]>

Change all data split tests to include commit message data

98e83b0

Add (empty) commit message data to all data split tests in 'tests-split.R'. Also sort the additional data sources alphabetically in the tests. See #193 Signed-off-by: Niklas Schneider <[email protected]>

hechtlC and others added 26 commits July 30, 2021 17:14

Add entries in the changelog for the pull request

1ec501e

Signed-off-by: Christian Hechtl <[email protected]>

Merge pull request #209 from hechtlC/master

99fdb01

Add new threshold calculation for network-based classifications and other small fixes Reviewed-by: Thomas Bock <[email protected]>

First steps in bot filtering: Add read method, add test file, add fil…

838855f

…tering functionality Signed-off-by: Johannes Hostert <[email protected]>

Preliminary draft of bot filtering

dcce82d

Signed-off-by: Johannes Hostert <[email protected]>

Ignore the deleted user when reading in issues/PRs

1a08140

Signed-off-by: Johannes Hostert <[email protected]>

Make everything use get.mails.filtered() instead of get.mails(); fix …

edf19cf

…tests Signed-off-by: Johannes Hostert <[email protected]>

Rename get.<datasource> to get.<datasource>.unfiltered, and get.<data…

e05f344

…source>.filtered to get.<datasource>, for commits/mails/issues. Remove reflective method invocations. Signed-off-by: Johannes Hostert <[email protected]>

Fix Bug in util-split.R

345433e

Signed-off-by: Johannes Hostert <[email protected]>

Rename data fields in util-data.R from <data>.filtered and <data> to …

78158ce

…<data> and <data>.unfiltered Signed-off-by: Johannes Hostert <[email protected]>

Address reviews

8fff2d3

Signed-off-by: Johannes Hostert <[email protected]>

Add test for bot reading

84b922a

Signed-off-by: Johannes Hostert <[email protected]>

Address reviews, next round: update NEWS.md, fix inconsistencies

24c222a

Signed-off-by: Johannes Hostert <[email protected]>

Make last commit appear in NEWS.md, fix remaining bugs

c97b3a5

Signed-off-by: Johannes Hostert <[email protected]>

Address reviews, round 4: Small fixes for consistency

25ee63d

Signed-off-by: Johannes Hostert <[email protected]>

Fix bug that causes warning message for conf updates

87eaed3

Also order the edge attributes when changing them to make a identical check possible on two differently ordered lists. Signed-off-by: Christian Hechtl <[email protected]> Committed-by: Johannes Hostert <[email protected]>

Review, round 5: very small fix

1c97ec5

Signed-off-by: Johannes Hostert <[email protected]>

Merge pull request #206 from JoJoDeveloping/filter-bots

afb0ec0

Read bot data and filter bots Reviewed-by: Thomas Bock <[email protected]> Reviewed-by: Christian Hechtl <[email protected]>

Update NEWS.md

95efe90

Signed-off-by: Christian Hechtl <[email protected]>

Merge pull request #210 from hechtlC/master

b17a1e0

Fix bug in author classification Reviewed-by: Thomas Bock <[email protected]>

Update and sort parts of the README.md

564208b

Signed-off-by: Thomas Bock <[email protected]>

Update README and CI pipeline regarding R-versions

335fc9a

Signed-off-by: Thomas Bock <[email protected]>

Fix inconsistencies in log statements

8ca83f0

Signed-off-by: Christian Hechtl <[email protected]>

Merge pull request #211 from hechtlC/master

17f1e60

Fix inconsistencies in log statements Reviewed-by: Thomas Bock <[email protected]>

Streamline changelog prior to next release

51a82fd

Signed-off-by: Thomas Bock <[email protected]>

Version 4.0

bb8cd05

Signed-off-by: Thomas Bock <[email protected]>

bockthom added the versioning label Sep 1, 2021

bockthom added this to the v4.0 milestone Sep 1, 2021

bockthom merged commit a656026 into master Sep 1, 2021
