Version 3.5 #168

Merged
merged 173 commits into from
Jun 8, 2019

Conversation

clhunsen
Collaborator

@clhunsen clhunsen commented Jun 7, 2019

3.5

This is the PR for releasing the version 3.5 of coronet. Thank you very much for all contributions, props to all contributors!

Announcement

Added

  • Add the constants UNTRACKED.FILE, UNTRACKED.FILE.EMPTY.ARTIFACT, and UNTRACKED.FILE.EMPTY.ARTIFACT.TYPE: Commits that do not change any artifact are considered to be carried out on a meta-file called <untracked.file>. The constant UNTRACKED.FILE is added to hold the string constant. Analogously, the constants UNTRACKED.FILE.EMPTY.ARTIFACT (currently, "") and UNTRACKED.FILE.EMPTY.ARTIFACT.TYPE (currently, "") hold the constants for any artifacts and their corresponding types, respectively, "changed" in untracked files. (11428d9, 5ea65b9, dde0dd7, 2284bbe)
  • Add the public method ProjectData$get.commits.filtered.uncached: The method allows for external filtering of the commits by specifying if untracked files and/or the base artifact should be filtered (this method does not take advantage of caching, whereas the method ProjectData$get.commits.filtered does) (11428d9)
  • Add the parameters commits.filter.base.artifact and commits.filter.untracked.files to the ProjectConf: In addition to the ProjectConf parameter commits.filter.base.artifact (previously called artifact.filter.base), which configured whether the base artifact should be included in the get.commits.filtered method, there is now a similar parameter called commits.filter.untracked.files doing the same thing for untracked files (11428d9, 466d8eb)
  • Add parameter edges.for.base.artifacts to NetworkConf: In author networks, edges are no longer constructed between authors who solely modify untracked files. For authors involved in changing the base artifact, the new NetworkConf parameter edges.for.base.artifacts configures whether edges are created (c60c2f6, 466d8eb)
  • Add method ProjectData$get.authors.by.data.source to retrieve authors by given data-source name (Change commit filtering and network building regarding the untracked files and base artifact #149, 6580427, 137d833)
  • Add helper function create.empty.data.frame: The function returns empty data.frames (0 rows) with correct columns and, if specified, all the correct data types. In the future, functions that return data in data.frames should always return data.frames of the same shape (regarding columns and data types), especially when they are empty, because this makes later case distinctions easier or unnecessary (67a4fbe, 3513647)
  • For the most common types of data.frames (commits, mails, issues, authors, synchronicity data, and PaStA data), utility functions are added, namely create.empty.authors.list, create.empty.commits.list, create.empty.issues.list, create.empty.mails.list, create.empty.synchronicity.list, and create.empty.pasta.list, as well as corresponding constants holding columns and associated data types for all these empty data.frames (5f0f529, 523daef, f8e021d, 3513647, 2f4e6f0, cd3e34a)
  • Add mandatory attributes in create.empty.network if wanted (cae9d4b, cc8bd86)
  • Add function create.empty.vertex.list (c00101d)
  • Add tests for construction of networks without data (a4b3524)
  • Add tests for construction of networks without vertices (6eb214c)
  • Add a note on mailing-list threads to README (c6dca27)
  • Add cutting functionality to README descriptions (fb40c50)
  • Add the parameter restrict.classification.to.authors to the functions get.author.class.by.type, get.author.class.overview, get.author.class.network.degree, get.author.class.network.eigen, get.author.class.network.hierarchy, get.author.class.commit.count and get.author.class.loc.count. The parameter restricts the classification to a limited group of authors whose names are specified in this parameter. (2492dd0, Optimizations for network-based core-peripheral classification #148)
  • Add test cases for util-core-peripheral.R by adding the new file test-core-peripheral.R along with test cases (2627d6c)
  • Add project-configuration parameter issues.from.source to choose if only issues from JIRA, only issues from GitHub, or all issues shall be read in (PR Add possibility to choose issue source #159, d677949, a3e7132, ea26181). Therefore two test cases, one that reads in only JIRA issues and one that reads in only GitHub issues, are added to the issue read test (65b1acd, 2d897cb)
  • Add class documentation (Improvement of the documentation conventions #157, 6e33d0a, 250f9e0)

Changed/Improved

  • Always add mandatory vertex and edge attributes (Mandatory vertex and edge attributes #154, 0526755)
  • Heavily improve addition of PaStA data (cd3e34a)
  • The method read.issues in util-read.R now supports the new issue data format (PR Adjust network library to the new issue data format #147, 77c750c, e04ce30, 67b818a, 4020487, 3513647). Therefore, the test issue data and all related tests are updated (39971ee, 0ec6c6c, 6a9f4ad, fda000f, 3513647)
  • Rename ProjectConf parameter artifact.filter.base to commits.filter.base.artifact (PR Change commit filtering and network building regarding the untracked files and base artifact #149, 466d8eb)
  • The constant BASE.ARTIFACTS is extended by adding untracked files (i.e., the new meta-file UNTRACKED.FILE), which is now considered to be a new base artifact in the case of file-level analyses. This implies that, in file-level analyses, the base artifact and the untracked files coincide, while in feature-level and function-level analyses they are treated differently (d11d0fb)
  • Filtering by artifact kind (e.g. filtering out either "Feature" or "FeatureExpression") is now being done in the method ProjectData$get.commits instead of the method ProjectData$get.commits.filtered (894c9a5)
  • Remove the method get.commits.filtered.empty and the corresponding method filter.commits.empty; their functionality is now included in the methods get.commits.filtered and filter.commits, respectively (11428d9)
  • The private method ProjectData$filter.commits now takes parameters which configure whether untracked files and/or the base artifact are to be filtered (11428d9)
  • Remove get.commits.raw, set.commits.raw and read.commits.raw functions (64a9486, c26e582)
  • Add commits on untracked files to test suite (Remove empty artifact as vertex (PR #149) #153, d9f527c)
  • In the class Conf (and its sub-classes NetworkConf and ProjectConf), default parameters are not validated anymore to avoid confusion by logging output (ec8c6dd)
  • In the class Conf (and its sub-classes NetworkConf and ProjectConf), stop is called on errors during parameter updates now (ec8c6dd)
  • Change shape of Vertices in the legend of plots to avoid confusion (f4fb480)
  • Refactor ProjectData$get.cached.data.sources to be more concise (a4e7a21)
  • Update contribution guide regarding roxygen2 conventions (Improvement of the documentation conventions #157, fbc2d54, 783ee58, 6e33d0a)
  • Update README regarding mandatory edge attributes (641624b)
  • Rename misleading parameter names for functions get.author.class.by.type, get.author.class.overview, get.author.class.network.degree, get.author.class.network.eigen, get.author.class.network.hierarchy, get.author.class.commit.count and get.author.class.loc.count. Most importantly, the parameter range.data was renamed to proj.data for these functions. (587ef99, 81568b1, Update core-peripheral module #70)
  • Remove the unused functions get.commit.count.threshold and get.loc.count.threshold. (2534d73, Update core-peripheral module #70)
  • The function verify.argument.for.parameter was adjusted to be suitable in more general use-cases (557bdcd)
  • Do not redundantly initialize data sources when splitting (35698a1)
  • Read PaStA and synchronicity data only if enabled (79bf3ca)
  • Add and enforce coding convention to use 'vertices' and not 'nodes'. Most importantly, the function metrics.node.degrees is renamed to metrics.vertex.degrees. (d35ce61)
  • Adjust range directories' names to start with a consecutive range number and to conform with the directories created by Codeface (b3e2947, f6b28fb)
  • Remove the two functions get.author.class.activity and get.author.class.activity.overview from the file util-core-peripheral.R (61b344a)
  • Remove function get.commit.data from util-data.R and replace all calls to this function with equivalent statements, except that they now retrieve the commit data via get.commits.filtered instead of get.commits, which was used internally in get.commit.data (Update core-peripheral module #70, 4fc6b45, 7fc454e, c4cf8d2)
  • Add possibility to decide whether the vertex attribute active.ranges should be computed per activity type or over all activity types (Further vertex attributes #92, aba8af9, 1bb81e8, 8f35a6b)
  • During the computation of the vertex attribute first.activity, the default value is now used analogous to active-ranges computation: The given value is used as default per author and type. (Further vertex attributes #92, 18a065c, edf864a)

Fixed

bockthom and others added 30 commits October 25, 2018 11:59
The shape of the legend for 'Vertices' was the same as the shape of the
'Author' vertex type. This is a little bit confusing when we have two
different vertex types and vertices in the network:
Then the shape of 'Vertices' in the legend is the same as for the
vertex type 'Author' even if the vertex type is 'Artifact'.

To avoid confusion, use another shape in the legend for 'Vertices'
which is (usually) not used for vertex types. Then it is more clear that
not the shape of 'Vertices' in the legend does matter, but the color.

Signed-off-by: Thomas Bock <[email protected]>
Signed-off-by: Thomas Bock <[email protected]>
Update the exemplary multi network, which is displayed in the README.md, to
contain the shape changes in the legend
(see f4fb480).

In addition, add width and height parameters to the ggsave statement in the
showcase.R file which generates this exemplary multi network.

Signed-off-by: Thomas Bock <[email protected]>
Change shape of `Vertices` in the legend of plots to avoid confusion

Reviewed-by: Claus Hunsen <[email protected]>
When calling the method 'ProjectData$reset.environment()', an error is
produced:
> Error in private$artifacts = NULL :
>   cannot add bindings to a locked environment

This is due to a regression introduced in commit
1bed431, where the field
'ProjectData$artifacts' has been removed, but not the corresponding
statement for resetting it. This is fixed now.

Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Claus Hunsen <[email protected]>
Quick fix: Fix error when resetting a ProjectData environment

Reviewed-by: Thomas Bock <[email protected]>
The get.commits.raw function was removed. Instead, the function get.commits
should be used from now on.

Signed-off-by: Jakob Kronawitter <[email protected]>
The artifact kind filtering which filters the commits.list file and only keeps
the commits which have the correct artifact.type (configured in the ProjectConf
class) has been moved to the get.commits method of the ProjectData class.
Previously this functionality was in the get.commits.filtered method.

Signed-off-by: Jakob Kronawitter <[email protected]>
In the case of a valid commits.list file with at least one commit line the
read.commits function returns a data.frame with 16 columns containing all the
commits read from the file. If the commits.list file is empty, however, it
previously returned an empty data.frame with no columns. This has now been
adjusted to return an empty data.frame with all the columns (16 columns), which
should save a lot of additional if-else case distinctions later on because now
the shape of the returned data.frame by the read.commits function is always the
same.

Signed-off-by: Jakob Kronawitter <[email protected]>
This major commit merges the two old methods get.commits.filtered and
get.commits.filtered.empty of the ProjectData class into one method, again
called get.commits.filtered. Similarly, the filter.commits.empty and
filter.commits methods are merged into one filter.commits method, which now
takes filter.untracked.files and artifact.filter.base as parameters that
control how the filtering is performed.

The filter.untracked.files parameter was added to the ProjectConf which now
controls - just like the artifact.filter.base parameter - which commits should
be filtered out when calling the get.commits.filtered method.

To call get.commits.filtered with other parameters (not the ones
configured in the ProjectConf), one can call the
get.commits.filtered.uncached version of this method. As the name implies, this
method does not take advantage of caching and should thus not be used too often.

In the course of revamping these methods, it took only minor effort to rename
the empty artifact to a more descriptive identifier, namely, "untracked files".
Thus, this renaming is also performed in this commit.

Signed-off-by: Jakob Kronawitter <[email protected]>
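
The merged filtering described in this commit message can be pictured with a small R sketch. It is not the library's implementation: the function name 'filter.commits.sketch', the two placeholder constants, and the example columns 'file' and 'artifact' are assumptions made purely for illustration.

```r
## Hypothetical placeholders, NOT the library's actual constants.
UNTRACKED.FILE.SKETCH = "<untracked.file>"
BASE.ARTIFACT.SKETCH = "base"

## Drop commits on untracked files and/or on the base artifact,
## depending on the two flags (illustrative sketch only).
filter.commits.sketch = function(commits, filter.untracked.files, filter.base.artifact) {
    keep = rep(TRUE, nrow(commits))
    if (filter.untracked.files) {
        keep = keep & commits[["file"]] != UNTRACKED.FILE.SKETCH
    }
    if (filter.base.artifact) {
        keep = keep & commits[["artifact"]] != BASE.ARTIFACT.SKETCH
    }
    return(commits[keep, , drop = FALSE])
}

## Made-up example data: one regular commit, one on an untracked file,
## and one on the base artifact.
commits = data.frame(
    hash = c("a1", "b2", "c3"),
    file = c("src/main.c", "<untracked.file>", "src/util.c"),
    artifact = c("feature.x", "", "base"),
    stringsAsFactors = FALSE
)

filtered = filter.commits.sketch(commits, TRUE, TRUE)
unfiltered = filter.commits.sketch(commits, FALSE, FALSE)
```

With both flags set, only the commit 'a1' remains; with both flags unset, all three commits are returned unchanged.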
The new get.commits method includes filtering by artifact kind. Two testcases
depended on this and thus have now been adjusted accordingly. 10 test cases of
the test-split.R are still not working.

Signed-off-by: Jakob Kronawitter <[email protected]>
The test cases were adapted to two of the new changes in the network library.
The first is that the get.commits method now removes either
'Feature' or 'FeatureExpression' commits. The second is that
there are no dummy data.frames anymore (with zero columns and rows). Instead,
there are empty data.frames when no data exists (with columns but zero
rows). One mistake was made during the creation of these: the empty data.frames
did not contain any data type information (all columns defaulted
to the 'logical' data type). If this is not wanted, there now exists a new helper
method which also takes care of data types.

Signed-off-by: Jakob Kronawitter <[email protected]>
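
To illustrate the difference between an untyped and a typed empty data.frame, here is a minimal R sketch. The helper name 'create.empty.df.sketch' and its signature are hypothetical and do not claim to match the helper added to the library; the column names in the usage lines are made up as well.

```r
## Build a zero-row data.frame with the given column names.
## Without 'data.types', every column is 'logical' (the situation described above);
## with 'data.types', each column gets the requested type.
create.empty.df.sketch = function(columns, data.types = NULL) {
    if (is.null(data.types)) {
        data.types = rep("logical", length(columns))
    }
    stopifnot(length(columns) == length(data.types))
    ## vector(mode, 0) creates a zero-length vector of the requested type
    empty.columns = lapply(data.types, function(type) vector(mode = type, length = 0))
    names(empty.columns) = columns
    return(as.data.frame(empty.columns, stringsAsFactors = FALSE))
}

untyped = create.empty.df.sketch(c("author.name", "commit.count"))
typed = create.empty.df.sketch(c("author.name", "commit.count"),
                               c("character", "integer"))
```

Both results have two columns and zero rows, but only 'typed' keeps 'author.name' as character and 'commit.count' as integer, so later code can rely on the column types without extra case distinctions.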
Previously, when an author network was created and the untracked files artifact
and the base artifact were included, edges were created among the authors of the
untracked files artifact and among the authors of the base artifact. This has now
been changed so that no edges are created among authors of untracked files at any
time. For the base artifact, this can be configured via the new
base.artifact.edges parameter in the NetworkConf.

Signed-off-by: Jakob Kronawitter <[email protected]>
Signed-off-by: Jakob Kronawitter <[email protected]>
The global constant 'UNTRACKED.FILE' is added to avoid repeating the same
string 'untracked.file' all the time. In addition, minor adjustments are made to
the documentation.

Signed-off-by: Jakob Kronawitter <[email protected]>
In light of recent scenarios and upcoming changes, the default
behavior of 'Conf' objects upon initialization and update is changed:

1) The default values are *not* automatically checked against the
allowed values anymore. This is mainly disabled to avoid confusion of
users. The constructor of the class 'Conf' is adapted accordingly.
2) When updating a configuration value, the program execution is now
stopped (using 'stop') on failure. Previously, the respective update was
ignored while issuing a warning. This change helps prevent
confusion and analysis errors early in an analysis script. Accordingly,
the parameter 'stop.on.error' of all update methods is removed.

Furthermore, the code is streamlined, such that the super-constructor is
called from both subclasses 'NetworkConf' and 'ProjectConf'. Some log
statements are added/adjusted, too.

Signed-off-by: Claus Hunsen <[email protected]>
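
The fail-fast behavior from point 2 can be sketched in plain R as follows. The function 'update.value.sketch' and the list-based configuration are hypothetical stand-ins, not the 'Conf' class itself; only the parameter 'artifact' with the values 'file', 'feature', and 'function' is taken from the commit messages on this page.

```r
## Update one configuration value, stopping on any invalid input
## instead of warning and silently ignoring the update.
update.value.sketch = function(conf, name, value, allowed) {
    if (!(name %in% names(allowed))) {
        stop(sprintf("Unknown parameter '%s'", name))
    }
    if (!(value %in% allowed[[name]])) {
        stop(sprintf("Value '%s' is not allowed for parameter '%s'", value, name))
    }
    conf[[name]] = value
    return(conf)
}

allowed = list(artifact = c("file", "feature", "function"))
conf = list(artifact = "feature")

## a valid update goes through
conf = update.value.sketch(conf, "artifact", "file", allowed)

## an invalid update raises an error; it is caught here only for demonstration
result = tryCatch(
    update.value.sketch(conf, "artifact", "class", allowed),
    error = function(e) "stopped"
)
```

Because the invalid update raises an error, an analysis script halts at the faulty line rather than continuing with a configuration the user did not intend.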
When a network contains no edges but more than one node, all the nodes get
combined. To fix this, the respective data frame, which contains the nodes,
has to be transposed.

This fixes #150.

Reported-by: Jakob Kronawitter <[email protected]>
Signed-off-by: Thomas Bock <[email protected]>
The edge creation process, which does not draw any edges among authors of
untracked files and, if configured in the 'ProjectConf', also does not draw
any edges among the base artifact authors, is reworked since the old way
of achieving this was rather unintuitive and complicated.

Signed-off-by: Jakob Kronawitter <[email protected]>
For commits to untracked files the artifact column has previously been the
copied file column (for example when looking at the commit data returned by
'get.commits'). However this should only be the case when considering file level
analysis (e.g. 'artifact == file' in the 'ProjectConf').
This commit changes this to the correct behaviour. So for 'artifact == function'
and 'artifact == feature' the artifact column now only contains the empty string
for untracked files. To avoid hardcoding this empty string in every affected
place a global constant called 'UNTRACKED.FILE.EMPTY.ARTIFACT' was added.

Signed-off-by: Jakob Kronawitter <[email protected]>
In a previous commit the constant 'UNTRACKED.FILE' was removed from the
'BASE.ARTIFACTS' constant due to temporary difficulties with this assignment.
This change is now reverted.

Signed-off-by: Jakob Kronawitter <[email protected]>
This commit changes an inline comment which was misleadingly talking
about committers but actually meant commit authors.

Signed-off-by: Jakob Kronawitter <[email protected]>
This commit renames the following three configuration options:
- 'artifact.filter.base' to 'commits.filter.base.artifact',
- 'filter.untracked.files' to 'commits.filter.untracked.files'
- 'base.artifact.edges' to 'edges.for.base.artifacts'.

Also the documentation gets slightly adjusted in one place because the old
documentation contained outdated information.

Signed-off-by: Jakob Kronawitter <[email protected]>
When constructing a network in 'construct.network.from.edge.list',
several corner cases need to be handled. When there are no edges
available, an empty edge list can be created using
'create.empty.edge.list'. This way, reliably, the function
'igraph::graph.data.frame' can be used to construct a network. This
further improves the patch 0d7c222,
which tackles #150.

Tests for creating edgeless networks are added to the file
'tests/test-networks.R'. This likely prevents regressions in the future.

Additionally, use the function 'create.empty.edge.list' in one further
place where possible.

Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Jakob Kronawitter <[email protected]>
This patch consists of three related fixes and adaptations:

First, the method 'ProjectData$get.authors.by.data.source' does not
correct the column names of the returned data.frame anymore. This
establishes compatibility with the method 'ProjectData$get.authors'.
Additionally, the returned data.frame only contains unique entries. The
documentation is tidied.

Second, the method 'NetworkBuilder$get.author.network.cochange' is fixed
by adding the missing 'private$' prefix when accessing the project data.

Third, the assignment of author vertices is corrected to use only author
names with the correct vertex attribute (i.e., "name"). This adapts the
code with respect to the first change mentioned above.

This change fixes all failing tests in PR #149.

Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Jakob Kronawitter <[email protected]>
Signed-off-by: Jakob Kronawitter <[email protected]>
Klara and others added 26 commits May 27, 2019 12:08
Signed-off-by: Klara Schlueter <[email protected]>
Add spaces between "if" and "(", add documentation for default values,
clarify example for finding minima in first activity computation.

Signed-off-by: Klara Schlueter <[email protected]>
Apply documentation conventions and give input-output example
for list.by.inner.level.

Signed-off-by: Klara Schlueter <[email protected]>
..and remove unused parameter from helper function.

Signed-off-by: Klara Schlueter <[email protected]>
Signed-off-by: Klara Schlueter <[email protected]>
Base active ranges computation on multiple data sources and adapt first activity

Reviewed-by: Claus Hunsen <[email protected]>
Reviewed-by: Thomas Bock <[email protected]>
This fixes #10. Yeah, the oldest open issue will be closed! :)

To guide users through the renaming of submodules, an additional note is
added to the README.

Signed-off-by: Claus Hunsen <[email protected]>
This fixes #157.

Additionally, add a note on proper setting of comments.

Signed-off-by: Claus Hunsen <[email protected]>
To fulfill the R coding conventions, class documentation for the
following classes is added:
- 'Conf',
- 'ProjectConf',
- 'NetworkConf',
- 'ProjectData',
- 'RangeData', and
- 'NetworkBuilder'.

Signed-off-by: Claus Hunsen <[email protected]>
For more consistency and coherence, the definitions and functions in the
file 'util-read.R' are re-ordered to give rise to the sections 'Main
data sources' and 'Additional data sources'. Each section contains
subsections with the corresponding functions and constants for the
single data sources.

Signed-off-by: Claus Hunsen <[email protected]>
Fix crash behaviour of function 'get.author.class' which occurred whenever a
zero-row dataframe was passed in the parameter 'author.data.frame'.

The fix is realized without a parameter check but instead with two slight
manipulations to the existing code.
 - The expression '1:author.class.threshold.idx' is replaced with
   'seq_len(author.class.threshold.idx)' to always produce an integer vector of
   length 'author.class.threshold.idx', specifically in the case of
   'author.class.threshold.idx' being zero, which occurs when a zero-row
   dataframe is passed through the parameter 'author.data.frame'.
 - The function 'sapply' shows a strange behaviour whenever an empty vector is
   passed as the first argument (in this case 'author.cumsum'). It always
   returns vectors having the same length as the first argument, however, when
   the first argument is a vector of length zero, it returns an empty list
   instead of an empty vector. Therefore, an 'as.logical' statement is
   added to ensure that there is always a (logical) vector being returned.

The two above mentioned changes allow the function to handle zero-row dataframes
being passed correctly without additional parameter checks.

In addition, a call to 'suppressWarnings' is used to hide the warning that was
output when the function 'min' gets called on an empty vector. The warning
informed about 'min' returning an infinity value since no minimum value could be
found in the empty vector, however, this special case is handled in the
following 'if' statement anyway, so there is no need to show this warning to the
user.

This fixes #164.

Signed-off-by: Jakob Kronawitter <[email protected]>
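
The base-R behaviors this commit message relies on can be reproduced in isolation. The snippet below is a standalone demonstration, not an excerpt from 'get.author.class'; the variable 'n' and the threshold function passed to 'sapply' are arbitrary choices for illustration.

```r
n = 0  # stands in for 'author.class.threshold.idx' with a zero-row data.frame

length(1:n)         # 2, because 1:0 counts down and yields c(1, 0)
length(seq_len(n))  # 0, an empty integer vector

author.cumsum = numeric(0)
raw = sapply(author.cumsum, function(x) x >= 10)
is.list(raw)             # TRUE: sapply returns list() for zero-length input
flags = as.logical(raw)  # logical(0), i.e., a proper (empty) logical vector

## min() on an empty vector returns Inf and emits a warning;
## suppressWarnings() hides only the warning, the value is still Inf
smallest = suppressWarnings(min(author.cumsum))
is.infinite(smallest)    # TRUE
```

Indexing with '1:n' would therefore touch the elements 1 and 0 of an empty structure, whereas 'seq_len(n)' selects nothing, which is the behavior wanted for empty input.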
To ensure that there are no regressions in the future, the case that an
empty data.frame is given to the function 'get.author.class' needs to be
incorporated in the test suite.

This relates to issue #164.

Signed-off-by: Claus Hunsen <[email protected]>
Given the corner case that an empty network is given to the function
'add.vertex.attribute.*' or none of the vertices in the network is
assigned a value, the vertex attribute is now added manually and by
force using 'add.attributes.to.network'. The default value is then
assigned as defined by the immediate call to 'add.vertex.attribute.*'.

Two test cases are added:
- addition of vertex attributes to empty networks, and
- addition of vertex attributes to non-empty networks, but with usage of
  the default value (adaptation of first new test case).

Additionally, to foster readability inside the function
'add.vertex.attribute', the local variable 'attr.df' is renamed to
'attrs.by.vertex.name' – since it is no data.frame.

This fixes #165.

Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Thomas Bock <[email protected]>
For the case that a data.frame with less than two columns is given to
the function 'get.author.class', the input data is reset to coincide
with the specification given in the function documentation.

This is related to #164 and 8060caa.

Additionally, a short improvement to the documentation of
'get.author.class', as the column denoted by 'calc.base.name' does not
necessarily need to be the second column.

Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Thomas Bock <[email protected]>
Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Claus Hunsen <[email protected]>
To convey our goal and what the acronym 'coronet' stands for, a short
explanation of the library name is added to the file 'README.md'.

This relates to #10, 929f8ce, and has
been suggested by @bockthom.

Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Claus Hunsen <[email protected]>
…edges

When a network or network range contains only one vertex and no edges,
the classification into core and peripheral developers now classifies
the one author as core.

[Claus: Add second test, apply coding conventions, and do small
refactorings. Adjust commit title.]

Signed-off-by: Christian Hechtl <[email protected]>
Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Claus Hunsen <[email protected]>
Project renaming and minor fixes

Reviewed-by: Thomas Bock <[email protected]>
For consistency, the .Rproj file that is distributed with the repository
is renamed to reflect the upcoming name change of the repository.

Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Claus Hunsen <[email protected]>
@clhunsen clhunsen added this to the v3.5 milestone Jun 7, 2019
@clhunsen
Collaborator Author

clhunsen commented Jun 7, 2019

As everything has been reviewed already, I will merge as soon as the TravisCI tests pass.

@clhunsen clhunsen merged commit d02d523 into master Jun 8, 2019