-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change commit filtering and network building regarding the untracked files and base artifact #149
Conversation
The get.commits.raw function was removed. Instead, the function get.commits should be used from now on. Signed-off-by: Jakob Kronawitter <[email protected]>
The artifact kind filtering which filters the commits.list file and only keeps the commits which have the correct artifact.type (configured in the ProjectConf class) has been moved to the get.commits method of the ProjectData class. Previously this functionality was in the get.commits.filtered method. Signed-off-by: Jakob Kronawitter <[email protected]>
In the case of a valid commits.list file with at least one commit line the read.commits function returns a data.frame with 16 columns containing all the commits read from the file. If the commits.list file is empty, however, it previously returned an empty data.frame with no columns. This has now been adjusted to return an empty data.frame with all the columns (16 columns), which should save a lot of additional if-else case distinctions later on because now the shape of the returned data.frame by the read.commits function is always the same. Signed-off-by: Jakob Kronawitter <[email protected]>
This major commit merges the two old methods get.commits.filtered and get.commits.filtered.empty of the ProjectData class into one new method again called get.commits.filtered. Similiarly, the filter.commits.empty and filter.commits methods were merged into one new filter.commits method which now takes filter.untracked.files and artifact.filter.base as paramaters which then control how the filtering is performed. The filter.untracked.files parameter was added to the ProjectConf which now controls - just like the artifact.filter.base parameter - which commits should be filtered out when calling the get.commits.filtered method. If you want to call the get.commits.filtered with other paramaters (not the ones that are configured in the ProjectConf) then one can call the get.commits.filtered.uncached version of this method. As the name implies, this method is not taking advantage of caching and should thus not be used too often. In the course of revamping these methods it only took a minor effort to rename the empty artifact to a more speaking identifier, namely, "untracked files". Thus, this renaming was also performed in this commit. Signed-off-by: Jakob Kronawitter <[email protected]>
Signed-off-by: Jakob Kronawitter <[email protected]>
The new get.commits method includes filtering by artifact kind. Two testcases depended on this and thus have now been adjusted accordingly. 10 test cases of the test-split.R are still not working. Signed-off-by: Jakob Kronawitter <[email protected]>
The test cases were adapted to two of the new changes in the network library. The first one is the fact that the get.commits method now removes either 'Feature' or 'FeatureExpression' commits. The second one was the change that there are no dummy data.frames anymore (with zero columns and rows). Instead there are empty data.frames when there no data exists (with columns but zero rows). One mistake was made during creation of these. The empty data.frames that are created did not contain any data type informtion (all columns defaulted to the 'logical' data.type). If this is not wanted there now exists a new helper method which also takes care of data types. Signed-off-by: Jakob Kronawitter <[email protected]>
Previously, when an author network was created and the untracked files artifact and the base artifact were included, edges have been created among the untracked files artifact and among the base artifact. This was now changed so that there are no edges created among untracked files at any time. For the base artifact it can be configured via the new base.artifact.edges parameter in the NetworkConf. Signed-off-by: Jakob Kronawitter <[email protected]>
Signed-off-by: Jakob Kronawitter <[email protected]>
Signed-off-by: Jakob Kronawitter <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for your contribution, @SCPhantom! This is a really good start for a very huge change in the network library. Nice work so far!
Except for some minor comments, I have three major issues with the current state:
- We need documentation on the new configuration options in the README file.
- For more consistency, the base artifact for the file artifact needs to be added to
BASE.ARTIFACTS
, so we can use such constants in the code (see the corresponding comments). - The construction of co-change-based author networks looks good basically, but I suppose we can improve the code heavily (I added some extensive notes in the corresponding comment).
As I raise questions in my comments, please do not hesitate to answer and ask questions yourself. Maybe, I just forgot about something that we agreed on earlier. 😉
Anyway, we should discuss some details in a face-to-face meeting on Friday; @bockthom, @ecklbarb, you are also invited. I marked all comments with issues that are subject for debate using the emoji 🌳. Additionally, we could also discuss who is carrying out which suggested/needed changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job, @SCPhantom.
There are still some minor issues to fix (mostly documentation), but looks good now.
In addition, I have added some further comments for discussing aspects which we did not yet have discussed (at least not discussed during my presence).
Sorry for spamming you with that much mails. I should have used one review instead of that much single comments...
The global constant 'UNTRACKED.FILE' is added to avoid reusage of the same string 'untracked.file' all the time. In addition minor adjustments are made to the documentation. Signed-off-by: Jakob Kronawitter <[email protected]>
In recent scenarios and in perspective of up-coming changes, the default behavior of 'Conf' objects upon initialization and update: 1) The default values are *not* automatically checked against the allowed values anymore. This is mainly disabled to avoid confusion of users. The constructor of the class 'Conf' is adapted accordingly. 2) When updating a configuration value, the program execution is now stopped (using 'stop') on failure. Previously, the respective update has been ignored while issuing a warning. This change helps preventing confusion and analysis errors early in an analysis script. Accordingly, the parameter 'stop.on.error' to all update methods is removed. Furthermore, the code is streamlined, such that the super-constructor is called from both subclasses 'NetworkConf' and 'ProjectConf'. Some log statements are added/adjusted, too. Signed-off-by: Claus Hunsen <[email protected]>
When a network contains no edges but more than one node, all the nodes get combined. To fix this, the respecting data frame, which contains the nodes, has to be transposed. This fixes #150. Reported-by: Jakob Kronawitter <[email protected]> Signed-off-by: Thomas Bock <[email protected]>
The edge creation process which does not draw any edges among authors of untracked files and - if configured in the 'ProjectConf' - does also not draw any edges among the base artifact authors is being reworked since the old way of achieving this was rather uninituitive and complicated. Signed-off-by: Jakob Kronawitter <[email protected]>
For commits to untracked files the artifact column has previously been the copied file column (for example when looking at the commit data returned by 'get.commits'). However this should only be the case when considering file level analysis (e.g. 'artifact == file' in the 'ProjectConf'). This commit changes this to the correct behaviour. So for 'artifact == function' and 'artifact == feature' the artifact column now only contains the empty string for untracked files. To avoid hardcoding this empty string in every affected place a global constant called 'UNTRACKED.FILE.EMPTY.ARTIFACT' was added. Signed-off-by: Jakob Kronawitter <[email protected]>
As @clhunsen asked for my opinion on some ideas, I will add some comments now:
I definitely agree with this. Sounds like a good idea.
I am not sure if I understand this problem correctly. Does this only affect the test suite? However, we can discuss that in #153.
I am a little bit confused now. As I remember our discussions taking place on the last Fridays and Mondays correctly, then we have agreed on using |
Signed-off-by: Jakob Kronawitter <[email protected]>
Signed-off-by: Jakob Kronawitter <[email protected]>
The following four functions are introduced to make empty dataframe creation even easier: - create.empty.authors.list - create.empty.commits.list - create.empty.issues.list - create.empty.mails.list As the name implies, each function is creating and returning an dataframe with zero rows but the correct column names and datatypes (like a dataframe of the corresponding type would have if it was filled with authors, commits, issues or mails, respectively). In addition, all the column names and column datatypes are made publicly available by defining them as constants. Signed-off-by: Jakob Kronawitter <[email protected]>
Yeah, that is just an honest mistake. Surely, I meant
Honestly, I could not remember completely and wanted to point this out explicitly before forgetting about it eventually. To make it clear: I am fine with the current way. As we use the filtered commit data for network construction right now, we are compliant already. |
Change the behaviour of method 'set.commits' and 'get.authors.by.data.source' whenever a 'NULL' is passed as parameter. Signed-off-by: Jakob Kronawitter <[email protected]>
First of all sorry for rush-committing while discussions are / were still ongoing ;) To the points of you, @clhunsen:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for your work, @SCPhantom!
I quickly looked over the current version of this PR. Looks quite good already.
For now, I have only a few little comments. (I need to have a closer look at the new network-construction algorithm regarding the respective nodes next year.)
With that said, this is my last action on this PR before the Christmas break. Have a Merry Christmas and a Happy New Year!
Signed-off-by: Jakob Kronawitter <[email protected]>
Signed-off-by: Jakob Kronawitter <[email protected]>
Previously, next to the declaration of column names for commonly used dataframes (dataframes containing commits, mails, etc.), there was an SQL statement describing how the data was retrieved before it is read by the network library (in the tool 'codeface-extraction'). These SQL statements are now removed and instead, it is referred to the tool 'codeface-extraction' itself on GitHub. Signed-off-by: Jakob Kronawitter <[email protected]>
Once again, I think that most of the work should be done for now. I have updated the changelog once again including the relevant changes of all commits. I have ignored the two commits: 7e27a18 and 137d833, as they are not 100% clear to me at the moment. I would be glad, as they were authored by @clhunsen, if you could give me the corresponding lines for the changelog. You are the one with the better understanding here 😏 |
Whoops, that was an accident! |
I will have a closer look tomorrow. Thanks for everything! 👍
Here you are:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR is very, very close to being merged. 😀👍 Here the last changes to be made:
- two small changes to the README file (with suggestions) and
- some missing updates to copyright headers.
Signed-off-by: Jakob Kronawitter <[email protected]>
Signed-off-by: Jakob Kronawitter <[email protected]>
Thank you very much again for these extensive changes, @SCPhantom! And for your endurance. 😉 I will merge now. 👍 FYI: The test for R 3.3 is failing due to issue #152 (comment). I will add a patch for this in my next PR. |
This patch consists of three related fix and adaptations: First, the method 'ProjectData$get.authors.by.data.source' does not correct the column names of the returned data.frame anymore. This establishes compatibility with the method 'ProjectData$get.authors'. Additionally, the returned data.frame only contains unique entries. The documentation is tidied. Second, the method 'NetworkBuilder$get.author.network.cochange' is fixed by adding the missing 'private$' prefix when accessing the project data. Third, the assignment of author vertices is corrected to use only author names with the correct vertex attribute (i.e., "name"). This adapts the code with respect to the first change mentioned above. This change fixes all failing tests in PR se-sic#149. Signed-off-by: Claus Hunsen <[email protected]>
To properly test the changes introduced in PR se-sic#149 (configuration parameter 'commits.filter.untracked.files'), commit information on untracked files is added to the test suite (in detail, two commits by two distinct authors). The test suite is adapted accordingly. This is a first step for issue se-sic#153. Signed-off-by: Claus Hunsen <[email protected]>
Augmenting the changes from PR se-sic#149, the empty artifact, which results from untracked files, is not a part of any network type anymore. This is change is based on commit d9f527c and fixes se-sic#153. This is a preliminary implementation with regard to issue se-sic#154. Signed-off-by: Claus Hunsen <[email protected]>
Prerequisites
dev
.Description
See changelog.
Changelog
artifact.filter.base
, which configured wether the base artifact should beincluded in the
get.commits.filtered
method, there is now the similiar parameterfilter.untracked.files
which doesthe same thing for untracked files (11428d9)
it can be configured if the edges should be created or not using the new NetworkConf parameter
base.artifact.edges
(c60c2f6)
empty artifact
tountracked files
https://github.com/SCPhantom/codeface-extraction-r/pull/new/jakob-updates