Change commit filtering and network building regarding the untracked files and base artifact #149

Merged on Jan 15, 2019 (31 commits). The diff below shows changes from 26 commits.

Commits
64a9486
Remove get.commits.raw function from util-data.R
Nov 29, 2018
894c9a5
Move artifact kind filtering functionality into the get.commits method
Dec 2, 2018
e74e15d
Adjust read.commits to return a valid data.frame instead of an empty one
Dec 4, 2018
11428d9
Restructure get.commits and get.commits.filtered(.empty) methods
Dec 6, 2018
c26e582
Delete set.commits.raw and read.commits.raw methods.
Dec 6, 2018
51617bb
Adjust two testcases to work with the new get.commits method
Dec 6, 2018
67a4fbe
Adapt test cases to new changes and improve empty dataframe creation
Dec 7, 2018
c60c2f6
Change edge generation behaviour for base and untracked files artifact
Dec 8, 2018
fada26d
Adjust copyright headers of modified files
Dec 10, 2018
43f185d
Update changelog
Dec 10, 2018
5ea65b9
Add global constant 'UNTRACKED.FILE' and adjust documentation
Dec 15, 2018
ec8c6dd
Update default behavior of 'Conf' objects
clhunsen Dec 14, 2018
0d7c222
Fix nodes for networks without edges
bockthom Dec 16, 2018
6580427
Improve edge creation concerning untracked files and the base artifact
Dec 16, 2018
dde0dd7
Leave artifact column empty if artifact == file or artifact == funtion
Dec 17, 2018
d11d0fb
Add 'UNTRACKED.FILE constant' back into the constant 'BASE.ARTIFACTS'
Dec 17, 2018
32a7162
Alter inline comments with wrong information
Dec 17, 2018
466d8eb
Change names of network and project configuration options
Dec 18, 2018
7e27a18
Further improve construction of edgeless networks
clhunsen Dec 17, 2018
dc8873e
Update changelog.
Dec 18, 2018
137d833
Fix setting authors in co-change-based author networks
clhunsen Dec 18, 2018
e709786
Update README
Dec 19, 2018
a5802b0
Update documentation and showcase.R
Dec 20, 2018
67dcf31
Rename variable 'list' to 'author.groups' and adjust documentation
Dec 20, 2018
5f0f529
Add additional utility functions for easier empty dataframe creation
Dec 20, 2018
6043e5c
Change null checking behaviour of two methods
Dec 20, 2018
418d1dc
Update README
Jan 7, 2019
523daef
Move empty dataframe creation utility functions into util-read.R
Jan 7, 2019
f8281c7
Adjust comments for the column names of commonly used dataframes
Jan 9, 2019
01217a8
Update changelog
Jan 9, 2019
ae58902
Adjust copyright headers
Jan 14, 2019
33 changes: 32 additions & 1 deletion NEWS.md
@@ -2,12 +2,43 @@

## Unversioned

### Added
- In addition to the ProjectConf parameter `commits.filter.base.artifact` (previously called `artifact.filter.base`),
which configures whether the base artifact is removed in the `get.commits.filtered` method, there is now a
similar parameter called `commits.filter.untracked.files` which does the same thing for untracked files
(11428d9847fd44f982cd094a3248bd13fb6b7b58)
- Edges are not constructed in the author network between authors that only modify untracked files. Whether edges
are created between authors for committing to the base artifact can be configured using the new NetworkConf parameter
`edges.for.base.artifacts` (c60c2f6e44b6f34cccb2714eccc7674158c83dde)
- The public `get.commits.filtered.uncached` method is added which allows for external filtering of the commits by
specifying if untracked files and/or the base artifact should be filtered (this method does not take advantage of
caching, whereas the `get.commits.filtered` method does) (11428d9847fd44f982cd094a3248bd13fb6b7b58)
- The helper function `create.empty.data.frame` is introduced which returns empty dataframes (0 rows) with the correct
columns and, if specified, the correct datatypes. In the future, functions that return data in dataframes should
always return dataframes of the same shape (regarding columns and datatypes), especially when they are empty, because
this makes later case distinctions easier or unnecessary (67a4fbe4f244b4b6047c2c2be7682d7f9085e9eb)
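
The idea behind `create.empty.data.frame` can be illustrated with a minimal sketch in base R. The function name `create.empty.data.frame.sketch` and its parameters `columns` and `data.types` are assumptions made for illustration; the actual signature in the library may differ.

```r
## Hypothetical sketch: build a 0-row dataframe with fixed column names and,
## optionally, fixed column types (given as mode names such as "character").
create.empty.data.frame.sketch = function(columns, data.types = NULL) {
    if (!is.null(data.types) && length(data.types) != length(columns)) {
        stop("If specified, 'data.types' must have the same length as 'columns'.")
    }

    ## create one zero-length vector per column; default to 'logical' if no type is given
    column.list = lapply(seq_along(columns), function(i) {
        type = if (is.null(data.types)) "logical" else data.types[[i]]
        return(vector(mode = type, length = 0))
    })
    names(column.list) = columns

    return(data.frame(column.list, stringsAsFactors = FALSE, check.names = FALSE))
}

## usage: an empty dataframe that still has the expected shape
empty.commits = create.empty.data.frame.sketch(
    columns = c("commit.id", "changed.files"),
    data.types = c("character", "integer")
)
```

Because the empty result keeps its column names and types, callers can select or combine columns without first checking whether any rows exist.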

### Changed/Improved
- Change shape of `Vertices` in the legend of plots to avoid confusion (f4fb4807cfd87d9d552a9ede92ea65ae4a386a04)
- The ProjectConf's configuration parameter `artifact.filter.base` is renamed to `commits.filter.base.artifact`
(466d8eb8e7f39e43985d825636af85ddfe54b13a)
- Commits that do not change any artifact are considered to be carried out on a metafile called `<untracked.file>`
(11428d9847fd44f982cd094a3248bd13fb6b7b58)
- Remove `get.commits.raw`, `set.commits.raw` and `read.commits.raw` functions (64a94863c9e70ac8c75e443bc15cd7facbf2111d,
c26e582e4ad6bf1eaeb08202fc3e00394332a013)
- Remove `get.commits.filtered.empty` and corresponding `filter.commits.empty` method, the functionality is moved to the
altered `get.commits.filtered` and `filter.commits` method respectively (11428d9847fd44f982cd094a3248bd13fb6b7b58)
- The `filter.commits` method now takes parameters which configure if untracked files and/or the base artifact should be
filtered out (11428d9847fd44f982cd094a3248bd13fb6b7b58)
- Filtering by artifact kind (e.g. filtering out either Feature or FeatureExpression) is now being done in the
`get.commits` method instead of the `get.commits.filtered` method (894c9a5c181fef14dcb71fa23699bebbcbcd2b4f)
- The `NetworkConf` and the `ProjectConf` now print an error message and stop whenever an attempt is made to set a
non-existing configuration parameter (ec8c6dd72746a0506b3e03dccc4fcaf7a03325ea)
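
As a rough illustration of the commit filtering described in this list, the following sketch removes untracked files and base artifacts from a commit dataframe. The function name `filter.commits.sketch` and the two constants are hypothetical; only the parameter names `remove.untracked.files` and `remove.base.artifact`, the metafile name `<untracked.file>`, and the base artifact names `Base_Feature` and `File_Level` are taken from this pull request.

```r
## Hypothetical constants, named for this sketch only
UNTRACKED.FILE.SKETCH = "<untracked.file>"
BASE.ARTIFACTS.SKETCH = c("Base_Feature", "File_Level")

## Hypothetical sketch of a filter with two independent switches
filter.commits.sketch = function(commits, remove.untracked.files = TRUE, remove.base.artifact = TRUE) {
    if (remove.untracked.files) {
        ## drop rows that stem from commits not touching any tracked file
        commits = commits[commits[["file"]] != UNTRACKED.FILE.SKETCH, , drop = FALSE]
    }
    if (remove.base.artifact) {
        ## drop rows that refer to the base artifact
        commits = commits[!(commits[["artifact"]] %in% BASE.ARTIFACTS.SKETCH), , drop = FALSE]
    }
    return(commits)
}

## usage with a small, made-up commit dataframe
commits = data.frame(
    hash = c("a1", "b2", "c3"),
    file = c("test.c", UNTRACKED.FILE.SKETCH, "test.c"),
    artifact = c("A", "", "Base_Feature"),
    stringsAsFactors = FALSE
)
only.tracked = filter.commits.sketch(commits, remove.untracked.files = TRUE, remove.base.artifact = FALSE)
fully.filtered = filter.commits.sketch(commits)
```

Keeping the two switches separate mirrors the two ProjectConf parameters, so either kind of entry can be retained on its own.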

### Fixed
- Fix error when resetting a `ProjectData` environment (c64cab84e928a2a4c89a6df12440ba7ca06e6263)

- Fix bug which led to wrong network construction in the case of networks with more than one node but no edges
(#150, 0d7c2226da67f3537f3ff9d013607fe19df8a4c0)

## 3.4

11 changes: 8 additions & 3 deletions README.md
@@ -482,9 +482,11 @@ There is no way to update the entries, except for the revision-based parameters.

**Note**: These parameters can be configured using the method `ProjectConf$update.values()`.

- `artifact.filter.base`
* Remove all artifact information regarding the base artifact
(`"Base_Feature"` or `"File_Level"` for features and functions, respectively, as artifacts)
- `commits.filter.base.artifact`
* Remove all information concerning the base artifact from the commit data. This effect becomes clear when retrieving commits using `get.commits.filtered`: the result does not contain any commit information about changes to the base artifact. Networks built on top of this `ProjectData` do not contain any base artifact information either.
* [*`TRUE`*, `FALSE`]
- `commits.filter.untracked.files`
* Remove all information concerning untracked files from the commit data. This effect becomes clear when retrieving commits using `get.commits.filtered`: the result does not contain any commits that solely changed untracked files. Networks built on top of this `ProjectData` do not contain any information about untracked files either.
* [*`TRUE`*, `FALSE`]
- `issues.only.comments`
* Only use comments from the issue data on disk and no further events such as references and label changes
@@ -552,6 +554,9 @@ Updates to the parameters can be done by calling `NetworkConf$update.variables(.
* **Note**: `"date"` and `"artifact.type"` are always included as this information is needed for several parts of the library, e.g., time-based splitting.
* **Note**: For each type of network that can be built, only the applicable part of the given vector of names is respected.
* **Note**: For the edge attributes `"pasta"` and `"synchronicity"`, the project configuration's parameters `pasta` and `synchronicity` need to be set to `TRUE`, respectively (see below).
- `edges.for.base.artifacts`
* Controls whether edges should be drawn between authors for being involved in committing to the base artifact
* [*`TRUE`*, `FALSE`]
- `simplify`
* Perform edge contraction to retrieve a simplified network
* [`TRUE`, *`FALSE`*]
7 changes: 4 additions & 3 deletions showcase.R
@@ -16,6 +16,7 @@
## Copyright 2017 by Christian Hechtl <[email protected]>
## Copyright 2017 by Felix Prasse <[email protected]>
## Copyright 2017-2018 by Thomas Bock <[email protected]>
## Copyright 2018 by Jakob Kronawitter <[email protected]>
## All Rights Reserved.


@@ -60,7 +61,7 @@ ARTIFACT.RELATION = "cochange" # cochange, callgraph, mail, issue

## initialize project configuration
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", TRUE)
proj.conf$update.value("commits.filter.base.artifact", TRUE)
# proj.conf$print()

## initialize network configuration
@@ -85,7 +86,7 @@ x = NetworkBuilder$new(project.data = x.data, network.conf = net.conf)
# x.data$get.synchronicity()
# x.data$group.artifacts.by.data.column("commits", "author.name")
# x.data$get.commits.filtered()
# x.data$get.commits.filtered.empty()
# x.data$get.commits.filtered.uncached(remove.untracked.files = TRUE, remove.base.artifact = FALSE)
# x.data$get.mails()
# x.data$get.authors()
# x.data$get.data.path()
@@ -126,7 +127,7 @@ y = NetworkBuilder$new(project.data = y.data, network.conf = net.conf)
# y.data$get.synchronicity()
# y.data$group.artifacts.by.data.column("commits", "author.name")
# y.data$get.commits.filtered()
# y.data$get.commits.filtered.empty()
# y.data$get.commits.filtered.uncached(remove.untracked.files = TRUE, remove.base.artifact = FALSE)
# y.data$get.mails()
# y.data$get.authors()
# y.data$get.data.path()
37 changes: 17 additions & 20 deletions tests/test-data-cut.R
@@ -16,6 +16,7 @@
## Copyright 2018 by Claus Hunsen <[email protected]>
## Copyright 2018 by Barbara Eckl <[email protected]>
## Copyright 2018 by Thomas Bock <[email protected]>
## Copyright 2018 by Jakob Kronawitter <[email protected]>
## All Rights Reserved.


@@ -44,26 +45,22 @@ test_that("Cut commit and mail data to same date range.", {

x.data = ProjectData$new(proj.conf)

commit.data.expected = data.frame(commit.id = sprintf("<commit-%s>", c(32712, 32712, 32713, 32713)),
date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 15:58:59", "2016-07-12 16:00:45",
"2016-07-12 16:00:45")),
author.name = c("Björn", "Björn", "Olaf", "Olaf"),
author.email = c("[email protected]", "[email protected]", "[email protected]",
"[email protected]"),
committer.date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 15:58:59", "2016-07-20 10:00:44",
"2016-07-20 10:00:44")),
committer.name = c("Björn", "Björn", "Björn", "Björn"),
committer.email = c("[email protected]", "[email protected]", "[email protected]", "[email protected]"),
hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0",
"5a5ec9675e98187e1e92561e1888aa6f04faa338", "5a5ec9675e98187e1e92561e1888aa6f04faa338"),
changed.files = as.integer(c(1, 1, 1, 1)),
added.lines = as.integer(c(1, 1, 1, 1)),
deleted.lines = as.integer(c(1, 1, 0, 0)),
diff.size = as.integer(c(2, 2, 1, 1)),
file = c("test.c", "test.c", "test.c", "test.c"),
artifact = c("A", "defined(A)", "A", "defined(A)"),
artifact.type = c("Feature", "FeatureExpression", "Feature", "FeatureExpression"),
artifact.diff.size = as.integer(c(1, 1, 1, 1)))
commit.data.expected = data.frame(commit.id = sprintf("<commit-%s>", c(32712, 32713)),
date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 16:00:45")),
author.name = c("Björn", "Olaf"),
author.email = c("[email protected]", "[email protected]"),
committer.date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-20 10:00:44")),
committer.name = c("Björn", "Björn"),
committer.email = c("[email protected]", "[email protected]"),
hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "5a5ec9675e98187e1e92561e1888aa6f04faa338"),
changed.files = as.integer(c(1, 1)),
added.lines = as.integer(c(1, 1)),
deleted.lines = as.integer(c(1, 0)),
diff.size = as.integer(c(2, 1)),
file = c("test.c", "test.c"),
artifact = c("A", "A"),
artifact.type = c("Feature", "Feature"),
artifact.diff.size = as.integer(c(1, 1)))

mail.data.expected = data.frame(author.name = c("Thomas"),
author.email = c("[email protected]"),
2 changes: 1 addition & 1 deletion tests/test-networks-artifact.R
@@ -36,7 +36,7 @@ test_that("Network construction of the undirected artifact-cochange network", {

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(artifact.relation = "cochange"))

16 changes: 8 additions & 8 deletions tests/test-networks-author.R
@@ -139,7 +139,7 @@ test_that("Amount of authors (author.all.authors, author.only.committers).", {

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()

## update network configuration
@@ -198,7 +198,7 @@ test_that("Network construction of the undirected author-cochange network", {

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "cochange"))

@@ -243,7 +243,7 @@ test_that("Network construction of the undirected but temorally ordered author-c

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "cochange", author.directed = FALSE,
author.respect.temporal.order = TRUE))
@@ -285,7 +285,7 @@ test_that("Network construction of the directed author-cochange network", {

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "cochange", author.directed = TRUE))

@@ -326,7 +326,7 @@ test_that("Network construction of the directed author-cochange network without

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "cochange", author.directed = TRUE,
author.respect.temporal.order = FALSE))
@@ -372,7 +372,7 @@ test_that("Network construction of the undirected simplified author-cochange net

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "cochange", simplify = TRUE))

@@ -420,7 +420,7 @@ test_that("Network construction of the undirected author-issue network with all

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
proj.conf$update.value("issues.only.comments", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "issue"))
@@ -511,7 +511,7 @@ test_that("Network construction of the undirected author-issue network with just

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "issue"))
