Merge pull request #149 from SCPhantom/jakob-updates
Change commit filtering and network building regarding the untracked files and base artifact

Reviewed-by: Claus Hunsen <[email protected]>
Reviewed-by: Thomas Bock <[email protected]>
clhunsen authored Jan 15, 2019
2 parents 1da649f + ae58902 commit 2488039
Showing 20 changed files with 587 additions and 344 deletions.
47 changes: 46 additions & 1 deletion NEWS.md
@@ -2,12 +2,57 @@

## Unversioned

### Added
- In addition to the ProjectConf parameter `commits.filter.base.artifact` (previously called `artifact.filter.base`),
which configured whether the base artifact should be included in the `get.commits.filtered` method, there is now a
similar parameter called `commits.filter.untracked.files` doing the same thing for untracked files
(11428d9847fd44f982cd094a3248bd13fb6b7b58, 466d8eb8e7f39e43985d825636af85ddfe54b13a)
- The public `get.commits.filtered.uncached` method is added which allows for external filtering of the commits by
specifying if untracked files and/or the base artifact should be filtered (this method does not take advantage of
caching, whereas the `get.commits.filtered` method does) (11428d9847fd44f982cd094a3248bd13fb6b7b58)
- Commits that do not change any artifact are considered to be carried out on a metafile called `<untracked.file>`.
The constant `UNTRACKED.FILE` was added to the file `util-data.R` and holds the string constant `<untracked.file>`.
(11428d9847fd44f982cd094a3248bd13fb6b7b58, 5ea65b9ac5a22967de87d7fd4ac66b0bc8e07238)
- In an author network, edges do not get constructed anymore between authors for solely modifying untracked files. For
authors involved in changing the base artifact, it can be configured whether edges should be created or not using the
new NetworkConf parameter `edges.for.base.artifacts`
(c60c2f6e44b6f34cccb2714eccc7674158c83dde, 466d8eb8e7f39e43985d825636af85ddfe54b13a)
- A new constant named `UNTRACKED.FILE.EMPTY.ARTIFACT` has been introduced in the file `util-data.R`, which simply holds an
empty string. If used in the intended context, this constant (and thus this empty string) denotes the empty artifact,
which is now called `<untracked.file>` (see the constant `UNTRACKED.FILE`). The empty string was chosen because this is the
way that untracked files were named in the file `commits.list` coming from the tool `codeface-extraction`
(dde0dd7c6b36b49aa2b6c91395be8ea6e0cd7969)
- The helper function `create.empty.data.frame` is introduced, which returns empty dataframes (0 rows) with the correct
columns and, if specified, the correct datatypes. In the future, functions that return data in dataframes should
always return dataframes of the same shape (regarding columns and datatypes), especially when they are empty, because
this makes later case distinctions easier or unnecessary (67a4fbe4f244b4b6047c2c2be7682d7f9085e9eb)
- For the most common types of dataframes (dataframes of commits, mails, issues and authors) four more utility methods
were added, namely `create.empty.authors.list`, `create.empty.commits.list`, `create.empty.issues.list`,
`create.empty.mails.list` as well as corresponding constants holding columns and associated datatypes for all these
empty dataframes (5f0f52936b4433f64fd9b1c9b2571eb26f66395f, 523daef8cf4642a2360396b11f0d74bce565b0f0)
- Add method `ProjectData$get.authors.by.data.source` to retrieve authors by given data-source name (#149, 65804276dd2ada9b2f00b2cab7b6ad0cecbe733e, 137d8337bc35f5a83aa16a48ef8e47fc0d36b36c)
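
The idea behind `create.empty.data.frame` can be illustrated with a minimal base-R sketch. The function name `create.empty.data.frame.sketch`, its parameters, and the example columns are assumptions made for illustration; this is not the library's implementation, and it only handles basic vector modes such as `"character"` or `"integer"`.

```r
## Hypothetical re-implementation for illustration only (not the library's code).
create.empty.data.frame.sketch = function(columns, data.types = NULL) {
    if (!is.null(data.types) && length(data.types) != length(columns)) {
        stop("If specified, 'data.types' must have the same length as 'columns'.")
    }

    ## build one zero-length vector per column, typed if data types are given
    column.vectors = lapply(seq_along(columns), function(i) {
        if (is.null(data.types)) {
            return(logical(0))
        }
        return(vector(mode = data.types[i], length = 0))
    })
    names(column.vectors) = columns

    ## a named list of zero-length vectors becomes a dataframe with 0 rows
    return(as.data.frame(column.vectors, stringsAsFactors = FALSE))
}

## example: an empty dataframe shaped like a (reduced) commit list
empty.commits = create.empty.data.frame.sketch(
    columns = c("hash", "changed.files", "artifact"),
    data.types = c("character", "integer", "character")
)
```

Because the empty result has the same columns and types as a populated one, callers can bind or subset it without a special case for "no data".
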

### Changed/Improved
- Rename `ProjectConf` parameter `artifact.filter.base` to `commits.filter.base.artifact` (PR #149, 466d8eb8e7f39e43985d825636af85ddfe54b13a)
- Change shape of `Vertices` in the legend of plots to avoid confusion (f4fb4807cfd87d9d552a9ede92ea65ae4a386a04)
- Remove `get.commits.raw`, `set.commits.raw` and `read.commits.raw` functions (64a94863c9e70ac8c75e443bc15cd7facbf2111d,
c26e582e4ad6bf1eaeb08202fc3e00394332a013)
- Filtering by artifact kind (e.g. filtering out either Feature or FeatureExpression) is now being done in the
`get.commits` method instead of the `get.commits.filtered` method (894c9a5c181fef14dcb71fa23699bebbcbcd2b4f)
- Remove `get.commits.filtered.empty` and the corresponding `filter.commits.empty` method; the functionality is now included
in the methods `get.commits.filtered` and `filter.commits`, respectively (11428d9847fd44f982cd094a3248bd13fb6b7b58)
- The constant `BASE.ARTIFACTS` in the file `util-data.R` was extended by adding untracked files (i.e., the new metafile
`UNTRACKED.FILE`), which is now considered to be a new base artifact in the case of file-level analyses. This implies
that in file-level analyses the base artifact and the untracked files coincide, while in feature-level and
function-level analyses they are treated differently (d11d0fb585397fdb3a2641484248f74752db9331)
- The `filter.commits` method now takes parameters which configure if untracked files and/or the base artifact should be
filtered out (11428d9847fd44f982cd094a3248bd13fb6b7b58)
- In the class `Conf` (and its sub-classes `NetworkConf` and `ProjectConf`), default parameters are not validated anymore to avoid confusion by logging output (ec8c6dd72746a0506b3e03dccc4fcaf7a03325ea)
- In the class `Conf` (and its sub-classes `NetworkConf` and `ProjectConf`), `stop` is called on errors during parameter updates now (ec8c6dd72746a0506b3e03dccc4fcaf7a03325ea)
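
The two filter switches described above can be sketched in base R as follows. This is a simplified stand-in under stated assumptions, not the library's `filter.commits`: the function name `filter.commits.sketch` is hypothetical, the parameter names are borrowed from the `get.commits.filtered.uncached` call in `showcase.R`, `"Base_Feature"` is the feature-level base artifact named in the README, and it is assumed that a commit to untracked files carries `<untracked.file>` in its `file` column.

```r
## Assumed placeholder values, mirroring the constants named in NEWS.md and README.md
UNTRACKED.FILE = "<untracked.file>"
BASE.ARTIFACT = "Base_Feature"

## Hypothetical stand-in for the filtering step (not the library's implementation).
filter.commits.sketch = function(commits, remove.untracked.files, remove.base.artifact) {
    if (remove.untracked.files) {
        ## drop rows that only refer to the untracked-file metafile
        commits = commits[commits[["file"]] != UNTRACKED.FILE, , drop = FALSE]
    }
    if (remove.base.artifact) {
        ## drop rows that refer to the base artifact
        commits = commits[commits[["artifact"]] != BASE.ARTIFACT, , drop = FALSE]
    }
    return(commits)
}

## made-up example data: one regular commit, one untracked-file commit, one base-artifact commit
commits = data.frame(
    hash = c("a1", "b2", "c3"),
    file = c("test.c", UNTRACKED.FILE, "test.c"),
    artifact = c("A", UNTRACKED.FILE, BASE.ARTIFACT),
    stringsAsFactors = FALSE
)

only.tracked = filter.commits.sketch(commits, remove.untracked.files = TRUE,
                                     remove.base.artifact = FALSE)
fully.filtered = filter.commits.sketch(commits, remove.untracked.files = TRUE,
                                       remove.base.artifact = TRUE)
```

Keeping the two switches independent is what lets an uncached variant serve arbitrary combinations, while a cached variant can stick to the combination set in the project configuration.
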

### Fixed
- Fix error when resetting a `ProjectData` environment (c64cab84e928a2a4c89a6df12440ba7ca06e6263)

- Fix vertices for networks without edges (#150, PR #149, 0d7c2226da67f3537f3ff9d013607fe19df8a4c0, 7e27a182de282f054f08e3a2fb04d852c2c55102)

## 3.4

11 changes: 8 additions & 3 deletions README.md
@@ -482,9 +482,11 @@ There is no way to update the entries, except for the revision-based parameters.

**Note**: These parameters can be configured using the method `ProjectConf$update.values()`.

- `artifact.filter.base`
* Remove all artifact information regarding the base artifact
(`"Base_Feature"` or `"File_Level"` for features and functions, respectively, as artifacts)
- `commits.filter.base.artifact`
* Remove all information concerning the base artifact from the commit data. The effect is visible when retrieving commits using `get.commits.filtered`: the result does not contain any commit information about changes to the base artifact. Networks built on top of this `ProjectData` also no longer contain any base artifact information.
* [*`TRUE`*, `FALSE`]
- `commits.filter.untracked.files`
* Remove all information concerning untracked files from the commit data. The effect is visible when retrieving commits using `get.commits.filtered`: the result does not contain any commits that solely changed untracked files. Networks built on top of this `ProjectData` also do not contain any information about untracked files.
* [*`TRUE`*, `FALSE`]
- `issues.only.comments`
* Only use comments from the issue data on disk and no further events such as references and label changes
@@ -552,6 +554,9 @@ Updates to the parameters can be done by calling `NetworkConf$update.variables(.
* **Note**: `"date"` and `"artifact.type"` are always included as this information is needed for several parts of the library, e.g., time-based splitting.
* **Note**: For each type of network that can be built, only the applicable part of the given vector of names is respected.
* **Note**: For the edge attributes `"pasta"` and `"synchronicity"`, the project configuration's parameters `pasta` and `synchronicity` need to be set to `TRUE`, respectively (see below).
- `edges.for.base.artifacts`
* Controls whether edges are created between authors for being involved in authoring commits to the base artifact. This parameter has no effect if the base artifact was filtered out beforehand (e.g., when `commits.filter.base.artifact == TRUE`, or when `commits.filter.untracked.files == TRUE` and `artifact == FILE`). All of these options can be configured in the `ProjectConf`. **Note**: `commits.filter.base.artifact` and `commits.filter.untracked.files` are `TRUE` by default.
* [*`TRUE`*, `FALSE`]
- `simplify`
* Perform edge contraction to retrieve a simplified network
* [`TRUE`, *`FALSE`*]
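
The interplay between `edges.for.base.artifacts` and the two `ProjectConf` filters described above can be written down as a small predicate. The helper name `base.artifact.survives.filtering` and the lower-case string `"file"` for the file-level artifact are assumptions for illustration; the helper is not part of the library and only encodes the two cases the README mentions.

```r
## Hypothetical helper (not part of the library): returns TRUE if the base artifact
## is still present after filtering, i.e., if `edges.for.base.artifacts` can have an effect.
base.artifact.survives.filtering = function(commits.filter.base.artifact,
                                            commits.filter.untracked.files,
                                            artifact) {
    ## case 1: the base artifact is filtered out directly
    filtered.as.base.artifact = commits.filter.base.artifact
    ## case 2: on file level, untracked files are the base artifact
    filtered.as.untracked.file = commits.filter.untracked.files && artifact == "file"

    return(!(filtered.as.base.artifact || filtered.as.untracked.file))
}

## with the default filters (both TRUE), the network parameter has no effect
default.case = base.artifact.survives.filtering(TRUE, TRUE, "feature")
## with the base-artifact filter disabled on feature level, it does
feature.case = base.artifact.survives.filtering(FALSE, TRUE, "feature")
```
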
7 changes: 4 additions & 3 deletions showcase.R
@@ -16,6 +16,7 @@
## Copyright 2017 by Christian Hechtl <[email protected]>
## Copyright 2017 by Felix Prasse <[email protected]>
## Copyright 2017-2018 by Thomas Bock <[email protected]>
## Copyright 2018 by Jakob Kronawitter <[email protected]>
## All Rights Reserved.


@@ -60,7 +61,7 @@ ARTIFACT.RELATION = "cochange" # cochange, callgraph, mail, issue

## initialize project configuration
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", TRUE)
proj.conf$update.value("commits.filter.base.artifact", TRUE)
# proj.conf$print()

## initialize network configuration
@@ -85,7 +86,7 @@ x = NetworkBuilder$new(project.data = x.data, network.conf = net.conf)
# x.data$get.synchronicity()
# x.data$group.artifacts.by.data.column("commits", "author.name")
# x.data$get.commits.filtered()
# x.data$get.commits.filtered.empty()
# x.data$get.commits.filtered.uncached(remove.untracked.files = TRUE, remove.base.artifact = FALSE)
# x.data$get.mails()
# x.data$get.authors()
# x.data$get.data.path()
@@ -126,7 +127,7 @@ y = NetworkBuilder$new(project.data = y.data, network.conf = net.conf)
# y.data$get.synchronicity()
# y.data$group.artifacts.by.data.column("commits", "author.name")
# y.data$get.commits.filtered()
# y.data$get.commits.filtered.empty()
# y.data$get.commits.filtered.uncached(remove.untracked.files = TRUE, remove.base.artifact = FALSE)
# y.data$get.mails()
# y.data$get.authors()
# y.data$get.data.path()
37 changes: 17 additions & 20 deletions tests/test-data-cut.R
@@ -16,6 +16,7 @@
## Copyright 2018 by Claus Hunsen <[email protected]>
## Copyright 2018 by Barbara Eckl <[email protected]>
## Copyright 2018 by Thomas Bock <[email protected]>
## Copyright 2018 by Jakob Kronawitter <[email protected]>
## All Rights Reserved.


@@ -44,26 +45,22 @@ test_that("Cut commit and mail data to same date range.", {

x.data = ProjectData$new(proj.conf)

commit.data.expected = data.frame(commit.id = sprintf("<commit-%s>", c(32712, 32712, 32713, 32713)),
date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 15:58:59", "2016-07-12 16:00:45",
"2016-07-12 16:00:45")),
author.name = c("Björn", "Björn", "Olaf", "Olaf"),
author.email = c("[email protected]", "[email protected]", "[email protected]",
"[email protected]"),
committer.date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 15:58:59", "2016-07-20 10:00:44",
"2016-07-20 10:00:44")),
committer.name = c("Björn", "Björn", "Björn", "Björn"),
committer.email = c("[email protected]", "[email protected]", "[email protected]", "[email protected]"),
hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0",
"5a5ec9675e98187e1e92561e1888aa6f04faa338", "5a5ec9675e98187e1e92561e1888aa6f04faa338"),
changed.files = as.integer(c(1, 1, 1, 1)),
added.lines = as.integer(c(1, 1, 1, 1)),
deleted.lines = as.integer(c(1, 1, 0, 0)),
diff.size = as.integer(c(2, 2, 1, 1)),
file = c("test.c", "test.c", "test.c", "test.c"),
artifact = c("A", "defined(A)", "A", "defined(A)"),
artifact.type = c("Feature", "FeatureExpression", "Feature", "FeatureExpression"),
artifact.diff.size = as.integer(c(1, 1, 1, 1)))
commit.data.expected = data.frame(commit.id = sprintf("<commit-%s>", c(32712, 32713)),
date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 16:00:45")),
author.name = c("Björn", "Olaf"),
author.email = c("[email protected]", "[email protected]"),
committer.date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-20 10:00:44")),
committer.name = c("Björn", "Björn"),
committer.email = c("[email protected]", "[email protected]"),
hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "5a5ec9675e98187e1e92561e1888aa6f04faa338"),
changed.files = as.integer(c(1, 1)),
added.lines = as.integer(c(1, 1)),
deleted.lines = as.integer(c(1, 0)),
diff.size = as.integer(c(2, 1)),
file = c("test.c", "test.c"),
artifact = c("A", "A"),
artifact.type = c("Feature", "Feature"),
artifact.diff.size = as.integer(c(1, 1)))

mail.data.expected = data.frame(author.name = c("Thomas"),
author.email = c("[email protected]"),
3 changes: 2 additions & 1 deletion tests/test-networks-artifact.R
@@ -14,6 +14,7 @@
## Copyright 2017-2018 by Christian Hechtl <[email protected]>
## Copyright 2017 by Claus Hunsen <[email protected]>
## Copyright 2018 by Barbara Eckl <[email protected]>
## Copyright 2018 by Jakob Kronawitter <[email protected]>
## All Rights Reserved.


@@ -36,7 +37,7 @@ test_that("Network construction of the undirected artifact-cochange network", {

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(artifact.relation = "cochange"))

17 changes: 9 additions & 8 deletions tests/test-networks-author.R
@@ -16,6 +16,7 @@
## Copyright 2017 by Felix Prasse <[email protected]>
## Copyright 2018 by Barbara Eckl <[email protected]>
## Copyright 2018 by Thomas Bock <[email protected]>
## Copyright 2018 by Jakob Kronawitter <[email protected]>
## All Rights Reserved.


@@ -139,7 +140,7 @@ test_that("Amount of authors (author.all.authors, author.only.committers).", {

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()

## update network configuration
@@ -198,7 +199,7 @@ test_that("Network construction of the undirected author-cochange network", {

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "cochange"))

@@ -243,7 +244,7 @@ test_that("Network construction of the undirected but temorally ordered author-c

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "cochange", author.directed = FALSE,
author.respect.temporal.order = TRUE))
@@ -285,7 +286,7 @@ test_that("Network construction of the directed author-cochange network", {

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "cochange", author.directed = TRUE))

@@ -326,7 +327,7 @@ test_that("Network construction of the directed author-cochange network without

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "cochange", author.directed = TRUE,
author.respect.temporal.order = FALSE))
@@ -372,7 +373,7 @@ test_that("Network construction of the undirected simplified author-cochange net

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "cochange", simplify = TRUE))

@@ -420,7 +421,7 @@ test_that("Network construction of the undirected author-issue network with all

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
proj.conf$update.value("issues.only.comments", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "issue"))
@@ -511,7 +512,7 @@ test_that("Network construction of the undirected author-issue network with just

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("artifact.filter.base", FALSE)
proj.conf$update.value("commits.filter.base.artifact", FALSE)
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = "issue"))
