diff --git a/NEWS.md b/NEWS.md
index 87dbfece..afbba39d 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -2,12 +2,57 @@
 ## Unversioned
+### Added
+- In addition to the ProjectConf parameter `commits.filter.base.artifact` (previously called `artifact.filter.base`),
+which configured whether the base artifact should be included in the `get.commits.filtered` method, there is now a
+similar parameter called `commits.filter.untracked.files` doing the same thing for untracked files
+(11428d9847fd44f982cd094a3248bd13fb6b7b58, 466d8eb8e7f39e43985d825636af85ddfe54b13a)
+- The public `get.commits.filtered.uncached` method is added which allows for external filtering of the commits by
+specifying if untracked files and/or the base artifact should be filtered (this method does not take advantage of
+caching, whereas the `get.commits.filtered` method does) (11428d9847fd44f982cd094a3248bd13fb6b7b58)
+- Commits that do not change any artifact are considered to be carried out on a metafile called ``.
+The constant `UNTRACKED.FILE` was added to the file `util-data.R` and holds the string constant ``.
+(11428d9847fd44f982cd094a3248bd13fb6b7b58, 5ea65b9ac5a22967de87d7fd4ac66b0bc8e07238)
+- In an author network, edges do not get constructed anymore between authors for solely modifying untracked files. For
+authors involved in changing the base artifact, it can be configured whether edges should be created or not using the
+new NetworkConf parameter `edges.for.base.artifacts`
+(c60c2f6e44b6f34cccb2714eccc7674158c83dde, 466d8eb8e7f39e43985d825636af85ddfe54b13a)
+- A new constant named `UNTRACKED.FILE.EMPTY.ARTIFACT` has been introduced in the `util-data.R` which simply holds an
+empty string. If used in the intended context, this constant (and thus this empty string) denominates the empty artifact,
+which is now called `` (see the constant `UNTRACKED.FILE`). The empty string was chosen, as this is the
+way that untracked files were named in the file `commits.list` coming from the tool `codeface-extraction`
+(dde0dd7c6b36b49aa2b6c91395be8ea6e0cd7969)
+- The helper function `create.empty.data.frame` is introduced which returns empty dataframes (0 rows) with correct
+columns and, if specified, all the correct datatypes. In the future, functions, that return data in dataframes, should
+always return dataframes of the same shape (regarding columns and datatypes) - especially when they are empty - because
+this makes later case distinctions easier or unnecessary (67a4fbe4f244b4b6047c2c2be7682d7f9085e9eb)
+- For the most common types of dataframes (dataframes of commits, mails, issues and authors) four more utility methods
+were added, namely `create.empty.authors.list`, `create.empty.commits.list`, `create.empty.issues.list`,
+`create.empty.mails.list` as well as corresponding constants holding columns and associated datatypes for all these
+empty dataframes (5f0f52936b4433f64fd9b1c9b2571eb26f66395f, 523daef8cf4642a2360396b11f0d74bce565b0f0)
+- Add method `ProjectData$get.authors.by.data.source` to retrieve authors by given data-source name (#149, 65804276dd2ada9b2f00b2cab7b6ad0cecbe733e, 137d8337bc35f5a83aa16a48ef8e47fc0d36b36c)
+
 ### Changed/Improved
+- Rename `ProjectConf` parameter `artifact.filter.base` to `commits.filter.base.artifact` (PR #149, 466d8eb8e7f39e43985d825636af85ddfe54b13a)
 - Change shape of `Vertices` in the legend of plots to avoid confusion (f4fb4807cfd87d9d552a9ede92ea65ae4a386a04)
+- Remove `get.commits.raw`, `set.commits.raw` and `read.commits.raw` functions (64a94863c9e70ac8c75e443bc15cd7facbf2111d,
+c26e582e4ad6bf1eaeb08202fc3e00394332a013)
+- Filtering by artifact kind (e.g. filtering out either Feature or FeatureExpression) is now being done in the
+`get.commits` method instead of the `get.commits.filtered` method (894c9a5c181fef14dcb71fa23699bebbcbcd2b4f)
+- Remove `get.commits.filtered.empty` and corresponding `filter.commits.empty` method, the functionality is now included
+into the methods `get.commits.filtered` and `filter.commits` respectively (11428d9847fd44f982cd094a3248bd13fb6b7b58)
+- The constant `BASE.ARTIFACTS` in the file `util-data.R` was extended by adding untracked files (i.e. the new metafile
+`UNTRACKED.FILE`), which is now considered to be a new base artifact in the case of file level analyses. This implies,
+that in case of file level analyses the base artifact and the untracked files fall together, while in feature and
+function level analyses they are treated differently (d11d0fb585397fdb3a2641484248f74752db9331)
+- The `filter.commits` method now takes parameters which configure if untracked files and/or the base artifact should be
+filtered out (11428d9847fd44f982cd094a3248bd13fb6b7b58)
+- In the class `Conf` (and its sub-classes `NetworkConf` and `ProjectConf`), default parameters are not validated anymore to avoid confusion by logging output (ec8c6dd72746a0506b3e03dccc4fcaf7a03325ea)
+- In the class `Conf` (and its sub-classes `NetworkConf` and `ProjectConf`), `stop` is called on errors during parameter updates now (ec8c6dd72746a0506b3e03dccc4fcaf7a03325ea)
 ### Fixed
 - Fix error when resetting an `ProjectData` environment (c64cab84e928a2a4c89a6df12440ba7ca06e6263)
-
+- Fix vertices for networks without edges (#150, PR #149, 0d7c2226da67f3537f3ff9d013607fe19df8a4c0, 7e27a182de282f054f08e3a2fb04d852c2c55102)
 ## 3.4
diff --git a/README.md b/README.md
index b90a663c..be9e7f2f 100644
--- a/README.md
+++ b/README.md
@@ -482,9 +482,11 @@ There is no way to update the entries, except for the revision-based parameters.
 **Note**: These parameters can be configured using the method `ProjectConf$update.values()`.
-- `artifact.filter.base`
- * Remove all artifact information regarding the base artifact
- (`"Base_Feature"` or `"File_Level"` for features and functions, respectively, as artifacts)
+- `commits.filter.base.artifact`
+ * Remove all information concerning the base artifact from the commit data. This effect becomes clear when retrieving commits using `get.commits.filtered`, because the result then does not contain any commit information about changes to the base artifact. Networks built on top of this `ProjectData` also do not contain any base artifact information anymore.
+ * [*`TRUE`*, `FALSE`]
+- `commits.filter.untracked.files`
+ * Remove all information concerning untracked files from the commit data. This effect becomes clear when retrieving commits using `get.commits.filtered`, because the result then does not contain any commits that solely changed untracked files. Networks built on top of this `ProjectData` also do not contain any information about untracked files.
  * [*`TRUE`*, `FALSE`]
 - `issues.only.comments`
  * Only use comments from the issue data on disk and no further events such as references and label changes
@@ -552,6 +554,9 @@ Updates to the parameters can be done by calling `NetworkConf$update.variables(.
  * **Note**: `"date"` and `"artifact.type"` are always included as this information is needed for several parts of the library, e.g., time-based splitting.
  * **Note**: For each type of network that can be built, only the applicable part of the given vector of names is respected.
  * **Note**: For the edge attributes `"pasta"` and `"synchronicity"`, the project configuration's parameters `pasta` and `synchronicity` need to be set to `TRUE`, respectively (see below).
+- `edges.for.base.artifacts`
+ * Controls whether edges should be drawn between authors for being involved in authoring commits to the base artifact. This parameter does not have any effect if the base artifact was filtered beforehand (e.g., when `commits.filter.base.artifact == TRUE`, or, when `commits.filter.untracked.files == TRUE` and `artifact == FILE`; all of these options can be configured in the `ProjectConf`; warning: `commits.filter.base.artifact` and `commits.filter.untracked.files` are `TRUE` by default).
+ * [*`TRUE`*, `FALSE`]
 - `simplify`
  * Perform edge contraction to retrieve a simplified network
  * [`TRUE`, *`FALSE`*]
diff --git a/showcase.R b/showcase.R
index 2222913a..455839c8 100644
--- a/showcase.R
+++ b/showcase.R
@@ -16,6 +16,7 @@
 ## Copyright 2017 by Christian Hechtl
 ## Copyright 2017 by Felix Prasse
 ## Copyright 2017-2018 by Thomas Bock
+## Copyright 2018 by Jakob Kronawitter
 ## All Rights Reserved.
@@ -60,7 +61,7 @@ ARTIFACT.RELATION = "cochange" # cochange, callgraph, mail, issue
 ## initialize project configuration
 proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
-proj.conf$update.value("artifact.filter.base", TRUE)
+proj.conf$update.value("commits.filter.base.artifact", TRUE)
 # proj.conf$print()
 ## initialize network configuration
@@ -85,7 +86,7 @@ x = NetworkBuilder$new(project.data = x.data, network.conf = net.conf)
 # x.data$get.synchronicity()
 # x.data$group.artifacts.by.data.column("commits", "author.name")
 # x.data$get.commits.filtered()
-# x.data$get.commits.filtered.empty()
+# x.data$get.commits.filtered.uncached(remove.untracked.files = TRUE, remove.base.artifact = FALSE)
 # x.data$get.mails()
 # x.data$get.authors()
 # x.data$get.data.path()
@@ -126,7 +127,7 @@ y = NetworkBuilder$new(project.data = y.data, network.conf = net.conf)
 # y.data$get.synchronicity()
 # y.data$group.artifacts.by.data.column("commits", "author.name")
 # y.data$get.commits.filtered()
-# y.data$get.commits.filtered.empty()
+# y.data$get.commits.filtered.uncached(remove.untracked.files = TRUE, remove.base.artifact = FALSE)
 # y.data$get.mails()
 # y.data$get.authors()
 # y.data$get.data.path()
diff --git a/tests/test-data-cut.R b/tests/test-data-cut.R
index 36939dad..e235c616 100644
--- a/tests/test-data-cut.R
+++ b/tests/test-data-cut.R
@@ -16,6 +16,7 @@
 ## Copyright 2018 by Claus Hunsen
 ## Copyright 2018 by Barbara Eckl
 ## Copyright 2018 by Thomas Bock
+## Copyright 2018 by Jakob Kronawitter
 ## All Rights Reserved.
@@ -44,26 +45,22 @@ test_that("Cut commit and mail data to same date range.", {
  x.data = ProjectData$new(proj.conf)
- commit.data.expected = data.frame(commit.id = sprintf("", c(32712, 32712, 32713, 32713)),
- date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 15:58:59", "2016-07-12 16:00:45",
- "2016-07-12 16:00:45")),
- author.name = c("Björn", "Björn", "Olaf", "Olaf"),
- author.email = c("bjoern@example.org", "bjoern@example.org", "olaf@example.org",
- "olaf@example.org"),
- committer.date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 15:58:59", "2016-07-20 10:00:44",
- "2016-07-20 10:00:44")),
- committer.name = c("Björn", "Björn", "Björn", "Björn"),
- committer.email = c("bjoern@example.org", "bjoern@example.org", "bjoern@example.org", "bjoern@example.org"),
- hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0",
- "5a5ec9675e98187e1e92561e1888aa6f04faa338", "5a5ec9675e98187e1e92561e1888aa6f04faa338"),
- changed.files = as.integer(c(1, 1, 1, 1)),
- added.lines = as.integer(c(1, 1, 1, 1)),
- deleted.lines = as.integer(c(1, 1, 0, 0)),
- diff.size = as.integer(c(2, 2, 1, 1)),
- file = c("test.c", "test.c", "test.c", "test.c"),
- artifact = c("A", "defined(A)", "A", "defined(A)"),
- artifact.type = c("Feature", "FeatureExpression", "Feature", "FeatureExpression"),
- artifact.diff.size = as.integer(c(1, 1, 1, 1)))
+ commit.data.expected = data.frame(commit.id = sprintf("", c(32712, 32713)),
+ date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 16:00:45")),
+ author.name = c("Björn", "Olaf"),
+ author.email = c("bjoern@example.org", "olaf@example.org"),
+ committer.date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-20 10:00:44")),
+ committer.name = c("Björn", "Björn"),
+ committer.email = c("bjoern@example.org", "bjoern@example.org"),
+ hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "5a5ec9675e98187e1e92561e1888aa6f04faa338"),
+ changed.files = as.integer(c(1, 1)),
+ added.lines = as.integer(c(1, 1)),
+ deleted.lines = as.integer(c(1, 0)),
+ diff.size = as.integer(c(2, 1)),
+ file = c("test.c", "test.c"),
+ artifact = c("A", "A"),
+ artifact.type = c("Feature", "Feature"),
+ artifact.diff.size = as.integer(c(1, 1)))
  mail.data.expected = data.frame(author.name = c("Thomas"),
  author.email = c("thomas@example.org"),
diff --git a/tests/test-networks-artifact.R b/tests/test-networks-artifact.R
index d2f1fa5a..bdf926f7 100644
--- a/tests/test-networks-artifact.R
+++ b/tests/test-networks-artifact.R
@@ -14,6 +14,7 @@
 ## Copyright 2017-2018 by Christian Hechtl
 ## Copyright 2017 by Claus Hunsen
 ## Copyright 2018 by Barbara Eckl
+## Copyright 2018 by Jakob Kronawitter
 ## All Rights Reserved.
@@ -36,7 +37,7 @@ test_that("Network construction of the undirected artifact-cochange network", {
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(artifact.relation = "cochange"))
diff --git a/tests/test-networks-author.R b/tests/test-networks-author.R
index 488fe146..fa5ad3fb 100644
--- a/tests/test-networks-author.R
+++ b/tests/test-networks-author.R
@@ -16,6 +16,7 @@
 ## Copyright 2017 by Felix Prasse
 ## Copyright 2018 by Barbara Eckl
 ## Copyright 2018 by Thomas Bock
+## Copyright 2018 by Jakob Kronawitter
 ## All Rights Reserved.
@@ -139,7 +140,7 @@ test_that("Amount of authors (author.all.authors, author.only.committers).", {
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  ## update network configuration
@@ -198,7 +199,7 @@ test_that("Network construction of the undirected author-cochange network", {
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "cochange"))
@@ -243,7 +244,7 @@ test_that("Network construction of the undirected but temorally ordered author-c
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "cochange", author.directed = FALSE, author.respect.temporal.order = TRUE))
@@ -285,7 +286,7 @@ test_that("Network construction of the directed author-cochange network", {
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "cochange", author.directed = TRUE))
@@ -326,7 +327,7 @@ test_that("Network construction of the directed author-cochange network without
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "cochange", author.directed = TRUE, author.respect.temporal.order = FALSE))
@@ -372,7 +373,7 @@ test_that("Network construction of the undirected simplified author-cochange net
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "cochange", simplify = TRUE))
@@ -420,7 +421,7 @@ test_that("Network construction of the undirected author-issue network with all
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  proj.conf$update.value("issues.only.comments", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "issue"))
@@ -511,7 +512,7 @@ test_that("Network construction of the undirected author-issue network with just
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "issue"))
diff --git a/tests/test-networks-bipartite.R b/tests/test-networks-bipartite.R
index 27305437..1c99f0d7 100644
--- a/tests/test-networks-bipartite.R
+++ b/tests/test-networks-bipartite.R
@@ -15,6 +15,7 @@
 ## Copyright 2017-2018 by Claus Hunsen
 ## Copyright 2018 by Barbara Eckl
 ## Copyright 2018 by Thomas Bock
+## Copyright 2018 by Jakob Kronawitter
 ## All Rights Reserved.
@@ -37,7 +38,7 @@ test_that("Construction of the bipartite network for the feature artifact with a
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "cochange", artifact.relation = "cochange"))
@@ -90,7 +91,7 @@ test_that("Construction of the bipartite network for the file artifact with auth
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file")
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "cochange", artifact.relation = "cochange"))
@@ -143,7 +144,7 @@ test_that("Construction of the bipartite network for the function artifact with
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "function")
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "cochange", artifact.relation = "cochange"))
@@ -194,7 +195,7 @@ test_that("Construction of the bipartite network for the featureexpression artif
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "featureexpression")
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "cochange", artifact.relation = "cochange"))
@@ -245,7 +246,7 @@ test_that("Construction of the bipartite network for the feature artifact with a
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "cochange", artifact.relation = "issue"))
@@ -303,7 +304,7 @@ test_that("Construction of the directed bipartite network for the feature artifa
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "cochange", artifact.relation = "cochange", author.directed = TRUE))
@@ -356,7 +357,7 @@ test_that("Construction of the directed bipartite network for the file artifact
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file")
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "cochange", artifact.relation = "cochange", author.directed = TRUE))
@@ -410,7 +411,7 @@ test_that("Construction of the directed bipartite network for the function artif
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "function")
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "cochange", artifact.relation = "cochange", author.directed = TRUE))
@@ -463,7 +464,7 @@ test_that("Construction of the directed bipartite network for the featureexpress
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "featureexpression")
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "cochange", artifact.relation = "cochange", author.directed = TRUE))
diff --git a/tests/test-networks-covariates.R b/tests/test-networks-covariates.R
index 0f208019..b6c504db 100644
--- a/tests/test-networks-covariates.R
+++ b/tests/test-networks-covariates.R
@@ -16,6 +16,7 @@
 ## Copyright 2017-2018 by Claus Hunsen
 ## Copyright 2018 by Thomas Bock
 ## Copyright 2018 by Klara Schlüter
+## Copyright 2018 by Jakob Kronawitter
 ## All Rights Reserved.
@@ -52,7 +53,7 @@ get.network.covariates.test.networks = function(network.type = c("author", "arti
  ## configuration and data objects
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  proj.conf$update.value("issues.only.comments", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(list(author.relation = "cochange", simplify = FALSE))
diff --git a/tests/test-networks-cut.R b/tests/test-networks-cut.R
index ac1eab9e..c09e391b 100644
--- a/tests/test-networks-cut.R
+++ b/tests/test-networks-cut.R
@@ -14,6 +14,7 @@
 ## Copyright 2017 by Christian Hechtl
 ## Copyright 2018 by Claus Hunsen
 ## Copyright 2018 by Thomas Bock
+## Copyright 2018 by Jakob Kronawitter
 ## All Rights Reserved.
@@ -44,26 +45,22 @@ test_that("Cut commit and mail data to same date range.", {
  x.data = ProjectData$new(proj.conf)
  x = NetworkBuilder$new(x.data, net.conf)
- commit.data.expected = data.frame(commit.id = sprintf("", c(32712, 32712, 32713, 32713)),
- date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 15:58:59", "2016-07-12 16:00:45",
- "2016-07-12 16:00:45")),
- author.name = c("Björn", "Björn", "Olaf", "Olaf"),
- author.email = c("bjoern@example.org", "bjoern@example.org", "olaf@example.org",
- "olaf@example.org"),
- committer.date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 15:58:59", "2016-07-20 10:00:44",
- "2016-07-20 10:00:44")),
- committer.name = c("Björn", "Björn", "Björn", "Björn"),
- committer.email = c("bjoern@example.org", "bjoern@example.org", "bjoern@example.org", "bjoern@example.org"),
- hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0",
- "5a5ec9675e98187e1e92561e1888aa6f04faa338", "5a5ec9675e98187e1e92561e1888aa6f04faa338"),
- changed.files = as.integer(c(1, 1, 1, 1)),
- added.lines = as.integer(c(1, 1, 1, 1)),
- deleted.lines = as.integer(c(1, 1, 0, 0)),
- diff.size = as.integer(c(2, 2, 1, 1)),
- file = c("test.c", "test.c", "test.c", "test.c"),
- artifact = c("A", "defined(A)", "A", "defined(A)"),
- artifact.type = c("Feature", "FeatureExpression", "Feature", "FeatureExpression"),
- artifact.diff.size = as.integer(c(1, 1, 1, 1)))
+ commit.data.expected = data.frame(commit.id = sprintf("", c(32712, 32713)),
+ date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 16:00:45")),
+ author.name = c("Björn", "Olaf"),
+ author.email = c("bjoern@example.org", "olaf@example.org"),
+ committer.date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-20 10:00:44")),
+ committer.name = c("Björn", "Björn"),
+ committer.email = c("bjoern@example.org", "bjoern@example.org"),
+ hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "5a5ec9675e98187e1e92561e1888aa6f04faa338"),
+ changed.files = as.integer(c(1, 1)),
+ added.lines = as.integer(c(1, 1)),
+ deleted.lines = as.integer(c(1, 0)),
+ diff.size = as.integer(c(2, 1)),
+ file = c("test.c", "test.c"),
+ artifact = c("A", "A"),
+ artifact.type = c("Feature", "Feature"),
+ artifact.diff.size = as.integer(c(1, 1)))
  mail.data.expected = data.frame(author.name = c("Thomas"),
  author.email = c("thomas@example.org"),
diff --git a/tests/test-networks-multi-relation.R b/tests/test-networks-multi-relation.R
index 47b3a285..73fd468a 100644
--- a/tests/test-networks-multi-relation.R
+++ b/tests/test-networks-multi-relation.R
@@ -35,7 +35,7 @@ test_that("Network construction of the undirected author network with relation =
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = c("cochange", "mail")))
@@ -97,7 +97,7 @@ test_that("Construction of the bipartite network for the feature artifact with a
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = c("cochange", "issue"), artifact.relation = c("issue", "mail")))
@@ -192,7 +192,7 @@ test_that("Construction of the multi network for the feature artifact with autho
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = c("cochange", "mail"), artifact.relation = c("cochange", "issue")))
diff --git a/tests/test-networks-multi.R b/tests/test-networks-multi.R
index fcdcd9df..8d3db249 100644
--- a/tests/test-networks-multi.R
+++ b/tests/test-networks-multi.R
@@ -36,7 +36,7 @@ test_that("Construction of the multi network for the feature artifact with autho
  ## configurations
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(updated.values = list(author.relation = "cochange", artifact.relation = "cochange"))
diff --git a/tests/test-networks.R b/tests/test-networks.R
index 5a9a8489..56942880 100644
--- a/tests/test-networks.R
+++ b/tests/test-networks.R
@@ -93,3 +93,57 @@ test_that("Merge networks", {
 })
+
+## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
+## Construction of edgeless networks ---------------------------------------
+
+test_that("Construction of edgeless networks", {
+
+ ## create data structures and network configuration as a basis
+ edge.list = data.frame(from = c("D1", "D2"), to = c("D2", "D1"))
+ edge.list.as.sequence = as.vector(as.matrix(edge.list))
+ vertices = data.frame(name = c("D1", "D2"))
+ vertices.as.sequence = vertices[["name"]]
+ directed = FALSE # directedness does not matter for this test, but should be consistent
+ net.conf = NetworkConf$new()
+
+ ## construct edgeless network
+ net.edgeless = igraph::graph.empty(n = 0, directed = directed) +
+ igraph::vertices(vertices.as.sequence) +
+ igraph::edges(NULL, weight = 1)
+
+ ##
+ ## normal network
+ ##
+
+ net.constructed = construct.network.from.edge.list(vertices, edge.list, net.conf)
+ net.expected = igraph::graph.empty(n = 0, directed = directed) +
+ igraph::vertices(vertices.as.sequence) +
+ igraph::edges(edge.list.as.sequence, weight = 1)
+
+ ## check equality
+ expect_true(igraph::identical_graphs(net.constructed, net.expected), label = "normal network construction")
+
+ ##
+ ## edgeless network: NULL
+ ##
+
+ net.constructed = construct.network.from.edge.list(vertices, NULL, net.conf)
+ expect_true(igraph::identical_graphs(net.constructed, net.edgeless), label = "edgeless network: NULL")
+
+ ##
+ ## edgeless network: create.empty.edge.list()
+ ##
+
+ net.constructed = construct.network.from.edge.list(vertices, create.empty.edge.list(), net.conf)
+ expect_true(igraph::identical_graphs(net.constructed, net.edgeless), label = "edgeless network: create.empty.edge.list()")
+
+ ##
+ ## edgeless network: empty data.frame
+ ##
+
+ net.constructed = construct.network.from.edge.list(vertices, data.frame(), net.conf)
+ expect_true(igraph::identical_graphs(net.constructed, net.edgeless), label = "edgeless network: empty data.frame")
+
+})
+
diff --git a/tests/test-read.R b/tests/test-read.R
index ee1a8d21..88a91bc9 100644
--- a/tests/test-read.R
+++ b/tests/test-read.R
@@ -15,6 +15,7 @@
 ## Copyright 2017 by Felix Prasse
 ## Copyright 2018 by Claus Hunsen
 ## Copyright 2018 by Thomas Bock
+## Copyright 2018 by Jakob Kronawitter
 ## All Rights Reserved.
@@ -88,7 +89,7 @@ test_that("Read the raw commit data with the file artifact.", {
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file")
  ## read the actual data
- commit.data.read = read.commits.raw(proj.conf$get.value("datapath"), proj.conf$get.value("artifact"))
+ commit.data.read = read.commits(proj.conf$get.value("datapath"), proj.conf$get.value("artifact"))
  ## build the expected data.frame
  commit.data.expected = data.frame(commit.id = sprintf("", c(32716, 32717, 32718, 32719, 32715)),
diff --git a/tests/test-split.R b/tests/test-split.R
index 7f624c98..94baa3cd 100644
--- a/tests/test-split.R
+++ b/tests/test-split.R
@@ -15,6 +15,7 @@
 ## Copyright 2017 by Felix Prasse
 ## Copyright 2018 by Thomas Bock
 ## Copyright 2018 by Christian Hechtl
+## Copyright 2018 by Jakob Kronawitter
 ## All Rights Reserved.
@@ -93,9 +94,9 @@ test_that("Split a data object time-based (split.basis == 'commits').", {
  ## check data for all ranges
  expected.data = list(
  commits = list(
- "2016-07-12 15:58:59-2016-07-12 16:01:59" = data$commits[1:4, ],
- "2016-07-12 16:01:59-2016-07-12 16:04:59" = data.frame(),
- "2016-07-12 16:04:59-2016-07-12 16:06:33" = data$commits[5:9, ]
+ "2016-07-12 15:58:59-2016-07-12 16:01:59" = data$commits[1:2, ],
+ "2016-07-12 16:01:59-2016-07-12 16:04:59" = data$commits[0, ],
+ "2016-07-12 16:04:59-2016-07-12 16:06:33" = data$commits[3:6, ]
  ),
  mails = list(
  "2016-07-12 15:58:59-2016-07-12 16:01:59" = data.frame(),
@@ -168,10 +169,10 @@ test_that("Split a data object time-based (split.basis == 'mails').", {
  ## check data for all ranges
  expected.data = list(
  commits = list(
- "2004-10-09 18:38:13-2007-10-10 12:38:13" = data.frame(),
- "2007-10-10 12:38:13-2010-10-10 06:38:13" = data.frame(),
- "2010-10-10 06:38:13-2013-10-10 00:38:13" = data.frame(),
- "2013-10-10 00:38:13-2016-07-12 16:05:38" = data$commits[1:4, ]
+ "2004-10-09 18:38:13-2007-10-10 12:38:13" = data$commits[0, ],
+ "2007-10-10 12:38:13-2010-10-10 06:38:13" = data$commits[0, ],
+ "2010-10-10 06:38:13-2013-10-10 00:38:13" = data$commits[0, ],
+ "2013-10-10 00:38:13-2016-07-12 16:05:38" = data$commits[1:2, ]
  ),
  mails = list(
  "2004-10-09 18:38:13-2007-10-10 12:38:13" = data$mails[rownames(data$mails) %in% 1:2, ],
@@ -247,9 +248,9 @@ test_that("Split a data object time-based (split.basis == 'issues').", {
  ## check data for all ranges
  expected.data = list(
  commits = list(
- "2013-04-21 23:52:09-2015-04-22 11:52:09" = data.frame(),
+ "2013-04-21 23:52:09-2015-04-22 11:52:09" = data$commits[0, ],
  "2015-04-22 11:52:09-2017-04-21 23:52:09" = data$commits,
- "2017-04-21 23:52:09-2017-05-23 12:32:40" = data.frame()
+ "2017-04-21 23:52:09-2017-05-23 12:32:40" = data$commits[0, ]
  ),
  mails = list(
  "2013-04-21 23:52:09-2015-04-22 11:52:09" = data.frame(),
@@ -355,7 +356,7 @@ test_that("Split a data object time-based (bins == ... ).", {
 test_that("Test splitting data by networks", {
  ## configuration and data objects
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(list(author.relation = "cochange", simplify = FALSE))
@@ -421,7 +422,7 @@ test_that("Test splitting data by networks", {
 test_that("Test splitting data by ranges", {
  ## configuration and data objects
  proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
- proj.conf$update.value("artifact.filter.base", FALSE)
+ proj.conf$update.value("commits.filter.base.artifact", FALSE)
  net.conf = NetworkConf$new()
  net.conf$update.values(list(author.relation = "cochange", simplify = FALSE))
@@ -496,9 +497,9 @@ test_that("Split a data object activity-based (activity.type = 'commits').", {
  ## check data for all ranges
  expected.data = list(
  commits = list(
- "2016-07-12 15:58:59-2016-07-12 16:05:41" = data$commits[1:4, ],
- "2016-07-12 16:05:41-2016-07-12 16:06:32" = data$commits[5:7, ],
- "2016-07-12 16:06:32-2016-07-12 16:06:33" = data$commits[8:9, ]
+ "2016-07-12 15:58:59-2016-07-12 16:05:41" = data$commits[1:2, ],
+ "2016-07-12 16:05:41-2016-07-12 16:06:32" = data$commits[3:4, ],
+ "2016-07-12 16:06:32-2016-07-12 16:06:33" = data$commits[5:6, ]
  ),
  mails = list(
  "2016-07-12 15:58:59-2016-07-12 16:05:41" = data$mails[rownames(data$mails) %in% 16:17, ],
@@ -591,8 +592,8 @@ test_that("Split a data object activity-based (activity.type = 'commits').", {
  ## check data for all ranges
  expected.data = list(
  commits = list(
- "2016-07-12 15:58:59-2016-07-12 16:06:10" = data$commits[1:6, ],
- "2016-07-12 16:06:10-2016-07-12 16:06:33" = data$commits[7:9, ]
+ "2016-07-12 15:58:59-2016-07-12 16:06:10" = data$commits[1:3, ],
+ "2016-07-12 16:06:10-2016-07-12 16:06:33" = data$commits[4:6, ]
  ),
  mails = list(
  "2016-07-12 15:58:59-2016-07-12 16:06:10" = data$mails[rownames(data$mails) %in% 16:17, ],
@@ -675,12 +676,12 @@ test_that("Split a data object activity-based (activity.type = 'mails').", {
  ## check data for all ranges
  expected.data = list(
  commits = list(
- "2004-10-09 18:38:13-2010-07-12 11:05:35" = data.frame(),
- "2010-07-12 11:05:35-2010-07-12 12:05:41" = data.frame(),
- "2010-07-12 12:05:41-2010-07-12 12:05:44" = data.frame(),
- "2010-07-12 12:05:44-2016-07-12 15:58:40" = data.frame(),
- "2016-07-12 15:58:40-2016-07-12 16:05:37" = data$commits[1:4, ],
- "2016-07-12 16:05:37-2016-07-12 16:05:38" = data.frame()
+ "2004-10-09 18:38:13-2010-07-12 11:05:35" = data$commits[0, ],
+ "2010-07-12 11:05:35-2010-07-12 12:05:41" = data$commits[0, ],
+ "2010-07-12 12:05:41-2010-07-12 12:05:44" = data$commits[0, ],
+ "2010-07-12 12:05:44-2016-07-12 15:58:40" = data$commits[0, ],
+ "2016-07-12 15:58:40-2016-07-12 16:05:37" = data$commits[1:2, ],
+ "2016-07-12 16:05:37-2016-07-12 16:05:38" = data$commits[0, ]
  ),
  mails = list(
  "2004-10-09 18:38:13-2010-07-12 11:05:35" = data$mails[rownames(data$mails) %in% 1:3, ],
@@ -742,7 +743,7 @@ test_that("Split a data object activity-based (activity.type = 'mails').", {
  ## check data for all ranges
  expected.data = list(
  commits = list(
- "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$commits[1:4, ]
+ "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$commits[1:2, ]
  ),
  mails = list(
  "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$mails
@@ -785,8 +786,8 @@ test_that("Split a data object activity-based (activity.type = 'mails').", {
  ## check data for all ranges
  expected.data = list(
  commits = list(
- "2004-10-09 18:38:13-2010-07-12 12:05:43" = data.frame(),
- "2010-07-12 12:05:43-2016-07-12 16:05:38" = data$commits[1:4, ]
+ "2004-10-09 18:38:13-2010-07-12 12:05:43" = data$commits[0, ],
+ "2010-07-12 12:05:43-2016-07-12 16:05:38" = data$commits[1:2, ]
  ),
  mails = list(
  "2004-10-09 18:38:13-2010-07-12 12:05:43" = data$mails[rownames(data$mails) %in% 1:8, ],
@@ -866,10 +867,10 @@ test_that("Split a data object activity-based (activity.type = 'issues').", {
  ## check data for all ranges
  expected.data = list(
  commits = list(
- "2013-04-21 23:52:09-2016-07-12 16:05:47" = data$commits[1:6, ],
- "2016-07-12 16:05:47-2016-08-31 18:21:48" = data$commits[7:9, ],
- "2016-08-31 18:21:48-2017-02-20 22:25:41" = data.frame(),
- "2017-02-20 22:25:41-2017-05-23 12:32:40" = data.frame()
+ "2013-04-21 23:52:09-2016-07-12 16:05:47" = data$commits[1:3, ],
+ "2016-07-12 16:05:47-2016-08-31 18:21:48" = data$commits[4:6, ],
+ "2016-08-31 18:21:48-2017-02-20 22:25:41" = data$commits[0, ],
+ "2017-02-20 22:25:41-2017-05-23 12:32:40" = data$commits[0, ]
  ),
  mails = list(
  "2013-04-21 23:52:09-2016-07-12 16:05:47" = data$mails[rownames(data$mails) %in% 14:17, ],
@@ -967,7 +968,7 @@ test_that("Split a data object activity-based (activity.type = 'issues').", {
  expected.data = list(
  commits = list(
  "2013-04-21 23:52:09-2016-07-27 22:25:25" = data$commits,
- "2016-07-27 22:25:25-2017-05-23 12:32:40" = data.frame()
+ "2016-07-27 22:25:25-2017-05-23 12:32:40" = data$commits[0, ]
  ),
  mails = list(
  "2013-04-21
23:52:09-2016-07-27 22:25:25" = data$mails[rownames(data$mails) %in% 14:17, ], @@ -1027,7 +1028,7 @@ test_that("Split a network time-based (time.period = ...).", { ## configuration and data objects proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT) - proj.conf$update.value("artifact.filter.base", FALSE) + proj.conf$update.value("commits.filter.base.artifact", FALSE) net.conf = NetworkConf$new() net.conf$update.values(list(author.relation = "cochange", simplify = FALSE)) project.data = ProjectData$new(proj.conf) @@ -1083,7 +1084,7 @@ test_that("Split a list of networks time-based.", { ## configuration and data objects proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT) - proj.conf$update.value("artifact.filter.base", FALSE) + proj.conf$update.value("commits.filter.base.artifact", FALSE) net.conf = NetworkConf$new() net.conf$update.values(list(simplify = FALSE, author.directed = TRUE)) project.data = ProjectData$new(proj.conf) @@ -1130,7 +1131,7 @@ test_that("Split a network time-based (bins = ...).", { ## configuration and data objects proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT) - proj.conf$update.value("artifact.filter.base", FALSE) + proj.conf$update.value("commits.filter.base.artifact", FALSE) net.conf = NetworkConf$new() net.conf$update.values(list(author.relation = "cochange", simplify = FALSE)) project.data = ProjectData$new(proj.conf) @@ -1192,7 +1193,7 @@ test_that("Test splitting network by ranges", { ## configuration and data objects proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT) - proj.conf$update.value("artifact.filter.base", FALSE) + proj.conf$update.value("commits.filter.base.artifact", FALSE) net.conf = NetworkConf$new() net.conf$update.values(list(author.relation = "cochange", simplify = FALSE)) project.data = ProjectData$new(proj.conf) @@ -1223,7 +1224,7 @@ test_that("Split a network activity-based (number.edges, 
number.windows).", { ## configuration and data objects proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT) - proj.conf$update.value("artifact.filter.base", FALSE) + proj.conf$update.value("commits.filter.base.artifact", FALSE) net.conf = NetworkConf$new() net.conf$update.values(list(author.relation = "cochange", simplify = FALSE)) project.data = ProjectData$new(proj.conf) @@ -1517,7 +1518,7 @@ test_that("Check consistency of data and network time-based splitting.", { ## configuration and data objects proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT) - proj.conf$update.value("artifact.filter.base", FALSE) + proj.conf$update.value("commits.filter.base.artifact", FALSE) net.conf = NetworkConf$new() net.conf$update.values(list(author.relation = "cochange", simplify = FALSE)) diff --git a/util-conf.R b/util-conf.R index 38a7b66b..e58fdaf1 100644 --- a/util-conf.R +++ b/util-conf.R @@ -17,6 +17,7 @@ ## Copyright 2017 by Felix Prasse ## Copyright 2017-2018 by Thomas Bock ## Copyright 2018 by Barbara Eckl +## Copyright 2018 by Jakob Kronawitter ## All Rights Reserved. @@ -75,7 +76,7 @@ Conf = R6::R6Class("Conf", return(self$get.value(att)) }) names(current.values) = names(private$attributes) - self$update.values(current.values, stop.on.error = TRUE) + self$update.values(current.values) }, #' Check whether the given 'value' is the correct datatype @@ -124,8 +125,8 @@ Conf = R6::R6Class("Conf", #' The constructor, automatically checking the default values. initialize = function() { - ## FIXME do we need this? - private$check.values() + # ## check the default values for validity + # private$check.values() }, ## * * printing ---------------------------------------------------- @@ -165,8 +166,7 @@ Conf = R6::R6Class("Conf", #' #' @param entry the entry name for the value #' @param value the new value - #' @param error call stop() on an error? 
[default: FALSE] - update.value = function(entry, value, stop.on.error = FALSE) { + update.value = function(entry, value) { ## construct list for updating updating = list(value) names(updating) = entry @@ -178,10 +178,9 @@ Conf = R6::R6Class("Conf", #' 'updated.values' list. #' #' @param updated.values the new values for the attributes to be updated - #' @param error call stop() on an error? [default: FALSE] - update.values = function(updated.values = list(), stop.on.error = FALSE) { + update.values = function(updated.values = list()) { ## determine the function executed on an error - error.function = ifelse(stop.on.error, stop, logging::logwarn) + error.function = stop ## check values to update names.to.update = c() @@ -218,9 +217,7 @@ Conf = R6::R6Class("Conf", } else { message = paste0( - "Updating network-configuration attribute '%s' failed.", - if (!stop.on.error) " The failure is ignored!\n", - # "Current value: %s\n", + "Updating network-configuration attribute '%s' failed.\n", "Allowed values (%s of type '%s'): %s\n", "Given value (of type '%s'): %s" ) @@ -332,7 +329,13 @@ ProjectConf = R6::R6Class("ProjectConf", inherit = Conf, ## * * attributes --------------------------------------------------- attributes = list( - artifact.filter.base = list( + commits.filter.base.artifact = list( + default = TRUE, + type = "logical", + allowed = c(TRUE, FALSE), + allowed.number = 1 + ), + commits.filter.untracked.files = list( default = TRUE, type = "logical", allowed = c(TRUE, FALSE), @@ -469,6 +472,10 @@ ProjectConf = R6::R6Class("ProjectConf", inherit = Conf, #' and \code{featureexpression}) [default: "feature"] initialize = function(data, selection.process, casestudy, artifact = c("feature", "file", "function", "featureexpression")) { + + logging::loginfo("Construct project configuration: starting.") + + ## call super constructor super$initialize() ## verify arguments using match.arg @@ -479,8 +486,6 @@ ProjectConf = R6::R6Class("ProjectConf", inherit = Conf, 
private$casestudy = verify.argument.for.parameter(casestudy, "character", class(self)[1]) private$artifact = verify.argument.for.parameter(artifact, "character", class(self)[1]) - logging::loginfo("Construct project configuration: starting.") - ## convert artifact to tagging tagging = ARTIFACT.TO.TAGGING[[ artifact ]] if (is.null(tagging)) { @@ -691,6 +696,12 @@ NetworkConf = R6::R6Class("NetworkConf", inherit = Conf, allowed = c(TRUE, FALSE), allowed.number = 1 ), + edges.for.base.artifacts = list( + default = TRUE, + type = "logical", + allowed = c(TRUE, FALSE), + allowed.number = 1 + ), edge.attributes = list( default = c( "date", "artifact.type", # general @@ -749,16 +760,20 @@ NetworkConf = R6::R6Class("NetworkConf", inherit = Conf, #' The constructor, automatically checking the default values. initialize = function() { - # private$check.values() + logging::loginfo("Construct network configuration: starting.") + + ## call super constructor + super$initialize() + + logging::loginfo("Construct network configuration: finished.") }, #' Update the attributes of the class with the new values given in the #' 'updated.values' list. #' #' @param updated.values the new values for the attributes to be updated - #' @param error call stop() on an error? [default: FALSE] - update.values = function(updated.values = list(), stop.on.error = FALSE) { - super$update.values(updated.values = updated.values, stop.on.error = stop.on.error) + update.values = function(updated.values = list()) { + super$update.values(updated.values = updated.values) ## 1) "date" and "artifact.type" always as edge attribute name = "edge.attributes" diff --git a/util-data.R b/util-data.R index 80cd16eb..14e47597 100644 --- a/util-data.R +++ b/util-data.R @@ -17,6 +17,7 @@ ## Copyright 2017-2018 by Christian Hechtl ## Copyright 2017 by Felix Prasse ## Copyright 2017 by Ferdinand Frank +## Copyright 2018 by Jakob Kronawitter ## All Rights Reserved. 
@@ -31,16 +32,24 @@ requireNamespace("parallel") # for parallel computation ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / ## Constants --------------------------------------------------------------- -## base artifacts +## untracked file +UNTRACKED.FILE = "" + +## the empty string which resides in the artifact column when artifact == feature or artifact == function +## in the 'ProjectConf' +UNTRACKED.FILE.EMPTY.ARTIFACT = "" + +## base artifacts (which one actually applies, depends on the artifact parameter in the 'ProjectConf') BASE.ARTIFACTS = c( - "Base_Feature", - "File_Level" + "Base_Feature", ## when artifact == feature + "File_Level", ## when artifact == function + UNTRACKED.FILE ## when artifact == file ) -## mapping of data source to artifact column -## (for commits: filter also empty, non-configured, and (potentially) base artifacts) +## mapping of data source to artifact column (for commits: filter artifacts based on the configuration options +## 'commits.filter.base.artifact' and 'commits.filter.untracked.files' of the corresponding 'ProjectConf' object) DATASOURCE.TO.ARTIFACT.FUNCTION = list( - "commits" = "get.commits.filtered.empty", + "commits" = "get.commits.filtered", "mails" = "get.mails", "issues" = "get.issues" ) @@ -70,7 +79,6 @@ ProjectData = R6::R6Class("ProjectData", ## commits and commit data commits.filtered = NULL, # data.frame - commits.filtered.empty = NULL, #data.frame commits = NULL, # data.frame synchronicity = NULL, # data.frame pasta = NULL, # data.frame @@ -85,92 +93,31 @@ ProjectData = R6::R6Class("ProjectData", ## * * filtering commits ------------------------------------------- - #' Filter commits with empty artifacts from the already filtered commit list and - #' save the new list to 'commits.filtered.empty'. + #' Filter commits retrieved by the method \code{get.commits} after potentially removing untracked files and the + #' base artifact (see parameters). 
#' - #' @seealso \code{get.commits.filtered} - filter.commits.empty = function() { - - logging::logdebug("filter.commits.empty: starting.") - - ## do not compute anything more than once - if (!is.null(private$commits.filtered.empty)) { - logging::logdebug("filter.commits.empty: finished. (already existing)") - return(private$commits.filtered.empty) - } - - ## get raw commit data - commit.data = self$get.commits.filtered() - - ## break if the list of commits is empty - if (nrow(commit.data) == 0) { - logging::logwarn("There are no commits available for the current environment.") - logging::logwarn("Class: %s", self$get.class.name()) - # logging::logwarn("Configuration: %s", private$project.conf$get.conf.as.string()) - private$commits.filtered.empty = data.frame() - return(private$commits.filtered.empty) - } - - ## only process commits with non-empty artifact - commit.data = subset(commit.data, artifact != "") - - ## store the commit data - private$commits.filtered.empty = commit.data - logging::logdebug("filter.commits.empty: finished.") - }, - - #' Filter the data from the commit list which does not belong to the artifact listed in the field - #' \code{project.conf}. - #' If configured in \code{project.conf}, filter the commits from the commit list that touch the base artifact. - #' Add synchronicity and PaStA data if configured in \code{project.conf}. - #' Finally, save the new list to the field \code{commits.filtered}. 
- filter.commits = function() { - + #' @param remove.untracked.files flag whether untracked files are kept or removed + #' @param remove.base.artifact flag whether the base artifact is kept or removed + #' + #' @return the commits retrieved by the method \code{get.commits} after all filters have been applied + filter.commits = function(remove.untracked.files, remove.base.artifact) { logging::logdebug("filter.commits: starting.") - ## do not compute anything more than once - if (!is.null(private$commits.filtered)) { - logging::logdebug("filter.commits: finished. (already existing)") - return(private$commits.filtered) - } - - ## get raw commit data + ## get commit data commit.data = self$get.commits() - ## break if the list of commits is empty - if (nrow(commit.data) == 0) { - logging::logwarn("There are no commits available for the current environment.") - logging::logwarn("Class: %s", self$get.class.name()) - # logging::logwarn("Configuration: %s", private$project.conf$get.conf.as.string()) - private$commits.filtered = data.frame() - return(private$commits.filtered) + ## filter out the untracked files + if (remove.untracked.files) { + commit.data = subset(commit.data, file != UNTRACKED.FILE) } - ## only process commits with the artifact listed in the configuration or missing - commit.data = subset(commit.data, artifact.type %in% - c(private$project.conf$get.value("artifact.codeface"), "")) - ## filter out the base artifacts (i.e., Base_Feature, File_Level) - if (private$project.conf$get.value("artifact.filter.base")) { + if (remove.base.artifact) { commit.data = subset(commit.data, !(artifact %in% BASE.ARTIFACTS)) } - ## append synchronicity data if wanted - if (private$project.conf$get.value("synchronicity")) { - synchronicity.data = self$get.synchronicity() - commit.data = merge(commit.data, synchronicity.data, - by = "hash", all.x = TRUE, sort = FALSE) - } - - ## add PaStA data if wanted - if (private$project.conf$get.value("pasta")) { - self$get.pasta() - 
commit.data = private$add.pasta.data(commit.data) - } - - ## store the commit data - private$commits.filtered = commit.data logging::logdebug("filter.commits: finished.") + return(commit.data) }, ## * * PaStA data -------------------------------------------------- @@ -287,7 +234,6 @@ ProjectData = R6::R6Class("ProjectData", #' changed. reset.environment = function() { private$commits.filtered = NULL - private$commits.filtered.empty = NULL private$commits = NULL private$synchronicity = NULL private$mails = NULL @@ -328,6 +274,7 @@ ProjectData = R6::R6Class("ProjectData", #' Set a value of the project configuration and reset the environment set.project.conf.entry = function(entry, value) { private$project.conf$update.value(entry, value) + self$reset.environment() }, #' Update the project configuration based on the given list @@ -384,108 +331,98 @@ ProjectData = R6::R6Class("ProjectData", ## * * raw data ---------------------------------------------------- - #' Get the list of commits without empty artifacts and filtered by the artifact kind configured - #' in the field \code{project.conf}. - #' If configured in \code{project.conf}, get the list of commits without the base artifact. - #' In addition, if configured in \code{project.conf}, append the synchronicity data and PaStA data - #' to the filtered commit data. - #' If the list of filtered commits does not already exist, call the filter method. + #' Return the commits retrieved by the method \code{get.commits} by removing untracked files and removing the + #' base artifact (if configured in the \code{project.conf}, see parameters \code{commits.filter.untracked.files} + #' and \code{commits.filter.base.artifact}). 
#' - #' @return the commit list without empty artifacts and containing only commit data related to the - #' configured artifact and, if configured, without the base artifact - get.commits.filtered.empty = function() { - logging::loginfo("Getting commit data filtered by artifact.base and artifact.empty.") - - ## if commits are not read already, do this - if (is.null(private$commits.filtered.empty)) { - private$filter.commits.empty() - } - - return(private$commits.filtered.empty) - }, - - #' Get the list of commits filtered by the artifact kind configured in the field \code{project.conf}. - #' If configured in \code{project.conf}, get the list of commits without the base artifact. - #' In addition, if configured in \code{project.conf}, append the synchronicity data and PaStA data - #' to the filtered commit data. - #' If the list of filtered commits does not already exist, call the filter method. + #' This method caches the filtered commits to the field \code{commits.filtered}. + #' + #' @return the commits retrieved by the method \code{get.commits} after all filters have been applied #' - #' @return the commit list containing only commit data related to the configured artifact and, - #' if configured, without the base artifact + #' @seealso get.commits.filtered.uncached get.commits.filtered = function() { - logging::loginfo("Getting commit data filtered by artifact.base.") - - ## if commits are not read already, do this if (is.null(private$commits.filtered)) { - private$filter.commits() + private$commits.filtered = private$filter.commits( + private$project.conf$get.value("commits.filter.untracked.files"), + private$project.conf$get.value("commits.filter.base.artifact") + ) } - return(private$commits.filtered) }, - #' Get the complete list of commits. - #' If configured in the field \code{project.conf}, append the PaStA data to the commit data - #' by calling the setter function. - #' If the list of commits does not already exist, call the read method first. 
+ #' Return the commits retrieved by the method \code{get.commits} by removing untracked files and removing the + #' base artifact (see parameters). + #' + #' This method does not use caching. If you want to use caching, please use the method + #' \code{get.commits.filtered} instead. + #' + #' @param remove.untracked.files flag whether untracked files are kept or removed + #' @param remove.base.artifact flag whether the base artifact is kept or removed + #' + #' @return the commits retrieved by the method \code{get.commits} after all filters have been applied + #' + #' @seealso get.commits.filtered + get.commits.filtered.uncached = function(remove.untracked.files, remove.base.artifact) { + return (private$filter.commits(remove.untracked.files, remove.base.artifact)) + }, + + #' Get the list of commits which have the artifact kind configured in the \code{project.conf}. + #' If the list of commits is not cached in the field \code{commits}, call the read method first. + #' If configured in the \code{project.conf}, add PaStA and synchronicity data. 
#' #' @return the list of commits get.commits = function() { - logging::loginfo("Getting raw commit data.") + logging::loginfo("Getting commit data.") ## if commits are not read already, do this if (is.null(private$commits)) { - commits.read = read.commits( - self$get.data.path(), - private$project.conf$get.value("artifact") - ) + commit.data = read.commits(self$get.data.path(), private$project.conf$get.value("artifact")) - self$set.commits(data = commits.read) + ## only consider commits that have the artifact type configured in the 'project.conf' or commits to + ## untracked files + commit.data = subset(commit.data, artifact.type %in% + c(private$project.conf$get.value("artifact.codeface"), + UNTRACKED.FILE.EMPTY.ARTIFACT)) + + ## Add PaStA and synchronicity data (if configured in the 'project.conf') and save the commit data to + ## the field 'commits' afterwards + self$set.commits(commit.data) } private$extract.timestamps(source = "commits") return(private$commits) }, - #' Get the complete list of commits. - #' If it does not already exist, call the read method first. - #' - #' Note: This is just a delegate for \code{ProjectData$get.commits()}. - #' - #' @return the list of commits - get.commits.raw = function() { - return(self$get.commits()) - }, - #' Set the commit list of the project to a new one. - #' Add PaStA data if configured in the field \code{project.conf}. + #' Add PaStA and synchronicity data if configured in the \code{project.conf}. 
#' - #' @param data the new list of commits - set.commits = function(data) { - logging::loginfo("Setting raw commit data.") - if (is.null(data)) { - data = data.frame() + #' @param commit.data the new list of commits + set.commits = function(commit.data) { + logging::loginfo("Setting commit data.") + + # TODO: Also check for correct shape (column names and data types) of the passed data + + if (is.null(commit.data)) { + commit.data = create.empty.commits.list(); } + + ## append synchronicity data if wanted + if (private$project.conf$get.value("synchronicity")) { + synchronicity.data = self$get.synchronicity() + commit.data = merge(commit.data, synchronicity.data, + by = "hash", all.x = TRUE, sort = FALSE) + } + ## add PaStA data if wanted if (private$project.conf$get.value("pasta")) { - logging::loginfo("Adding PaStA data.") - data = private$add.pasta.data(data = data) + self$get.pasta() + commit.data = private$add.pasta.data(commit.data) } - private$commits = data + private$commits = commit.data - ## remove cached data for filtered commits as these need to be re-computed - ## after changing the data + ## remove cached data for filtered commits as these need to be re-computed after changing the data private$commits.filtered = NULL - private$commits.filtered.empty = NULL - }, - - #' Set the commit list of the project to a new one. - #' - #' Note: This is just a delegate for \code{ProjectData$set.commits(data)}. - #' - #' @param data the new list of commits - set.commits.raw = function(data) { - self$set.commits(data) }, #' Get the synchronicity data. @@ -1007,13 +944,36 @@ ProjectData = R6::R6Class("ProjectData", ## check given data source data.source = match.arg.or.default(data.source, several.ok = FALSE) - ## TODO use filtered commit data here (and not the filtered.empty version)? → try filtered! 
data.source.func = DATASOURCE.TO.ARTIFACT.FUNCTION[[data.source]] ## get the key-value mapping/list for the given parameters mylist = get.key.to.value.from.df(self[[data.source.func]](), group.column, data.column) return(mylist) + }, + + #' Get the list of authors by looking only at the specified data source. + #' + #' *Note*: The constant \code{DATASOURCE.TO.ARTIFACT.FUNCTION} denotes the mapping between + #' data source and the method which is retrieving the data for each data source. + #' + #' @param data.source the data source which can be either \code{"commits"}, \code{"mails"}, + #' or \code{"issues"} + #' + #' @return a data.frame of unique author names (columns \code{author.name} and \code{author.email}), + #' extracted from the specified data source + get.authors.by.data.source = function(data.source = c("commits", "mails", "issues")) { + + data.source = match.arg(data.source) + + ## retrieve author names from chosen data source + data.source.func = DATASOURCE.TO.ARTIFACT.FUNCTION[[data.source]] + data = self[[data.source.func]]()[c("author.name", "author.email")] + + ## remove duplicates + data = unique(data) + + return (data) } ) ) diff --git a/util-misc.R b/util-misc.R index b459ead6..c230e080 100644 --- a/util-misc.R +++ b/util-misc.R @@ -16,6 +16,7 @@ ## Copyright 2017 by Christian Hechtl ## Copyright 2017 by Felix Prasse ## Copyright 2017-2018 by Thomas Bock +## Copyright 2018 by Jakob Kronawitter ## All Rights Reserved. @@ -133,6 +134,68 @@ match.arg.or.default = function(arg, choices, default = NULL, several.ok = FALSE } +## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / +## Empty dataframe creation------------------------------------------------- + +#' Create an empty dataframe with the specified columns. Unless all columns should have the default datatype +#' \code{logical}, the second parameter \code{data.types} should specify the datatypes. 
+#' +#' @param columns a character vector containing all the column names +#' @param data.types a character vector of the same length as \code{columns}, the datatypes can be \code{integer}, +#' \code{numeric}, \code{POSIXct}, \code{character}, \code{factor} or \code{logical} +#' +#' @return the newly created empty dataframe +create.empty.data.frame = function(columns, data.types = NULL) { + + ## if the vector data.types is specified, its length must match the length of the corresponding column names + if (!is.null(data.types) && length(data.types) != length(columns)) { + stop("If specified, the length of the two given vectors columns and data.types must be the same.") + } + + ## create the empty data frame (with zero rows), but the given number of columns + data.frame = data.frame(matrix(nrow = 0, ncol = length(columns))) + colnames(data.frame) = columns + + ## assign the datatypes to the data frame columns by individually swapping the columns with new columns that possess + ## the correct data type + for (i in seq_along(data.types)) { + + ## get the column + column = data.frame[[i]] + + ## replace column with column of correct type + switch(tolower(data.types[i]), + "posixct" = { + column = as.POSIXct(column) + }, + "integer" = { + column = as.integer(column) + }, + "numeric" = { + column = as.numeric(column) + }, + "logical" = { + column = as.logical(column) + }, + "character" = { + column = as.character(column) + }, + "factor" = { + column = as.factor(column) + }, + { + stop(paste("Unknown datatype specified:", data.types[[i]])) + } + ) + + ## set the column back into the dataframe + data.frame[[i]] = column + } + + return(data.frame) +} + + ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / ## Stacktrace -------------------------------------------------------------- diff --git a/util-networks-covariates.R b/util-networks-covariates.R index edd196b1..92839f2e 100644 --- a/util-networks-covariates.R +++ b/util-networks-covariates.R @@ 
-15,6 +15,7 @@ ## Copyright 2018 by Claus Hunsen ## Copyright 2018 by Thomas Bock ## Copyright 2018 by Klara Schlüter +## Copyright 2018 by Jakob Kronawitter ## All Rights Reserved. ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / @@ -672,7 +673,7 @@ add.vertex.attribute.artifact.change.count = function(list.of.networks, project. nets.with.attr = split.and.add.vertex.attribute( list.of.networks, project.data, name, aggregation.level, default.value, function(range, range.data, net) { - artifact.to.commit = get.key.to.value.from.df(range.data$get.commits.filtered.empty(), "artifact", "hash") + artifact.to.commit = get.key.to.value.from.df(range.data$get.commits.filtered(), "artifact", "hash") artifact.change.count = lapply(artifact.to.commit, function(x) { length(unique(x[["hash"]])) }) @@ -709,7 +710,7 @@ add.vertex.attribute.artifact.first.occurrence = function(list.of.networks, proj nets.with.attr = split.and.add.vertex.attribute( list.of.networks, project.data, name, aggregation.level, default.value, function(range, range.data, net) { - artifact.to.dates = get.key.to.value.from.df(range.data$get.commits.filtered.empty(), "artifact", "date") + artifact.to.dates = get.key.to.value.from.df(range.data$get.commits.filtered(), "artifact", "date") artifact.to.first = lapply(artifact.to.dates, function(a) { min(a[["date"]]) }) diff --git a/util-networks.R b/util-networks.R index 4148d466..aefa8e02 100644 --- a/util-networks.R +++ b/util-networks.R @@ -16,6 +16,7 @@ ## Copyright 2017-2018 by Christian Hechtl ## Copyright 2017-2018 by Thomas Bock ## Copyright 2018 by Barbara Eckl +## Copyright 2018 by Jakob Kronawitter ## All Rights Reserved. @@ -153,14 +154,40 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", return(private$authors.network.cochange) } + ## Get a list of all artifacts extracted from the commit data. Each artifact in this group is again a list + ## of all authors that were involved in making changes to this artifact. 
In the following two steps, some of + ## the artifacts are filtered from this list, which removes all information (including author information) + ## about these artifacts. Since we only want to lose the edge information and not the information about + ## authors, they will explicitly be added in a later step. + author.groups = private$proj.data$group.authors.by.data.column("commits", "artifact") + ## 1) if configured in the 'NetworkConf', remove the base artifact + if (!private$network.conf$get.value("edges.for.base.artifacts")) { + author.groups = author.groups[!(names(author.groups) %in% BASE.ARTIFACTS)] + } + ## 2) in any case, remove the untracked files + author.groups = author.groups[names(author.groups) != UNTRACKED.FILE.EMPTY.ARTIFACT] + ## construct edge list based on artifact2author data author.net.data = construct.edge.list.from.key.value.list( - private$proj.data$group.authors.by.data.column("commits", "artifact"), + author.groups, network.conf = private$network.conf, directed = private$network.conf$get.value("author.directed"), respect.temporal.order = private$network.conf$get.value("author.respect.temporal.order") ) + ## Add author vertices back into the graph. Previously, commit information on untracked files + ## ('UNTRACKED.FILE') and, if configured, the base artifact ('BASE.ARTIFACTS') has been removed and, hence, + ## also corresponding author information. 
Re-add author vertices back to the network now by accessing the + ## complete author list: + ## 1) get all authors on commits + authors = private$proj.data$get.authors.by.data.source(data.source = "commits") + ## 2) only select author names + authors = authors["author.name"] + ## 3) rename single column to "name" to correct mapping to vertex attribute "name" + colnames(authors) = "name" + ## 4) set author list as vertices + author.net.data[["vertices"]] = authors + ## construct network from obtained data author.net = construct.network.from.edge.list( author.net.data[["vertices"]], @@ -1035,17 +1062,15 @@ construct.network.from.edge.list = function(vertices, edge.list, network.conf, d return(create.empty.network(directed = directed)) } - ## if we have nodes to create, but no edges + ## if we have nodes to create, but no edges, create an empty edge list if (is.null(edge.list) || nrow(edge.list) == 0) { - ## create network with only the vertices - net = igraph::graph.empty(n = 0, directed = directed) + igraph::vertices(nodes.processed) - } - ## if we have nodes and edges - else { - ## construct network from edge list - net = igraph::graph.data.frame(edge.list, directed = directed, vertices = nodes.processed) + edge.list = create.empty.edge.list() } + ## construct network from edge list + net = igraph::graph.data.frame(edge.list, directed = directed, vertices = nodes.processed) + + ## initialize edge weights net = igraph::set.edge.attribute(net, "weight", value = 1) ## transform multiple edges to edge weights @@ -1082,7 +1107,7 @@ merge.network.data = function(vertex.data, edge.data) { edges = plyr::rbind.fill(edge.data.filtered) ## 3) correct empty results if (is.null(edges)) { - edges = data.frame(from = character(0), to = character(0)) + edges = create.empty.edge.list() } logging::logdebug("merge.network.data: finished.") @@ -1384,7 +1409,7 @@ get.sample.network = function() { ## project configuration proj.conf = ProjectConf$new(SAMPLE.DATA, "testing", "sample", 
"feature") - proj.conf$update.values(list(artifact.filter.base = FALSE)) + proj.conf$update.values(list(commits.filter.base.artifact = FALSE)) ## RangeData object range = proj.conf$get.value("ranges")[1] diff --git a/util-read.R b/util-read.R index f9ba84dc..ea8520a9 100644 --- a/util-read.R +++ b/util-read.R @@ -16,6 +16,7 @@ ## Copyright 2017-2018 by Christian Hechtl ## Copyright 2017 by Felix Prasse ## Copyright 2017-2018 by Thomas Bock +## Copyright 2018 by Jakob Kronawitter ## All Rights Reserved. @@ -29,6 +30,107 @@ requireNamespace("digest") # for sha1 hashing of IDs requireNamespace("sqldf") # for SQL-selections on data.frames +## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / +## Constants --------------------------------------------------------------- + +## The following definition of column names for each individual data source corresponds to the individual extraction +## process of the tool 'codeface-extraction' (https://github.com/se-passau/codeface-extraction; use commit 0700f94 or +## compatible later commit). 
+ +## column names of a dataframe containing authors (see file 'authors.list' and function \code{read.authors}) +AUTHORS.LIST.COLUMNS = c( + "author.id", "author.name", "author.email" +) + +## declare the datatype for each column in the constant 'AUTHORS.LIST.COLUMNS' +AUTHORS.LIST.DATA.TYPES = c( + "character", "character", "character" +) + +## column names of a dataframe containing commits (see file 'commits.list' and function \code{read.commits}) +COMMITS.LIST.COLUMNS = c( + "commit.id", # id + "date", "author.name", "author.email", # author information + "committer.date", "committer.name", "committer.email", # committer information + "hash", "changed.files", "added.lines", "deleted.lines", "diff.size", # commit information + "file", "artifact", "artifact.type", "artifact.diff.size" ## commit-dependency information +) + +## declare the datatype for each column in the constant 'COMMITS.LIST.COLUMNS' +COMMITS.LIST.DATA.TYPES = c( + "character", + "POSIXct", "character", "character", + "POSIXct", "character", "character", + "character", "numeric", "numeric", "numeric", "numeric", + "character", "character", "character", "numeric" +) + +## column names of a dataframe containing issues (see file 'issues.list' and function \code{read.issues}) +ISSUES.LIST.COLUMNS = c( + "issue.id", "issue.state", "creation.date", "closing.date", "is.pull.request", # issue information + "author.name", "author.email", # author information + "date", # the date + "ref.name", "event.name" # the event describing the row's entry +) + +## declare the datatype for each column in the constant 'ISSUES.LIST.COLUMNS' +ISSUES.LIST.DATA.TYPES = c( + "character", "character", "POSIXct", "POSIXct", "logical", + "character", "character", + "POSIXct", + "character", "character" +) + +## column names of a dataframe containing mails (see file 'mails.list' and function \code{read.mails}) +MAILS.LIST.COLUMNS = c( + "author.name", "author.email", # author information + "message.id", "date", "date.offset", 
"subject", # meta information + "thread" # thread ID +) + +## declare the datatype for each column in the constant 'MAILS.LIST.COLUMNS' +MAILS.LIST.DATA.TYPES = c( + "character", "character", + "character", "POSIXct", "numeric", "character", + "numeric" +) + +## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / +## Empty dataframe creation------------------------------------------------- + +#' Create an empty dataframe which has the same shape as a dataframe containing authors. The dataframe has the column +#' names and column datatypes defined in \code{AUTHORS.LIST.COLUMNS} and \code{AUTHORS.LIST.DATA.TYPES}, respectively. +#' +#' @return the empty dataframe +create.empty.authors.list = function() { + return (create.empty.data.frame(AUTHORS.LIST.COLUMNS, AUTHORS.LIST.DATA.TYPES)) +} + +#' Create an empty dataframe which has the same shape as a dataframe containing commits. The dataframe has the column +#' names and column datatypes defined in \code{COMMITS.LIST.COLUMNS} and \code{COMMITS.LIST.DATA.TYPES}, respectively. +#' +#' @return the empty dataframe +create.empty.commits.list = function() { + return (create.empty.data.frame(COMMITS.LIST.COLUMNS, COMMITS.LIST.DATA.TYPES)) +} + +#' Create an empty dataframe which has the same shape as a dataframe containing issues. The dataframe has the column +#' names and column datatypes defined in \code{ISSUES.LIST.COLUMNS} and \code{ISSUES.LIST.DATA.TYPES}, respectively. +#' +#' @return the empty dataframe +create.empty.issues.list = function() { + return (create.empty.data.frame(ISSUES.LIST.COLUMNS, ISSUES.LIST.DATA.TYPES)) +} + +#' Create an empty dataframe which has the same shape as a dataframe containing mails. The dataframe has the column +#' names and column datatypes defined in \code{MAILS.LIST.COLUMNS} and \code{MAILS.LIST.DATA.TYPES}, respectively. 
+#' +#' @return the empty dataframe +create.empty.mails.list = function() { + return (create.empty.data.frame(MAILS.LIST.COLUMNS, MAILS.LIST.DATA.TYPES)) +} + + ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / ## Commit data ------------------------------------------------------------- @@ -51,30 +153,20 @@ read.commits = function(data.path, artifact) { if (inherits(commit.data, "try-error")) { logging::logwarn("There are no commits available for the current environment.") logging::logwarn("Datapath: %s", data.path) - return(data.frame()) + + # return a dataframe with the correct columns but zero rows + return(create.empty.commits.list()) } - ## set proper column names based on Codeface extraction: - ## - ## SELECT c.id, c.authorDate, a.name, a.email1, c.commitDate, - ## acom.name, acom.email1, c.commitHash, - ## c.ChangedFiles, c.AddedLines, c.DeletedLines, c.DiffSize, - ## cd.file, cd.entityId, cd.entityType, cd.size - commit.data.columns = c( - "commit.id", # id - "date", "author.name", "author.email", # author information - "committer.date", "committer.name", "committer.email", # committer information - "hash", "changed.files", "added.lines", "deleted.lines", "diff.size", # commit information - "file", "artifact", "artifact.type", "artifact.diff.size" ## commit-dependency information - ) - colnames(commit.data) = commit.data.columns + ## assign prepared column names to the dataframe + colnames(commit.data) = COMMITS.LIST.COLUMNS ## remove duplicated lines (even if they contain different commit ids but the same commit hash) commit.data = commit.data[rownames(unique(commit.data[, -1])), ] ## aggregate lines which are identical except for the "artifact.diff.size" column (ignoring the commit id) ## 1) select columns which have to be identical - primary.columns = commit.data.columns[!(commit.data.columns %in% c("commit.id", "artifact.diff.size"))] + primary.columns = COMMITS.LIST.COLUMNS[!(COMMITS.LIST.COLUMNS %in% c("commit.id", 
"artifact.diff.size"))] ## 2) aggregate "artifact.diff.size" for identical rows of the selected columns commit.data.without.id = aggregate(commit.data["artifact.diff.size"], commit.data[primary.columns], @@ -86,7 +178,7 @@ read.commits = function(data.path, artifact) { ## 4) merge the data again to have both "commit.id" and "artifact.diff.size" in one data.frame again commit.data = merge(commit.data.without.id, commit.data.without.artifact.diff.size) ## 5) reorder the columns of the data.frame as their order might be changed during aggregating and merging - commit.data = commit.data[, commit.data.columns] + commit.data = commit.data[, COMMITS.LIST.COLUMNS] ## rewrite data.frame when we want file-based data ## (we have proximity-based data as foundation) @@ -121,6 +213,15 @@ read.commits = function(data.path, artifact) { commit.data["artifact"] = artifacts.new } + ## Commits to files that are not tracked by Codeface have the empty string in the file and artifact column. + ## To better indicate this, the 'artifact' and 'file' column value is changed to 'untracked.file'. + commit.data["file"] = ifelse(commit.data[["file"]] == "", UNTRACKED.FILE, commit.data[["file"]]) + + ## copy the file column if file level analysis is performed + if (artifact == "file") { + commit.data["artifact"] = commit.data[["file"]] + } + ## convert dates and sort by them commit.data[["date"]] = get.date.from.string(commit.data[["date"]]) commit.data[["committer.date"]] = get.date.from.string(commit.data[["committer.date"]]) @@ -135,18 +236,6 @@ read.commits = function(data.path, artifact) { return(commit.data) } -#' Read the commits from the 'commits.list' file. -#' -#' @param data.path the path to the commit list -#' @param artifact the artifact whose commits are read -#' -#' Note: This is just a delegate for \code{read.commits(data.path, artifact)}. 
-#' -#' @return the read commits -read.commits.raw = function(data.path, artifact) { - return(read.commits(data.path = data.path, artifact = artifact)) -} - ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / ## Synchronicity data ------------------------------------------------------ @@ -214,17 +303,11 @@ read.mails = function(data.path) { if (inherits(mail.data, "try-error")) { logging::logwarn("There are no mails available for the current environment.") logging::logwarn("Datapath: %s", data.path) - return(data.frame()) + return(create.empty.mails.list()) } - ## set proper column names based on Codeface extraction: - ## - ## SELECT a.name AS authorName, a.messageId, a.email1, m.creationDate, m.subject, m.threadId - colnames(mail.data) = c( - "author.name", "author.email", # author information - "message.id", "date", "date.offset", "subject", # meta information - "thread" # thread ID - ) + + colnames(mail.data) = MAILS.LIST.COLUMNS ## set pattern for thread ID for better recognition mail.data[["thread"]] = sprintf("", mail.data[["thread"]]) @@ -284,15 +367,11 @@ read.authors = function(data.path) { stop("Stopped due to missing authors.") } - ## set proper column names based on Codeface extraction: - ## - ## SELECT a.name AS authorName, a.email1, m.creationDate, m.subject, m.threadId - authors.df.columns = c("author.id", "author.name", "author.email") ## if there is no third column, we need to add e-mail-address dummy data (NAs) - if (ncol(authors.df) != length(authors.df.columns)) { + if (ncol(authors.df) != length(AUTHORS.LIST.COLUMNS)) { authors.df[3] = NA } - colnames(authors.df) = authors.df.columns + colnames(authors.df) = AUTHORS.LIST.COLUMNS ## store the ID--author mapping logging::logdebug("read.authors: finished.") @@ -389,16 +468,11 @@ read.issues = function(data.path) { if (inherits(issue.data, "try-error")) { logging::logwarn("There are no Github issue data available for the current environment.") logging::logwarn("Datapath: 
%s", data.path) - return(data.frame()) + return(create.empty.issues.list()) } ## set proper column names - colnames(issue.data) = c( - "issue.id", "issue.state", "creation.date", "closing.date", "is.pull.request", # issue information - "author.name", "author.email", # author information - "date", # the date - "ref.name", "event.name" # the event describing the row's entry - ) + colnames(issue.data) = ISSUES.LIST.COLUMNS ## set pattern for issue ID for better recognition issue.data[["issue.id"]] = sprintf("", issue.data[["issue.id"]])