Change commit filtering and network building regarding the untracked files and base artifact #149

Merged 31 commits on Jan 15, 2019. Showing changes from 14 commits.
64a9486
Remove get.commits.raw function from util-data.R
Nov 29, 2018
894c9a5
Move artifact kind filtering functionality into the get.commits method
Dec 2, 2018
e74e15d
Adjust read.commits to return a valid data.frame instead of an empty one
Dec 4, 2018
11428d9
Restructure get.commits and get.commits.filtered(.empty) methods
Dec 6, 2018
c26e582
Delete set.commits.raw and read.commits.raw methods.
Dec 6, 2018
51617bb
Adjust two testcases to work with the new get.commits method
Dec 6, 2018
67a4fbe
Adapt test cases to new changes and improve empty dataframe creation
Dec 7, 2018
c60c2f6
Change edge generation behaviour for base and untracked files artifact
Dec 8, 2018
fada26d
Adjust copyright headers of modified files
Dec 10, 2018
43f185d
Update changelog
Dec 10, 2018
5ea65b9
Add global constant 'UNTRACKED.FILE' and adjust documentation
Dec 15, 2018
ec8c6dd
Update default behavior of 'Conf' objects
clhunsen Dec 14, 2018
0d7c222
Fix nodes for networks without edges
bockthom Dec 16, 2018
6580427
Improve edge creation concerning untracked files and the base artifact
Dec 16, 2018
dde0dd7
Leave artifact column empty if artifact == file or artifact == funtion
Dec 17, 2018
d11d0fb
Add 'UNTRACKED.FILE constant' back into the constant 'BASE.ARTIFACTS'
Dec 17, 2018
32a7162
Alter inline comments with wrong information
Dec 17, 2018
466d8eb
Change names of network and project configuration options
Dec 18, 2018
7e27a18
Further improve construction of edgeless networks
clhunsen Dec 17, 2018
dc8873e
Update changelog.
Dec 18, 2018
137d833
Fix setting authors in co-change-based author networks
clhunsen Dec 18, 2018
e709786
Update README
Dec 19, 2018
a5802b0
Update documentation and showcase.R
Dec 20, 2018
67dcf31
Rename variable 'list' to 'author.groups' and adjust documentation
Dec 20, 2018
5f0f529
Add additional utility functions for easier empty dataframe creation
Dec 20, 2018
6043e5c
Change null checking behaviour of two methods
Dec 20, 2018
418d1dc
Update README
Jan 7, 2019
523daef
Move empty dataframe creation utility functions into util-read.R
Jan 7, 2019
f8281c7
Adjust comments for the column names of commonly used dataframes
Jan 9, 2019
01217a8
Update changelog
Jan 9, 2019
ae58902
Adjust copyright headers
Jan 14, 2019
21 changes: 21 additions & 0 deletions NEWS.md
@@ -2,8 +2,29 @@

## Unversioned

+ ### Added
+ - In addition to the ProjectConf parameter `artifact.filter.base`, which configures whether the base artifact should be
+   included in the result of the `get.commits.filtered` method, there is now the similar parameter `filter.untracked.files`,
+   which does the same thing for untracked files (11428d9847fd44f982cd094a3248bd13fb6b7b58)
+ - Edges are no longer constructed in the author network between authors that only modify untracked files. For authors,
+   whether the edges should be created can be configured using the new NetworkConf parameter `base.artifact.edges`
+   (c60c2f6e44b6f34cccb2714eccc7674158c83dde)
+ - The public `get.commits.filtered.uncached` method was added, which allows for external filtering of the commits by
+   specifying whether untracked files and/or the base artifact should be filtered (this method does not take advantage of
+   caching, whereas the `get.commits.filtered` method does) (11428d9847fd44f982cd094a3248bd13fb6b7b58)
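The edge rule from the entries above can be illustrated with a minimal base-R sketch. It is not the project's implementation: the function `build.author.edges`, its `skip` parameter, and the values assigned to `UNTRACKED.FILE` and `BASE.ARTIFACT` are assumptions made for illustration (only the constant name `UNTRACKED.FILE` appears in the commit list). Two authors are connected when they changed the same artifact, and artifacts listed in `skip` never produce edges.

```r
## Assumed placeholder values; the real constants may differ.
UNTRACKED.FILE = "<untracked.file>"
BASE.ARTIFACT = "Base_Feature"

## Hypothetical helper: connect two authors if they changed the same artifact.
## Artifacts listed in 'skip' never produce edges.
build.author.edges = function(commits, skip = c(UNTRACKED.FILE)) {
    relevant = commits[!(commits[["artifact"]] %in% skip), , drop = FALSE]
    groups = split(relevant[["author.name"]], relevant[["artifact"]])
    edges = lapply(groups, function(authors) {
        authors = sort(unique(authors))
        if (length(authors) < 2) {
            return(NULL)
        }
        pairs = t(combn(authors, 2))
        data.frame(from = pairs[, 1], to = pairs[, 2], stringsAsFactors = FALSE)
    })
    edges = do.call(rbind, edges)
    ## keep a well-formed (zero-row) result when no edges exist
    if (is.null(edges)) {
        edges = data.frame(from = character(0), to = character(0), stringsAsFactors = FALSE)
    }
    edges = unique(edges)
    rownames(edges) = NULL
    edges
}

## Illustrative data: every pair of authors shares exactly one artifact.
commits = data.frame(
    author.name = c("Olaf", "Thomas", "Olaf", "Karl", "Thomas", "Karl"),
    artifact = c("A", "A", UNTRACKED.FILE, UNTRACKED.FILE, BASE.ARTIFACT, BASE.ARTIFACT),
    stringsAsFactors = FALSE
)

edges.default = build.author.edges(commits)
edges.no.base = build.author.edges(commits, skip = c(UNTRACKED.FILE, BASE.ARTIFACT))
```

With the default `skip`, the pair that only shares the untracked file gets no edge, while the pair sharing the base artifact still does. Adding `BASE.ARTIFACT` to `skip` mimics switching off base-artifact edges.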

### Changed/Improved
- Change shape of `Vertices` in the legend of plots to avoid confusion (f4fb4807cfd87d9d552a9ede92ea65ae4a386a04)
+ - Commits that do not change any artifact are considered to be carried out on a metafile called `untracked.file`
+   (11428d9847fd44f982cd094a3248bd13fb6b7b58)
+ - Remove `get.commits.raw`, `set.commits.raw` and `read.commits.raw` functions (64a94863c9e70ac8c75e443bc15cd7facbf2111d,
+   c26e582e4ad6bf1eaeb08202fc3e00394332a013)
+ - Remove `get.commits.filtered.empty` and the corresponding `filter.commits.empty` method; the functionality has been moved
+   to the altered `get.commits.filtered` and `filter.commits` methods, respectively (11428d9847fd44f982cd094a3248bd13fb6b7b58)
+ - The `filter.commits` method now takes parameters which configure whether untracked files and/or the base artifact should be
+   filtered out (11428d9847fd44f982cd094a3248bd13fb6b7b58)
+ - Filtering by artifact kind (e.g., filtering out either Feature or FeatureExpression) is now done in the
+   `get.commits` method instead of the `get.commits.filtered` method (894c9a5c181fef14dcb71fa23699bebbcbcd2b4f)
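The two filter switches described in these entries can be sketched on a plain data.frame. This is only a sketch under stated assumptions: `filter.commits.sketch` is a hypothetical stand-alone function, the values of `UNTRACKED.FILE` and `BASE.ARTIFACT` are placeholders, and the choice of columns to match on (`file` for untracked files, `artifact` for the base artifact) is assumed. The parameter names `remove.untracked.files` and `remove.base.artifact` are taken from the `get.commits.filtered.uncached` call in showcase.R.

```r
## Assumed placeholder values; the real constants may differ.
UNTRACKED.FILE = "<untracked.file>"
BASE.ARTIFACT = "Base_Feature"

## Hypothetical stand-alone filter with the two switches named in showcase.R.
filter.commits.sketch = function(commits, remove.untracked.files = TRUE, remove.base.artifact = TRUE) {
    keep = rep(TRUE, nrow(commits))
    if (remove.untracked.files) {
        ## assumption: untracked commits carry the metafile name in the 'file' column
        keep = keep & commits[["file"]] != UNTRACKED.FILE
    }
    if (remove.base.artifact) {
        keep = keep & commits[["artifact"]] != BASE.ARTIFACT
    }
    commits[keep, , drop = FALSE]
}

## Illustrative data: one regular commit, one untracked commit, one base-artifact commit.
commits = data.frame(
    hash = c("h1", "h2", "h3"),
    file = c("test.c", UNTRACKED.FILE, "test2.c"),
    artifact = c("A", UNTRACKED.FILE, BASE.ARTIFACT),
    stringsAsFactors = FALSE
)

both.removed = filter.commits.sketch(commits)
base.kept = filter.commits.sketch(commits, remove.untracked.files = TRUE, remove.base.artifact = FALSE)
```

Passing `FALSE` for both switches returns the input unchanged, and an input with zero rows yields a zero-row result that keeps its columns.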

### Fixed
- Fix error when resetting a `ProjectData` environment (c64cab84e928a2a4c89a6df12440ba7ca06e6263)
4 changes: 2 additions & 2 deletions showcase.R
@@ -16,6 +16,7 @@
## Copyright 2017 by Christian Hechtl <[email protected]>
## Copyright 2017 by Felix Prasse <[email protected]>
## Copyright 2017-2018 by Thomas Bock <[email protected]>
+ ## Copyright 2018 by Jakob Kronawitter <[email protected]>
## All Rights Reserved.


@@ -85,7 +86,7 @@ x = NetworkBuilder$new(project.data = x.data, network.conf = net.conf)
# x.data$get.synchronicity()
# x.data$group.artifacts.by.data.column("commits", "author.name")
# x.data$get.commits.filtered()
- # x.data$get.commits.filtered.empty()
+ # x.data$get.commits.filtered.uncached(remove.untracked.files = TRUE, remove.base.artifact = FALSE)
# x.data$get.mails()
# x.data$get.authors()
# x.data$get.data.path()
@@ -126,7 +127,6 @@ y = NetworkBuilder$new(project.data = y.data, network.conf = net.conf)
# y.data$get.synchronicity()
# y.data$group.artifacts.by.data.column("commits", "author.name")
# y.data$get.commits.filtered()
- # y.data$get.commits.filtered.empty()
# y.data$get.mails()
# y.data$get.authors()
# y.data$get.data.path()
37 changes: 17 additions & 20 deletions tests/test-data-cut.R
@@ -16,6 +16,7 @@
## Copyright 2018 by Claus Hunsen <[email protected]>
## Copyright 2018 by Barbara Eckl <[email protected]>
## Copyright 2018 by Thomas Bock <[email protected]>
+ ## Copyright 2018 by Jakob Kronawitter <[email protected]>
## All Rights Reserved.


@@ -44,26 +45,22 @@ test_that("Cut commit and mail data to same date range.", {

x.data = ProjectData$new(proj.conf)

- commit.data.expected = data.frame(commit.id = sprintf("<commit-%s>", c(32712, 32712, 32713, 32713)),
-     date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 15:58:59", "2016-07-12 16:00:45",
-         "2016-07-12 16:00:45")),
-     author.name = c("Björn", "Björn", "Olaf", "Olaf"),
-     author.email = c("[email protected]", "[email protected]", "[email protected]",
-         "[email protected]"),
-     committer.date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 15:58:59", "2016-07-20 10:00:44",
-         "2016-07-20 10:00:44")),
-     committer.name = c("Björn", "Björn", "Björn", "Björn"),
-     committer.email = c("[email protected]", "[email protected]", "[email protected]", "[email protected]"),
-     hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0",
-         "5a5ec9675e98187e1e92561e1888aa6f04faa338", "5a5ec9675e98187e1e92561e1888aa6f04faa338"),
-     changed.files = as.integer(c(1, 1, 1, 1)),
-     added.lines = as.integer(c(1, 1, 1, 1)),
-     deleted.lines = as.integer(c(1, 1, 0, 0)),
-     diff.size = as.integer(c(2, 2, 1, 1)),
-     file = c("test.c", "test.c", "test.c", "test.c"),
-     artifact = c("A", "defined(A)", "A", "defined(A)"),
-     artifact.type = c("Feature", "FeatureExpression", "Feature", "FeatureExpression"),
-     artifact.diff.size = as.integer(c(1, 1, 1, 1)))
+ commit.data.expected = data.frame(commit.id = sprintf("<commit-%s>", c(32712, 32713)),
+     date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 16:00:45")),
+     author.name = c("Björn", "Olaf"),
+     author.email = c("[email protected]", "[email protected]"),
+     committer.date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-20 10:00:44")),
+     committer.name = c("Björn", "Björn"),
+     committer.email = c("[email protected]", "[email protected]"),
+     hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "5a5ec9675e98187e1e92561e1888aa6f04faa338"),
+     changed.files = as.integer(c(1, 1)),
+     added.lines = as.integer(c(1, 1)),
+     deleted.lines = as.integer(c(1, 0)),
+     diff.size = as.integer(c(2, 1)),
+     file = c("test.c", "test.c"),
+     artifact = c("A", "A"),
+     artifact.type = c("Feature", "Feature"),
+     artifact.diff.size = as.integer(c(1, 1)))

mail.data.expected = data.frame(author.name = c("Thomas"),
author.email = c("[email protected]"),
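The expected data in this test shrinks from four rows to two because filtering by artifact kind now happens when the commits are first retrieved, so only the `Feature` rows remain. A minimal sketch of that effect on a reduced version of the test data follows; `keep.artifact.kind` is a hypothetical helper, not a function of the library, and only a subset of the columns is reproduced.

```r
## Reduced copy of the old expected data (subset of columns only).
commits = data.frame(
    commit.id = sprintf("<commit-%s>", c(32712, 32712, 32713, 32713)),
    file = c("test.c", "test.c", "test.c", "test.c"),
    artifact = c("A", "defined(A)", "A", "defined(A)"),
    artifact.type = c("Feature", "FeatureExpression", "Feature", "FeatureExpression"),
    stringsAsFactors = FALSE
)

## Hypothetical helper: keep only the rows of one artifact kind.
keep.artifact.kind = function(commits, kind) {
    filtered = commits[commits[["artifact.type"]] == kind, , drop = FALSE]
    rownames(filtered) = NULL  # renumber rows after dropping the other kind
    filtered
}

features = keep.artifact.kind(commits, "Feature")
```

Here `features` has one row per commit, which matches the two-row shape of the new `commit.data.expected`.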
37 changes: 17 additions & 20 deletions tests/test-networks-cut.R
@@ -14,6 +14,7 @@
## Copyright 2017 by Christian Hechtl <[email protected]>
## Copyright 2018 by Claus Hunsen <[email protected]>
## Copyright 2018 by Thomas Bock <[email protected]>
+ ## Copyright 2018 by Jakob Kronawitter <[email protected]>
## All Rights Reserved.


@@ -44,26 +45,22 @@ test_that("Cut commit and mail data to same date range.", {
x.data = ProjectData$new(proj.conf)
x = NetworkBuilder$new(x.data, net.conf)

- commit.data.expected = data.frame(commit.id = sprintf("<commit-%s>", c(32712, 32712, 32713, 32713)),
-     date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 15:58:59", "2016-07-12 16:00:45",
-         "2016-07-12 16:00:45")),
-     author.name = c("Björn", "Björn", "Olaf", "Olaf"),
-     author.email = c("[email protected]", "[email protected]", "[email protected]",
-         "[email protected]"),
-     committer.date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 15:58:59", "2016-07-20 10:00:44",
-         "2016-07-20 10:00:44")),
-     committer.name = c("Björn", "Björn", "Björn", "Björn"),
-     committer.email = c("[email protected]", "[email protected]", "[email protected]", "[email protected]"),
-     hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0",
-         "5a5ec9675e98187e1e92561e1888aa6f04faa338", "5a5ec9675e98187e1e92561e1888aa6f04faa338"),
-     changed.files = as.integer(c(1, 1, 1, 1)),
-     added.lines = as.integer(c(1, 1, 1, 1)),
-     deleted.lines = as.integer(c(1, 1, 0, 0)),
-     diff.size = as.integer(c(2, 2, 1, 1)),
-     file = c("test.c", "test.c", "test.c", "test.c"),
-     artifact = c("A", "defined(A)", "A", "defined(A)"),
-     artifact.type = c("Feature", "FeatureExpression", "Feature", "FeatureExpression"),
-     artifact.diff.size = as.integer(c(1, 1, 1, 1)))
+ commit.data.expected = data.frame(commit.id = sprintf("<commit-%s>", c(32712, 32713)),
+     date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 16:00:45")),
+     author.name = c("Björn", "Olaf"),
+     author.email = c("[email protected]", "[email protected]"),
+     committer.date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-20 10:00:44")),
+     committer.name = c("Björn", "Björn"),
+     committer.email = c("[email protected]", "[email protected]"),
+     hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "5a5ec9675e98187e1e92561e1888aa6f04faa338"),
+     changed.files = as.integer(c(1, 1)),
+     added.lines = as.integer(c(1, 1)),
+     deleted.lines = as.integer(c(1, 0)),
+     diff.size = as.integer(c(2, 1)),
+     file = c("test.c", "test.c"),
+     artifact = c("A", "A"),
+     artifact.type = c("Feature", "Feature"),
+     artifact.diff.size = as.integer(c(1, 1)))

mail.data.expected = data.frame(author.name = c("Thomas"),
author.email = c("[email protected]"),
3 changes: 2 additions & 1 deletion tests/test-read.R
@@ -15,6 +15,7 @@
## Copyright 2017 by Felix Prasse <[email protected]>
## Copyright 2018 by Claus Hunsen <[email protected]>
## Copyright 2018 by Thomas Bock <[email protected]>
+ ## Copyright 2018 by Jakob Kronawitter <[email protected]>
## All Rights Reserved.


@@ -88,7 +89,7 @@ test_that("Read the raw commit data with the file artifact.", {
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file")

## read the actual data
- commit.data.read = read.commits.raw(proj.conf$get.value("datapath"), proj.conf$get.value("artifact"))
+ commit.data.read = read.commits(proj.conf$get.value("datapath"), proj.conf$get.value("artifact"))

## build the expected data.frame
commit.data.expected = data.frame(commit.id = sprintf("<commit-%s>", c(32716, 32717, 32718, 32719, 32715)),
57 changes: 29 additions & 28 deletions tests/test-split.R
@@ -15,6 +15,7 @@
## Copyright 2017 by Felix Prasse <[email protected]>
## Copyright 2018 by Thomas Bock <[email protected]>
## Copyright 2018 by Christian Hechtl <[email protected]>
+ ## Copyright 2018 by Jakob Kronawitter <[email protected]>
## All Rights Reserved.


@@ -93,9 +94,9 @@ test_that("Split a data object time-based (split.basis == 'commits').", {
## check data for all ranges
expected.data = list(
commits = list(
- "2016-07-12 15:58:59-2016-07-12 16:01:59" = data$commits[1:4, ],
- "2016-07-12 16:01:59-2016-07-12 16:04:59" = data.frame(),
- "2016-07-12 16:04:59-2016-07-12 16:06:33" = data$commits[5:9, ]
+ "2016-07-12 15:58:59-2016-07-12 16:01:59" = data$commits[1:2, ],
+ "2016-07-12 16:01:59-2016-07-12 16:04:59" = data$commits[0, ],
+ "2016-07-12 16:04:59-2016-07-12 16:06:33" = data$commits[3:6, ]
),
mails = list(
"2016-07-12 15:58:59-2016-07-12 16:01:59" = data.frame(),
@@ -168,10 +169,10 @@ test_that("Split a data object time-based (split.basis == 'mails').", {
## check data for all ranges
expected.data = list(
commits = list(
- "2004-10-09 18:38:13-2007-10-10 12:38:13" = data.frame(),
- "2007-10-10 12:38:13-2010-10-10 06:38:13" = data.frame(),
- "2010-10-10 06:38:13-2013-10-10 00:38:13" = data.frame(),
- "2013-10-10 00:38:13-2016-07-12 16:05:38" = data$commits[1:4, ]
+ "2004-10-09 18:38:13-2007-10-10 12:38:13" = data$commits[0, ],
+ "2007-10-10 12:38:13-2010-10-10 06:38:13" = data$commits[0, ],
+ "2010-10-10 06:38:13-2013-10-10 00:38:13" = data$commits[0, ],
+ "2013-10-10 00:38:13-2016-07-12 16:05:38" = data$commits[1:2, ]
),
mails = list(
"2004-10-09 18:38:13-2007-10-10 12:38:13" = data$mails[rownames(data$mails) %in% 1:2, ],
@@ -247,9 +248,9 @@ test_that("Split a data object time-based (split.basis == 'issues').", {
## check data for all ranges
expected.data = list(
commits = list(
- "2013-04-21 23:52:09-2015-04-22 11:52:09" = data.frame(),
+ "2013-04-21 23:52:09-2015-04-22 11:52:09" = data$commits[0, ],
  "2015-04-22 11:52:09-2017-04-21 23:52:09" = data$commits,
- "2017-04-21 23:52:09-2017-05-23 12:32:40" = data.frame()
+ "2017-04-21 23:52:09-2017-05-23 12:32:40" = data$commits[0, ]
),
mails = list(
"2013-04-21 23:52:09-2015-04-22 11:52:09" = data.frame(),
@@ -496,9 +497,9 @@ test_that("Split a data object activity-based (activity.type = 'commits').", {
## check data for all ranges
expected.data = list(
commits = list(
- "2016-07-12 15:58:59-2016-07-12 16:05:41" = data$commits[1:4, ],
- "2016-07-12 16:05:41-2016-07-12 16:06:32" = data$commits[5:7, ],
- "2016-07-12 16:06:32-2016-07-12 16:06:33" = data$commits[8:9, ]
+ "2016-07-12 15:58:59-2016-07-12 16:05:41" = data$commits[1:2, ],
+ "2016-07-12 16:05:41-2016-07-12 16:06:32" = data$commits[3:4, ],
+ "2016-07-12 16:06:32-2016-07-12 16:06:33" = data$commits[5:6, ]
),
mails = list(
"2016-07-12 15:58:59-2016-07-12 16:05:41" = data$mails[rownames(data$mails) %in% 16:17, ],
@@ -591,8 +592,8 @@ test_that("Split a data object activity-based (activity.type = 'commits').", {
## check data for all ranges
expected.data = list(
commits = list(
- "2016-07-12 15:58:59-2016-07-12 16:06:10" = data$commits[1:6, ],
- "2016-07-12 16:06:10-2016-07-12 16:06:33" = data$commits[7:9, ]
+ "2016-07-12 15:58:59-2016-07-12 16:06:10" = data$commits[1:3, ],
+ "2016-07-12 16:06:10-2016-07-12 16:06:33" = data$commits[4:6, ]
),
mails = list(
"2016-07-12 15:58:59-2016-07-12 16:06:10" = data$mails[rownames(data$mails) %in% 16:17, ],
@@ -675,12 +676,12 @@ test_that("Split a data object activity-based (activity.type = 'mails').", {
## check data for all ranges
expected.data = list(
commits = list(
- "2004-10-09 18:38:13-2010-07-12 11:05:35" = data.frame(),
- "2010-07-12 11:05:35-2010-07-12 12:05:41" = data.frame(),
- "2010-07-12 12:05:41-2010-07-12 12:05:44" = data.frame(),
- "2010-07-12 12:05:44-2016-07-12 15:58:40" = data.frame(),
- "2016-07-12 15:58:40-2016-07-12 16:05:37" = data$commits[1:4, ],
- "2016-07-12 16:05:37-2016-07-12 16:05:38" = data.frame()
+ "2004-10-09 18:38:13-2010-07-12 11:05:35" = data$commits[0, ],
+ "2010-07-12 11:05:35-2010-07-12 12:05:41" = data$commits[0, ],
+ "2010-07-12 12:05:41-2010-07-12 12:05:44" = data$commits[0, ],
+ "2010-07-12 12:05:44-2016-07-12 15:58:40" = data$commits[0, ],
+ "2016-07-12 15:58:40-2016-07-12 16:05:37" = data$commits[1:2, ],
+ "2016-07-12 16:05:37-2016-07-12 16:05:38" = data$commits[0, ]
),
mails = list(
"2004-10-09 18:38:13-2010-07-12 11:05:35" = data$mails[rownames(data$mails) %in% 1:3, ],
@@ -742,7 +743,7 @@ test_that("Split a data object activity-based (activity.type = 'mails').", {
## check data for all ranges
expected.data = list(
commits = list(
- "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$commits[1:4, ]
+ "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$commits[1:2, ]
),
mails = list(
"2004-10-09 18:38:13-2016-07-12 16:05:38" = data$mails
@@ -785,8 +786,8 @@ test_that("Split a data object activity-based (activity.type = 'mails').", {
## check data for all ranges
expected.data = list(
commits = list(
- "2004-10-09 18:38:13-2010-07-12 12:05:43" = data.frame(),
- "2010-07-12 12:05:43-2016-07-12 16:05:38" = data$commits[1:4, ]
+ "2004-10-09 18:38:13-2010-07-12 12:05:43" = data$commits[0, ],
+ "2010-07-12 12:05:43-2016-07-12 16:05:38" = data$commits[1:2, ]
),
mails = list(
"2004-10-09 18:38:13-2010-07-12 12:05:43" = data$mails[rownames(data$mails) %in% 1:8, ],
@@ -866,10 +867,10 @@ test_that("Split a data object activity-based (activity.type = 'issues').", {
## check data for all ranges
expected.data = list(
commits = list(
- "2013-04-21 23:52:09-2016-07-12 16:05:47" = data$commits[1:6, ],
- "2016-07-12 16:05:47-2016-08-31 18:21:48" = data$commits[7:9, ],
- "2016-08-31 18:21:48-2017-02-20 22:25:41" = data.frame(),
- "2017-02-20 22:25:41-2017-05-23 12:32:40" = data.frame()
+ "2013-04-21 23:52:09-2016-07-12 16:05:47" = data$commits[1:3, ],
+ "2016-07-12 16:05:47-2016-08-31 18:21:48" = data$commits[4:6, ],
+ "2016-08-31 18:21:48-2017-02-20 22:25:41" = data$commits[0, ],
+ "2017-02-20 22:25:41-2017-05-23 12:32:40" = data$commits[0, ]
),
mails = list(
"2013-04-21 23:52:09-2016-07-12 16:05:47" = data$mails[rownames(data$mails) %in% 14:17, ],
@@ -967,7 +968,7 @@ test_that("Split a data object activity-based (activity.type = 'issues').", {
expected.data = list(
commits = list(
  "2013-04-21 23:52:09-2016-07-27 22:25:25" = data$commits,
- "2016-07-27 22:25:25-2017-05-23 12:32:40" = data.frame()
+ "2016-07-27 22:25:25-2017-05-23 12:32:40" = data$commits[0, ]
),
mails = list(
"2013-04-21 23:52:09-2016-07-27 22:25:25" = data$mails[rownames(data$mails) %in% 14:17, ],
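Throughout tests/test-split.R, the expected value for a range without commits changes from `data.frame()` to `data$commits[0, ]`. The difference is that zero-row subsetting keeps the column names and column types, while `data.frame()` has no columns at all. A minimal base-R sketch of this difference, using a made-up three-column data.frame rather than the real commit data:

```r
## Made-up stand-in for the commit data (the real data has more columns).
commits = data.frame(
    hash = c("72c8dd25", "5a5ec967"),
    changed.files = as.integer(c(1, 1)),
    file = c("test.c", "test.c"),
    stringsAsFactors = FALSE
)

empty.untyped = data.frame()   # no rows and no columns
empty.typed = commits[0, ]     # no rows, but all columns with their types

ncol(empty.untyped)                 # 0
colnames(empty.typed)               # "hash" "changed.files" "file"
class(empty.typed$changed.files)    # "integer"

## The two empty objects are not considered equal, which is why the
## expected values in the tests had to change along with the code.
isTRUE(all.equal(empty.untyped, empty.typed))   # FALSE
```

A comparison against a typed zero-row data.frame therefore checks the column layout even for ranges in which no commits fall.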