Change commit filtering and network building regarding the untracked files and base artifact #149

Merged on Jan 15, 2019 (31 commits)
Changes from 1 commit
Commits
64a9486
Remove get.commits.raw function from util-data.R
Nov 29, 2018
894c9a5
Move artifact kind filtering functionality into the get.commits method
Dec 2, 2018
e74e15d
Adjust read.commits to return a valid data.frame instead of an empty one
Dec 4, 2018
11428d9
Restructure get.commits and get.commits.filtered(.empty) methods
Dec 6, 2018
c26e582
Delete set.commits.raw and read.commits.raw methods.
Dec 6, 2018
51617bb
Adjust two testcases to work with the new get.commits method
Dec 6, 2018
67a4fbe
Adapt test cases to new changes and improve empty dataframe creation
Dec 7, 2018
c60c2f6
Change edge generation behaviour for base and untracked files artifact
Dec 8, 2018
fada26d
Adjust copyright headers of modified files
Dec 10, 2018
43f185d
Update changelog
Dec 10, 2018
5ea65b9
Add global constant 'UNTRACKED.FILE' and adjust documentation
Dec 15, 2018
ec8c6dd
Update default behavior of 'Conf' objects
clhunsen Dec 14, 2018
0d7c222
Fix nodes for networks without edges
bockthom Dec 16, 2018
6580427
Improve edge creation concerning untracked files and the base artifact
Dec 16, 2018
dde0dd7
Leave artifact column empty if artifact == file or artifact == function
Dec 17, 2018
d11d0fb
Add 'UNTRACKED.FILE' constant back into the constant 'BASE.ARTIFACTS'
Dec 17, 2018
32a7162
Alter inline comments with wrong information
Dec 17, 2018
466d8eb
Change names of network and project configuration options
Dec 18, 2018
7e27a18
Further improve construction of edgeless networks
clhunsen Dec 17, 2018
dc8873e
Update changelog.
Dec 18, 2018
137d833
Fix setting authors in co-change-based author networks
clhunsen Dec 18, 2018
e709786
Update README
Dec 19, 2018
a5802b0
Update documentation and showcase.R
Dec 20, 2018
67dcf31
Rename variable 'list' to 'author.groups' and adjust documentation
Dec 20, 2018
5f0f529
Add additional utility functions for easier empty dataframe creation
Dec 20, 2018
6043e5c
Change null checking behaviour of two methods
Dec 20, 2018
418d1dc
Update README
Jan 7, 2019
523daef
Move empty dataframe creation utility functions into util-read.R
Jan 7, 2019
f8281c7
Adjust comments for the column names of commonly used dataframes
Jan 9, 2019
01217a8
Update changelog
Jan 9, 2019
ae58902
Adjust copyright headers
Jan 14, 2019
24 changes: 22 additions & 2 deletions util-data.R
@@ -38,8 +38,7 @@ UNTRACKED.FILE = "<untracked.file>"
 ## base artifacts
 BASE.ARTIFACTS = c(
     "Base_Feature",
-    "File_Level",
-    UNTRACKED.FILE
+    "File_Level"
 )

 ## mapping of data source to artifact column
@@ -941,6 +940,27 @@ ProjectData = R6::R6Class("ProjectData",
             mylist = get.key.to.value.from.df(self[[data.source.func]](), group.column, data.column)

             return(mylist)
+        },
+
+        #' Get the list of authors by only looking at the specified data source. The constant
+        #' \code{DATASOURCE.TO.ARTIFACT.FUNCTION} describes the mapping between data source and the method which is
+        #' retrieving the data for each data source.
+        #'
+        #' @param data.source the data source which can be either \code{"commits"}, \code{"mails"} or \code{"issues"}
+        #'
+        #' @return the list of authors extracted from the specified data source
+        get.authors.by.data.source = function(data.source = c("commits", "mails", "issues")) {
+            if (is.null(data.source)) {
+                stop ("Data source can't be null.")
+            }
+
+            data.source = match.arg(data.source)
+            data.source.func = DATASOURCE.TO.ARTIFACT.FUNCTION[[data.source]]
+
+            data = self[[data.source.func]]()[c("author.name", "author.email")]
+            names(data) = c("name", "email")
+
+            return (data)
         }
     )
 )
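The lookup performed by the new `get.authors.by.data.source` method can be tried out in isolation with a minimal base-R sketch. This is not the project's code: `toy.data`, `toy.get.authors`, and `TOY.DATASOURCE.TO.FUNCTION` are hypothetical stand-ins for the `ProjectData` object and the `DATASOURCE.TO.ARTIFACT.FUNCTION` constant, and the sample authors are invented. Only the `match.arg` validation and the column selection and renaming mirror the diff.

```r
## Hypothetical stand-in for the mapping constant referenced in the diff:
## each data source is mapped to the name of a getter.
TOY.DATASOURCE.TO.FUNCTION = list(
    "commits" = "get.commits",
    "mails"   = "get.mails",
    "issues"  = "get.issues"
)

## Hypothetical stand-in for a ProjectData object: a plain list of getters
## returning invented sample data.
toy.data = list(
    get.commits = function() {
        data.frame(
            author.name  = c("Alice", "Bob"),
            author.email = c("alice@example.org", "bob@example.org"),
            artifact     = c("Base_Feature", "A"),
            stringsAsFactors = FALSE
        )
    },
    get.mails = function() {
        data.frame(
            author.name  = "Carol",
            author.email = "carol@example.org",
            stringsAsFactors = FALSE
        )
    },
    get.issues = function() {
        data.frame(
            author.name  = character(0),
            author.email = character(0),
            stringsAsFactors = FALSE
        )
    }
)

toy.get.authors = function(data, data.source = c("commits", "mails", "issues")) {
    ## match.arg rejects values outside the allowed set and falls back to the
    ## first choice ("commits") when the argument is left at its default
    data.source = match.arg(data.source)
    getter = TOY.DATASOURCE.TO.FUNCTION[[data.source]]

    ## keep only the author columns and rename them, as in the diff
    authors = data[[getter]]()[c("author.name", "author.email")]
    names(authors) = c("name", "email")
    return(authors)
}

commit.authors = toy.get.authors(toy.data, "commits")
mail.authors = toy.get.authors(toy.data, "mails")
```

Because `match.arg` returns the first choice when it is handed `NULL`, the explicit `is.null` check in the diff is presumably what turns a `NULL` data source into an error instead of a silent fallback to `"commits"`.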
35 changes: 10 additions & 25 deletions util-networks.R
@@ -156,33 +156,13 @@ NetworkBuilder = R6::R6Class("NetworkBuilder",

             list = private$proj.data$group.authors.by.data.column("commits", "artifact")

-            # split untracked.files subgroup into multiple subgroups which only contain one author each to prohibit edge
-            # edge construction between authors of this subgroup
-            if (!is.null(list[["untracked.files"]])) {
-                for (i in 1:nrow(list[["untracked.files"]])) {
-                    row = list[["untracked.files"]][i, ]
-                    list[[paste0("untracked.files_", i)]] = row
-                }
-                list[["untracked.files"]] = NULL
+            ## if configured in the network conf, remove base artifacts, so that no edges are created in the next step
+            if (!private$network.conf$get.value("base.artifact.edges")) {
+                list = list[!(names(list) %in% BASE.ARTIFACTS)]
             }

-            # split base feature subgroup into multiple subgroups which only contain one author each to prohibit edge
-            # edge construction between authors of this subgroup
-            if (!is.null(list[["Base_Feature"]]) && !private$network.conf$get.value("base.artifact.edges")) {
-                for (i in 1:nrow(list[["Base_Feature"]])) {
-                    row = list[["Base_Feature"]][i, ]
-                    list[[paste0("Base_Feature_", i)]] = row
-                }
-                list[["Base_Feature"]] = NULL
-            }
-
-            if (!is.null(list[["File_Level"]]) && !private$network.conf$get.value("base.artifact.edges")) {
-                for (i in 1:nrow(list[["File_Level"]])) {
-                    row = list[["File_Level"]][i, ]
-                    list[[paste0("File_Level_", i)]] = row
-                }
-                list[["File_Level"]] = NULL
-            }
+            ## remove untracked files, so that no edges are created in the next step
+            list = list[names(list) != UNTRACKED.FILE]

             ## construct edge list based on artifact2author data
             author.net.data = construct.edge.list.from.key.value.list(
@@ -192,6 +172,11 @@ NetworkBuilder = R6::R6Class("NetworkBuilder",
                 respect.temporal.order = private$network.conf$get.value("author.respect.temporal.order")
             )

+            ## Add author vertices back into the graph. Previously the untracked file commiters and - if configured -
+            ## the base artifact commiters have been removed to avoid edge creation among them.
+            authors = proj.data$get.authors.by.data.source(data.source = "commits")
+            author.net.data[["vertices"]] = authors["name"]
+
             ## construct network from obtained data
             author.net = construct.network.from.edge.list(
                 author.net.data[["vertices"]],
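The combined effect of the two hunks in util-networks.R (drop groups that must not produce edges, then restore every author as a vertex) can be sketched in base R. This is an illustration under stated assumptions, not the project's implementation: `author.groups`, `toy.edge.list`, and the `base.artifact.edges` variable are hypothetical, the groups are simplified to character vectors of author names, and the pairwise edge rule merely stands in for `construct.edge.list.from.key.value.list`, whose behavior is not shown in this diff. The two constants are copied from the util-data.R diff.

```r
## Constants as they appear in the util-data.R diff
UNTRACKED.FILE = "<untracked.file>"
BASE.ARTIFACTS = c("Base_Feature", "File_Level")

## Hypothetical artifact -> authors grouping (invented sample data)
author.groups = list(
    "Base_Feature"     = c("Alice", "Bob"),
    "<untracked.file>" = c("Bob", "Carol"),
    "A"                = c("Alice", "Dave")
)

## Hypothetical helper: connect every pair of authors sharing an artifact
toy.edge.list = function(groups) {
    edges = lapply(groups, function(authors) {
        authors = unique(authors)
        if (length(authors) < 2) {
            return(NULL)
        }
        pairs = t(combn(authors, 2))
        data.frame(from = pairs[, 1], to = pairs[, 2], stringsAsFactors = FALSE)
    })
    edges = do.call(rbind, edges)
    ## fall back to an empty edge list if no group yields an edge
    if (is.null(edges)) {
        edges = data.frame(from = character(0), to = character(0), stringsAsFactors = FALSE)
    }
    rownames(edges) = NULL
    return(edges)
}

## hypothetical value of the network configuration option
base.artifact.edges = FALSE

## same filtering steps as in the diff, applied to the toy grouping
groups = author.groups
if (!base.artifact.edges) {
    groups = groups[!(names(groups) %in% BASE.ARTIFACTS)]
}
groups = groups[names(groups) != UNTRACKED.FILE]

edges = toy.edge.list(groups)

## every author becomes a vertex again, including those whose groups were
## removed above and who therefore have no edges
vertices = data.frame(
    name = sort(unique(unlist(author.groups, use.names = FALSE))),
    stringsAsFactors = FALSE
)
```

In this toy data only the artifact `A` contributes an edge, while Bob and Carol still appear in `vertices` as isolated nodes, which is the situation the restored vertex list in the diff is meant to handle.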