Change commit filtering and network building regarding the untracked files and base artifact #149

Merged on Jan 15, 2019 (31 commits)
Commits
64a9486
Remove get.commits.raw function from util-data.R
Nov 29, 2018
894c9a5
Move artifact kind filtering functionality into the get.commits method
Dec 2, 2018
e74e15d
Adjust read.commits to return a valid data.frame instead of an empty one
Dec 4, 2018
11428d9
Restructure get.commits and get.commits.filtered(.empty) methods
Dec 6, 2018
c26e582
Delete set.commits.raw and read.commits.raw methods.
Dec 6, 2018
51617bb
Adjust two testcases to work with the new get.commits method
Dec 6, 2018
67a4fbe
Adapt test cases to new changes and improve empty dataframe creation
Dec 7, 2018
c60c2f6
Change edge generation behaviour for base and untracked files artifact
Dec 8, 2018
fada26d
Adjust copyright headers of modified files
Dec 10, 2018
43f185d
Update changelog
Dec 10, 2018
5ea65b9
Add global constant 'UNTRACKED.FILE' and adjust documentation
Dec 15, 2018
ec8c6dd
Update default behavior of 'Conf' objects
clhunsen Dec 14, 2018
0d7c222
Fix nodes for networks without edges
bockthom Dec 16, 2018
6580427
Improve edge creation concerning untracked files and the base artifact
Dec 16, 2018
dde0dd7
Leave artifact column empty if artifact == file or artifact == funtion
Dec 17, 2018
d11d0fb
Add 'UNTRACKED.FILE constant' back into the constant 'BASE.ARTIFACTS'
Dec 17, 2018
32a7162
Alter inline comments with wrong information
Dec 17, 2018
466d8eb
Change names of network and project configuration options
Dec 18, 2018
7e27a18
Further improve construction of edgeless networks
clhunsen Dec 17, 2018
dc8873e
Update changelog.
Dec 18, 2018
137d833
Fix setting authors in co-change-based author networks
clhunsen Dec 18, 2018
e709786
Update README
Dec 19, 2018
a5802b0
Update documentation and showcase.R
Dec 20, 2018
67dcf31
Rename variable 'list' to 'author.groups' and adjust documentation
Dec 20, 2018
5f0f529
Add additional utility functions for easier empty dataframe creation
Dec 20, 2018
6043e5c
Change null checking behaviour of two methods
Dec 20, 2018
418d1dc
Update README
Jan 7, 2019
523daef
Move empty dataframe creation utility functions into util-read.R
Jan 7, 2019
f8281c7
Adjust comments for the column names of commonly used dataframes
Jan 9, 2019
01217a8
Update changelog
Jan 9, 2019
ae58902
Adjust copyright headers
Jan 14, 2019
2 changes: 0 additions & 2 deletions showcase.R
@@ -85,7 +85,6 @@ x = NetworkBuilder$new(project.data = x.data, network.conf = net.conf)
# x.data$get.synchronicity()
# x.data$group.artifacts.by.data.column("commits", "author.name")
# x.data$get.commits.filtered()
# x.data$get.commits.filtered.empty()
# x.data$get.mails()
# x.data$get.authors()
# x.data$get.data.path()
@@ -126,7 +125,6 @@ y = NetworkBuilder$new(project.data = y.data, network.conf = net.conf)
# y.data$get.synchronicity()
# y.data$group.artifacts.by.data.column("commits", "author.name")
# y.data$get.commits.filtered()
# y.data$get.commits.filtered.empty()
# y.data$get.mails()
# y.data$get.authors()
# y.data$get.data.path()
6 changes: 6 additions & 0 deletions util-conf.R
@@ -338,6 +338,12 @@ ProjectConf = R6::R6Class("ProjectConf", inherit = Conf,
allowed = c(TRUE, FALSE),
allowed.number = 1
),
filter.untracked.files = list(
default = TRUE,
type = "logical",
allowed = c(TRUE, FALSE),
allowed.number = 1
),
synchronicity = list(
default = FALSE,
type = "logical",
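
The new `filter.untracked.files` entry follows the same shape as the other options: a default, a type, the allowed values, and how many values may be given at once. A minimal sketch of how such an entry could be checked, assuming a hypothetical helper `check.option.value` that is not part of this project:

```r
## Option table shaped like the entry added in the diff above
options = list(
    filter.untracked.files = list(
        default = TRUE,
        type = "logical",
        allowed = c(TRUE, FALSE),
        allowed.number = 1
    )
)

## Hypothetical validator (an assumption, not the project's own code):
## a value is accepted if its class, its length, and its content match the entry
check.option.value = function(options, name, value) {
    spec = options[[name]]
    if (is.null(spec)) {
        return(FALSE)
    }
    type.ok = inherits(value, spec[["type"]])
    length.ok = length(value) <= spec[["allowed.number"]]
    allowed.ok = all(value %in% spec[["allowed"]])
    return(type.ok && length.ok && allowed.ok)
}

accepted = check.option.value(options, "filter.untracked.files", FALSE)
rejected = check.option.value(options, "filter.untracked.files", "yes")
```

How the real `Conf` class validates values is not shown in this diff, so the check above only illustrates what the four fields appear to describe.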
199 changes: 68 additions & 131 deletions util-data.R
@@ -40,7 +40,7 @@ BASE.ARTIFACTS = c(
## mapping of data source to artifact column
## (for commits: filter also empty, non-configured, and (potentially) base artifacts)
DATASOURCE.TO.ARTIFACT.FUNCTION = list(
"commits" = "get.commits.filtered.empty",
"commits" = "get.commits.filtered",
"mails" = "get.mails",
"issues" = "get.issues"
)
@@ -70,7 +70,6 @@ ProjectData = R6::R6Class("ProjectData",

## commits and commit data
commits.filtered = NULL, # data.frame
commits.filtered.empty = NULL, #data.frame
commits = NULL, # data.frame
synchronicity = NULL, # data.frame
pasta = NULL, # data.frame
@@ -85,88 +84,31 @@ ProjectData = R6::R6Class("ProjectData",

## * * filtering commits -------------------------------------------

#' Filter commits with empty artifacts from the already filtered commit list and
#' save the new list to 'commits.filtered.empty'.
#' Filter commits retrieved by the \code{get.commits} method by removing untracked files and removing the base
#' artifact (see parameters).
#'
#' @seealso \code{get.commits.filtered}
filter.commits.empty = function() {

logging::logdebug("filter.commits.empty: starting.")

## do not compute anything more than once
if (!is.null(private$commits.filtered.empty)) {
logging::logdebug("filter.commits.empty: finished. (already existing)")
return(private$commits.filtered.empty)
}

## get raw commit data
commit.data = self$get.commits.filtered()

## break if the list of commits is empty
if (nrow(commit.data) == 0) {
logging::logwarn("There are no commits available for the current environment.")
logging::logwarn("Class: %s", self$get.class.name())
# logging::logwarn("Configuration: %s", private$project.conf$get.conf.as.string())
private$commits.filtered.empty = data.frame()
return(private$commits.filtered.empty)
}

## only process commits with non-empty artifact
commit.data = subset(commit.data, artifact != "")

## store the commit data
private$commits.filtered.empty = commit.data
logging::logdebug("filter.commits.empty: finished.")
},

#' Filter the data from the commit list which does not belong to the artifact listed in the field
#' \code{project.conf}.
#' If configured in \code{project.conf}, filter the commits from the commit list that touch the base artifact.
#' Add synchronicity and PaStA data if configured in \code{project.conf}.
#' Finally, save the new list to the field \code{commits.filtered}.
filter.commits = function() {

#' @param remove.untracked.files configures if untracked files should be kept or removed
#' @param remove.base.artifact configures if the base artifact should be kept or removed
#'
#' @return the commits retrieved by the \code{get.commits} method after all filters have been applied
filter.commits = function(remove.untracked.files, remove.base.artifact) {
logging::logdebug("filter.commits: starting.")

## do not compute anything more than once
if (!is.null(private$commits.filtered)) {
logging::logdebug("filter.commits: finished. (already existing)")
return(private$commits.filtered)
}

## get raw commit data
## get commit data
commit.data = self$get.commits()

## break if the list of commits is empty
if (nrow(commit.data) == 0) {
logging::logwarn("There are no commits available for the current environment.")
logging::logwarn("Class: %s", self$get.class.name())
# logging::logwarn("Configuration: %s", private$project.conf$get.conf.as.string())
private$commits.filtered = data.frame()
return(private$commits.filtered)
## filter out the untracked files
if (remove.untracked.files) {
commit.data = subset(commit.data, artifact != "")
}

## filter out the base artifacts (i.e., Base_Feature, File_Level)
if (private$project.conf$get.value("artifact.filter.base")) {
if (remove.base.artifact) {
commit.data = subset(commit.data, !(artifact %in% BASE.ARTIFACTS))
}

## append synchronicity data if wanted
if (private$project.conf$get.value("synchronicity")) {
synchronicity.data = self$get.synchronicity()
commit.data = merge(commit.data, synchronicity.data,
by = "hash", all.x = TRUE, sort = FALSE)
}

## add PaStA data if wanted
if (private$project.conf$get.value("pasta")) {
self$get.pasta()
commit.data = private$add.pasta.data(commit.data)
}

## store the commit data
private$commits.filtered = commit.data
logging::logdebug("filter.commits: finished.")
return(commit.data)
},

## * * PaStA data --------------------------------------------------
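
The reworked `filter.commits` applies two independent filters, each controlled by its own parameter. A small sketch of that logic on invented data, assuming `BASE.ARTIFACTS` holds the two names mentioned in the inline comment (the real constant may contain more) and using a hypothetical standalone function instead of the private method:

```r
## Toy commit data: column names follow the diff, values are invented
commit.data = data.frame(
    hash = c("a1", "b2", "c3", "d4"),
    artifact = c("Feature_A", "", "Base_Feature", "Feature_B"),
    stringsAsFactors = FALSE
)

## Assumed contents, taken from the inline comment in the diff
BASE.ARTIFACTS = c("Base_Feature", "File_Level")

## Hypothetical standalone version of the two filter steps
filter.commits.sketch = function(commit.data, remove.untracked.files, remove.base.artifact) {
    ## untracked files carry an empty artifact name
    if (remove.untracked.files) {
        commit.data = subset(commit.data, artifact != "")
    }
    ## base artifacts are matched against the constant
    if (remove.base.artifact) {
        commit.data = subset(commit.data, !(artifact %in% BASE.ARTIFACTS))
    }
    return(commit.data)
}

both.filters = filter.commits.sketch(commit.data, TRUE, TRUE)
only.untracked = filter.commits.sketch(commit.data, TRUE, FALSE)
```

With both flags set, only the commits on `Feature_A` and `Feature_B` remain; with only the first flag set, the `Base_Feature` commit is kept as well.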
@@ -283,7 +225,6 @@ ProjectData = R6::R6Class("ProjectData",
#' changed.
reset.environment = function() {
private$commits.filtered = NULL
private$commits.filtered.empty = NULL
private$commits = NULL
private$synchronicity = NULL
private$mails = NULL
@@ -380,94 +321,91 @@ ProjectData = R6::R6Class("ProjectData",

## * * raw data ----------------------------------------------------

#' Get the list of commits without empty artifacts and filtered by the artifact kind configured
#' in the field \code{project.conf}.
#' If configured in \code{project.conf}, get the list of commits without the base artifact.
#' In addition, if configured in \code{project.conf}, append the synchronicity data and PaStA data
#' to the filtered commit data.
#' If the list of filtered commits does not already exist, call the filter method.
#' Return the commits retrieved by the \code{get.commits} method by removing untracked files and removing the
#' base artifact (if configured in the \code{project.conf}, see parameters \code{filter.untracked.files} and
#' \code{artifact.filter.base}). This method uses caching.
#'
#' @return the commit list without empty artifacts and containing only commit data related to the
#' configured artifact and, if configured, without the base artifact
get.commits.filtered.empty = function() {
logging::loginfo("Getting commit data filtered by artifact.base and artifact.empty.")

## if commits are not read already, do this
if (is.null(private$commits.filtered.empty)) {
private$filter.commits.empty()
}

return(private$commits.filtered.empty)
},

#' Get the list of commits returned by the get.commits method and apply additional filters on them.
#' If configured in \code{project.conf}, get the list of commits without the base artifact.
#' In addition, if configured in \code{project.conf}, append the synchronicity data and PaStA data
#' to the filtered commit data.
#' If the list of filtered commits does not already exist, call the filter method.
#' @param remove.untracked.files configures if untracked files should be kept or removed
#' @param remove.base.artifact configures if the base artifact should be kept or removed
#'
#' @return the commits retrieved by the \code{get.commits} method after all filters have been applied
#'
#' @return the commit list returned by get.commits with configured filters applied and optionally added PaSta or
#' synchronicity data
#' @seealso get.commits.filtered.uncached
get.commits.filtered = function() {
logging::loginfo("Getting commit data filtered by artifact.base.")

## if commits are not read already, do this
if (is.null(private$commits.filtered)) {
private$filter.commits()
private$commits.filtered = private$filter.commits(
private$project.conf$get.value("filter.untracked.files"),
private$project.conf$get.value("artifact.filter.base")
)
}

return(private$commits.filtered)
},

#' Get the complete list of commits filtered by the artifact kind which was configured in the
#' \code{project.conf}.
#' If configured in the field \code{project.conf}, append the PaStA data to the commit data
#' by calling the setter function.
#' If the list of commits does not already exist, call the read method first.
#' Return the commits retrieved by the \code{get.commits} method by removing untracked files and removing the
#' base artifact (see parameters). This method doesn't use caching. If you want to use caching, please use the
#' \code{get.commits.filtered} method instead.
#'
#' @param remove.untracked.files configures if untracked files should be kept or removed
#' @param remove.base.artifact configures if the base artifact should be kept or removed
#'
#' @return the commits retrieved by the \code{get.commits} method after all filters have been applied
#'
#' @seealso get.commits.filtered
get.commits.filtered.uncached = function(remove.untracked.files, remove.base.artifact) {
return (private$filter.commits(remove.untracked.files, remove.base.artifact))
},

#' Get the list of commits which have the artifact kind configured in the \code{project.conf}.
#' If the list of commits is not cached, call the read method first.
#' If configured in the field \code{project.conf}, add PaStA and synchronicity data.
#'
#' @return the list of commits
get.commits = function() {
logging::loginfo("Getting commit data.")

## if commits are not read already, do this
if (is.null(private$commits)) {
commit.data = read.commits(
self$get.data.path(),
private$project.conf$get.value("artifact")
)
commit.data = read.commits(self$get.data.path(), private$project.conf$get.value("artifact"))

## only process commits with the artifact listed in the configuration or missing
commit.data = subset(commit.data, artifact.type %in%
c(private$project.conf$get.value("artifact.codeface"), ""))
c(private$project.conf$get.value("artifact.codeface"), ""))

self$set.commits(data = commit.data)
## saves the commit.data to the commits cache field after PaStA and synchronicity data is added
self$set.commits(commit.data)
}
private$extract.timestamps(source = "commits")

return(private$commits)
},

#' Set the commit list of the project to a new one.
#' Add PaStA data if configured in the field \code{project.conf}.
#' Add PaStA and synchronicity data if configured in the field \code{project.conf}.
#'
#' @param data the new list of commits
set.commits = function(data) {
logging::loginfo("Setting raw commit data.")
if (is.null(data)) {
data = data.frame()
}
## add PaStA data if wanted
if (private$project.conf$get.value("pasta")) {
logging::loginfo("Adding PaStA data.")
data = private$add.pasta.data(data = data)
#' @param commit.data the new list of commits
set.commits = function(commit.data) {
logging::loginfo("Setting commit data.")

if (!is.null(commit.data)) {

## append synchronicity data if wanted
if (private$project.conf$get.value("synchronicity")) {
synchronicity.data = self$get.synchronicity()
commit.data = merge(commit.data, synchronicity.data,
by = "hash", all.x = TRUE, sort = FALSE)
}

## add PaStA data if wanted
if (private$project.conf$get.value("pasta")) {
self$get.pasta()
commit.data = private$add.pasta.data(commit.data)
}
}

private$commits = data
private$commits = commit.data

## remove cached data for filtered commits as these need to be re-computed
## after changing the data
## remove cached data for filtered commits as these need to be re-computed after changing the data
private$commits.filtered = NULL
private$commits.filtered.empty = NULL
},

#' Set the commit list of the project to a new one.
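
The hunk above pairs a cached getter (`get.commits.filtered`) with an uncached one (`get.commits.filtered.uncached`) and resets the cache in `set.commits`. A rough sketch of that pattern, assuming a closure-based stand-in (`make.commit.store`, hypothetical) in place of the R6 class so that it runs without extra packages:

```r
## Hypothetical stand-in for ProjectData, built from a closure instead of R6
make.commit.store = function(commits) {
    cache = NULL        # plays the role of private$commits.filtered
    filter.calls = 0    # counts how often the filter actually runs

    filter.commits = function(remove.untracked.files) {
        filter.calls <<- filter.calls + 1
        if (remove.untracked.files) {
            return(commits[commits[["artifact"]] != "", , drop = FALSE])
        }
        return(commits)
    }

    list(
        ## cached: the filter runs only while the cache is empty
        get.commits.filtered = function() {
            if (is.null(cache)) {
                cache <<- filter.commits(TRUE)
            }
            return(cache)
        },
        ## uncached: the filter runs on every call with the given parameter
        get.commits.filtered.uncached = function(remove.untracked.files) {
            return(filter.commits(remove.untracked.files))
        },
        ## new data invalidates the cached result
        set.commits = function(new.commits) {
            commits <<- new.commits
            cache <<- NULL
        },
        get.filter.calls = function() filter.calls
    )
}

store = make.commit.store(data.frame(
    hash = c("a1", "b2"),
    artifact = c("Feature_A", ""),
    stringsAsFactors = FALSE
))
first = store$get.commits.filtered()
second = store$get.commits.filtered()   # served from the cache
```

After the two calls the filter has run once. Only the uncached getter accepts parameters, since a single cache slot can hold just one parameter combination, which is presumably why the cached getter takes its settings from the configuration.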
@@ -998,7 +936,6 @@ ProjectData = R6::R6Class("ProjectData",

## check given data source
data.source = match.arg.or.default(data.source, several.ok = FALSE)
## TODO use filtered commit data here (and not the filtered.empty version)? → try filtered!
data.source.func = DATASOURCE.TO.ARTIFACT.FUNCTION[[data.source]]

## get the key-value mapping/list for the given parameters
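
The lookup in this hunk resolves a data source to the name of a getter through `DATASOURCE.TO.ARTIFACT.FUNCTION`. A short sketch of that name-based dispatch, assuming a plain list of functions (`project.data`, hypothetical) in place of the real `ProjectData` object and a hypothetical wrapper `get.data.for.source`:

```r
## Mapping as it reads after this change
DATASOURCE.TO.ARTIFACT.FUNCTION = list(
    "commits" = "get.commits.filtered",
    "mails" = "get.mails",
    "issues" = "get.issues"
)

## Hypothetical data object: a list of getters standing in for ProjectData
project.data = list(
    get.commits.filtered = function() "filtered commits",
    get.mails = function() "mails",
    get.issues = function() "issues"
)

## Hypothetical wrapper: look up the getter name, then call the getter
get.data.for.source = function(project.data, data.source) {
    data.source.func = DATASOURCE.TO.ARTIFACT.FUNCTION[[data.source]]
    if (is.null(data.source.func)) {
        stop("Unknown data source: ", data.source)
    }
    return(project.data[[data.source.func]]())
}

commit.result = get.data.for.source(project.data, "commits")
```

Because the mapping holds names rather than functions, pointing `"commits"` at `get.commits.filtered` is enough to switch every caller to the new getter.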
4 changes: 2 additions & 2 deletions util-networks-covariates.R
@@ -672,7 +672,7 @@ add.vertex.attribute.artifact.change.count = function(list.of.networks, project.
nets.with.attr = split.and.add.vertex.attribute(
list.of.networks, project.data, name, aggregation.level, default.value,
function(range, range.data, net) {
artifact.to.commit = get.key.to.value.from.df(range.data$get.commits.filtered.empty(), "artifact", "hash")
artifact.to.commit = get.key.to.value.from.df(range.data$get.commits.filtered(), "artifact", "hash")
artifact.change.count = lapply(artifact.to.commit, function(x) {
length(unique(x[["hash"]]))
})
@@ -709,7 +709,7 @@ add.vertex.attribute.artifact.first.occurrence = function(list.of.networks, proj
nets.with.attr = split.and.add.vertex.attribute(
list.of.networks, project.data, name, aggregation.level, default.value,
function(range, range.data, net) {
artifact.to.dates = get.key.to.value.from.df(range.data$get.commits.filtered.empty(), "artifact", "date")
artifact.to.dates = get.key.to.value.from.df(range.data$get.commits.filtered(), "artifact", "date")
artifact.to.first = lapply(artifact.to.dates, function(a) {
min(a[["date"]])
})
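
Both covariate functions group the filtered commits by artifact and then reduce each group: one counts unique hashes, the other takes the earliest date. A sketch of those two reductions on invented data, assuming base R's `split` as a stand-in for the project helper `get.key.to.value.from.df`, whose exact return shape is not shown here:

```r
## Toy commit data: commit a1 touches Feature_A twice and Feature_B once
commits = data.frame(
    hash = c("a1", "a1", "b2", "a1"),
    artifact = c("Feature_A", "Feature_A", "Feature_A", "Feature_B"),
    date = as.POSIXct(c("2018-12-01", "2018-12-01", "2018-11-20", "2018-12-01"), tz = "UTC"),
    stringsAsFactors = FALSE
)

## split() is an assumption standing in for get.key.to.value.from.df
hashes.by.artifact = split(commits[["hash"]], commits[["artifact"]])
artifact.change.count = lapply(hashes.by.artifact, function(hashes) {
    length(unique(hashes))
})

dates.by.artifact = split(commits[["date"]], commits[["artifact"]])
artifact.first.occurrence = lapply(dates.by.artifact, min)
```

Counting unique hashes matters because one commit can appear in several rows for the same artifact: `Feature_A` has three rows here but only two distinct commits.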
4 changes: 4 additions & 0 deletions util-read.R
@@ -125,6 +125,10 @@ read.commits = function(data.path, artifact) {
commit.data["artifact"] = artifacts.new
}

## Commits to files that are not tracked by Codeface have the empty string in the file column
## To better indicate this, the column value is changed to 'untracked.file'
commit.data["file"] = ifelse(commit.data[["file"]] == "", "untracked.file", commit.data[["file"]])

## convert dates and sort by them
commit.data[["date"]] = get.date.from.string(commit.data[["date"]])
commit.data[["committer.date"]] = get.date.from.string(commit.data[["committer.date"]])
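
The replacement added to `read.commits` marks untracked files with an explicit label instead of an empty string. A minimal sketch of its effect on invented data, assuming the `file` column is a character vector (the column type in the real data is not visible in this diff):

```r
## Toy commit data: the second commit touches a file that is not tracked
commit.data = data.frame(
    hash = c("a1", "b2", "c3"),
    file = c("src/main.c", "", "README.md"),
    stringsAsFactors = FALSE
)

## Same replacement as in the diff: empty file names become 'untracked.file'
commit.data["file"] = ifelse(commit.data[["file"]] == "", "untracked.file", commit.data[["file"]])
```

Only the empty entry changes; the other file names are passed through untouched.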