Fix setting authors in co-change-based author networks
This patch consists of three related fixes and adaptations:

First, the method 'ProjectData$get.authors.by.data.source' does not
correct the column names of the returned data.frame anymore. This
establishes compatibility with the method 'ProjectData$get.authors'.
Additionally, the returned data.frame only contains unique entries. The
documentation is tidied.

Second, the method 'NetworkBuilder$get.author.network.cochange' is fixed
by adding the missing 'private$' prefix when accessing the project data.

Third, the assignment of author vertices is corrected to use only author
names with the correct vertex attribute (i.e., "name"). This adapts the
code with respect to the first change mentioned above.

This change fixes all failing tests in PR se-sic#149.

Signed-off-by: Claus Hunsen <[email protected]>
clhunsen authored and fehnkera committed Sep 23, 2020
1 parent 1d18f03 commit d964b0a
Showing 2 changed files with 20 additions and 10 deletions.
18 changes: 11 additions & 7 deletions util-data.R
@@ -948,23 +948,27 @@ ProjectData = R6::R6Class("ProjectData",
 return(mylist)
 },

-#' Get the list of authors by only looking at the specified data source. The constant
-#' \code{DATASOURCE.TO.ARTIFACT.FUNCTION} describes the mapping between data source and the method which is
-#' retrieving the data for each data source.
+#' Get the list of authors by only looking only at the specified data source.
 #'
-#' @param data.source the data source which can be either \code{"commits"}, \code{"mails"} or \code{"issues"}
+#' *Note*: The constant \code{DATASOURCE.TO.ARTIFACT.FUNCTION} denotes the mapping between
+#' data source and the method which is retrieving the data for each data source.
 #'
-#' @return the list of authors extracted from the specified data source
+#' @param data.source the data source which can be either \code{"commits"}, \code{"mails"},
+#' or \code{"issues"}
+#'
+#' @return a data.frame of unique author names (columns \code{name} and \code{author.email}),
+#' extracted from the specified data source
 get.authors.by.data.source = function(data.source = c("commits", "mails", "issues")) {
 if (is.null(data.source)) {
 stop ("Data source can't be null.")
 }

 data.source = match.arg(data.source)
 data.source.func = DATASOURCE.TO.ARTIFACT.FUNCTION[[data.source]]

 data = self[[data.source.func]]()[c("author.name", "author.email")]
-names(data) = c("name", "email")
-
+## remove duplicates
+data = unique(data)
+
 return (data)
 }
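
The effect of this hunk can be illustrated in isolation with a minimal base-R sketch. The `commits` data.frame and all of its values are invented for illustration; only the column names `author.name` and `author.email` and the call to `unique` are taken from the diff. The real method obtains its data through `DATASOURCE.TO.ARTIFACT.FUNCTION`, which is not reproduced here.

```r
## Hypothetical commit data: the column names follow the diff, the values are invented.
commits = data.frame(
    author.name = c("Alice", "Bob", "Alice"),
    author.email = c("alice@example.org", "bob@example.org", "alice@example.org"),
    hash = c("a1", "b2", "c3"),
    stringsAsFactors = FALSE
)

## select the two author columns; the column names are left untouched
authors = commits[c("author.name", "author.email")]

## remove duplicates (a row is dropped only if both columns match an earlier row)
authors = unique(authors)

print(colnames(authors))
print(nrow(authors))
```

Because the columns are no longer renamed, any caller that expects a column called `name` has to select `author.name` and rename it itself.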
12 changes: 9 additions & 3 deletions util-networks.R
@@ -175,9 +175,15 @@ NetworkBuilder = R6::R6Class("NetworkBuilder",
 ## Add author vertices back into the graph. Previously, commit information on untracked files
 ## ('UNTRACKED.FILE') and, if configured, the base artifact ('BASE.ARTIFACTS') has been removed and, hence,
 ## also corresponding author information. Re-add author vertices back to the network now by accessing the
-## complete author list.
-authors = proj.data$get.authors.by.data.source(data.source = "commits")
-author.net.data[["vertices"]] = authors["name"]
+## complete author list:
+## 1) get all authors on commits
+authors = private$proj.data$get.authors.by.data.source(data.source = "commits")
+## 2) only select author names
+authors = authors["author.name"]
+## 3) rename single column to "name" to correct mapping to vertex attribute "name"
+colnames(authors) = "name"
+## 4) set author list as vertices
+author.net.data[["vertices"]] = authors

 ## construct network from obtained data
 author.net = construct.network.from.edge.list(
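
The column handling in this hunk can be tried out with plain data.frames. The sketch below is an illustration under assumptions, not the project's code: the `authors` data.frame is a hypothetical stand-in for the result of `get.authors.by.data.source`, and no network is constructed. It shows why selecting `"name"` stops working once the columns are no longer renamed, and how the select-and-rename steps produce a one-column data.frame named `name`.

```r
## Hypothetical stand-in for the data.frame returned by get.authors.by.data.source
authors = data.frame(
    author.name = c("Alice", "Bob"),
    author.email = c("alice@example.org", "bob@example.org"),
    stringsAsFactors = FALSE
)

## the previous selection fails, because there is no column "name" anymore
old.selection = tryCatch(authors["name"], error = function(e) NULL)

## only select author names; single brackets keep the result a data.frame
vertices = authors["author.name"]

## rename the single column to "name" to match the vertex attribute "name"
colnames(vertices) = "name"

print(is.null(old.selection))
print(vertices)
```

Selecting with single brackets rather than `$` or `[[` keeps the data.frame structure, so the result can be assigned as the vertex list directly.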