Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change default representation of edge attributes from vectors to lists & other miscellaneous adjustments #274

Open
wants to merge 10 commits into
base: dev
Choose a base branch
from

Conversation

maxloeffler
Copy link
Contributor

@maxloeffler maxloeffler commented Dec 5, 2024

Prerequisites

  • I adhere to the coding conventions (described here) in my code.
  • I have updated the copyright headers of the files I have modified.
  • I have written appropriate commit messages, i.e., I have recorded the goal, the need, the needed changes, and the location of my code modifications for each commit. This includes also, e.g., referencing to relevant issues.
  • I have put signed-off tags in all commits.
  • I have updated the changelog file NEWS.md appropriately.
  • I have checked whether I need to adjust the showcase file showcase.R with respect to my changes.
  • The pull request is opened against the branch dev.

Description

  • Replace deprecated igraph functions by their more recent counterparts.
  • Handle reading empty commit interaction data explicitly, to avoid crashing in that case.
  • To make splitting networks by ranges more consistent with splitting time-based, also add range information to the return value.
  • Convert certain edge attributes of networks to list type. This change is motivated by a more stricter type enforcement in igraph version 2.0 and up. When merging networks through igraph::disjoint_union, edge attributes that exist in both networks are now assumed to have identical types. But, when we simplify a network, its edge attributes convert to list type as they now need to store data that a vector in R cannot store anymore, i.e., one edge attribute now has to hold several lists of data of all the edges that got merged into one. Therefore, we need to convert edge attributes to list by default to be able to merge simplified and unsimplified networks.

Changelog

Added

  • Introduce convert.edge.attributes.to.list function.

Fixed

  • Fix reading empty commit interaction data.

Changed / Improved

  • Replace deprecated igraph functions.
  • Add range information to the return value in split.network.time.based.by.ranges, to be consistent with split.data.time.based.
  • Convert edge attributes to a list format to ensure compatibility with igraph.

Copy link

codecov bot commented Dec 5, 2024

Codecov Report

Attention: Patch coverage is 92.85714% with 4 lines in your changes missing coverage. Please review.

Project coverage is 81.07%. Comparing base (1d1fe7f) to head (260af89).
Report is 1 commits behind head on dev.

Files with missing lines Patch % Lines
util-networks-metrics.R 0.00% 2 Missing ⚠️
util-networks.R 97.56% 1 Missing ⚠️
util-read.R 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #274      +/-   ##
==========================================
+ Coverage   80.95%   81.07%   +0.12%     
==========================================
  Files          16       16              
  Lines        5051     5089      +38     
==========================================
+ Hits         4089     4126      +37     
- Misses        962      963       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

util-split.R Outdated Show resolved Hide resolved
maxloeffler and others added 3 commits December 7, 2024 17:06
Signed-off-by: Maximilian Löffler <[email protected]>
The interaction reading breaks if an interaction is empty in the data.
This commit adds a check for empty interactions.

Signed-off-by: Christian Hechtl <[email protected]>
Since version 2.0.0 of package `igraph`, the function `igraph::power_law_fit`
(and its deprecated alias `igraph::fit.power.law`) does not automatically
compute the p-value any more, which is needed by `metrics.scale.freeness`.
In version 2.1.1 of package `igraph`, they have added a new parameter
`p.value` to provide the possibility to enable p-value computation
again. In addition, they have added a parameter `p.precision` which
defaults to a precision of 0.01.

To make `metrics.scale.freeness` work again with the most recent version
of package `igraph`, the deprecated function calls to
`igraph::fit.power.law` are replaced by `igraph::power_law_fit` and are
adjusted to use the new parameter `p.value` to automatically compute the
p-value again. However, we do not make use of `p.precision` and rely on
the default precision instead.

Signed-off-by: Thomas Bock <[email protected]>
util-networks.R Outdated
Comment on lines 1289 to 1300
## Note: The following temporary fix only considers the 'date' attribute. However, this problem could also
## affect several other attributes, whose classes are not adjusted in our temporary fix.
## The following code block should be redundant as soon as igraph has fixed their bug.
u.actual.edge.attribute.date = igraph::edge_attr(u, "date")
if (!is.null(u.actual.edge.attribute.date)) {
if (is.list(u.actual.edge.attribute.date)) {
u.expected.edge.attribute.date = lapply(u.actual.edge.attribute.date, get.date.from.unix.timestamp)
} else {
u.expected.edge.attribute.date = get.date.from.unix.timestamp(u.actual.edge.attribute.date)
}
u = igraph::set_edge_attr(u, "date", value = u.expected.edge.attribute.date)
}
# u.actual.edge.attribute.date = do.call(base::c, igraph::edge_attr(u, "date"))
# if (!is.null(u.actual.edge.attribute.date)) {
# if (is.list(u.actual.edge.attribute.date)) {
# u.expected.edge.attribute.date = lapply(u.actual.edge.attribute.date, get.date.from.unix.timestamp)
# } else {
# u.expected.edge.attribute.date = get.date.from.unix.timestamp(u.actual.edge.attribute.date)
# }
# u = igraph::set_edge_attr(u, "date", value = as.list(u.actual.edge.attribute.date))
# }
Copy link
Collaborator

@bockthom bockthom Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions regarding this change:
(1) Why did you do changes to the commented code? (do.call(base::c as well as as.list(u.actual.edge.attribute.date) are new). Is this necessary to make the commented code run with recent igraph versions, or did you just try something without reverting it?

(2) The comment above is outdated if we keep the code commented. "The following code block should be redundant as soon as igraph has fixed their bug" ➡️ "The following commented code block is redundant as of igraph version 2.1.1, in which igraph has fixed their bug." (or something similar)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) The changes come from me experimenting with the fix before deciding that it must really be commented out all together to work with the new edge attributes. However, I did not yet delete it such that we can discuss here if we should try to make this whole changed edge attribute thing work with older igraph versions or if we can just drop support.

(2) That the comment is still there is just a symptom of (1).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on (a) whether you already have a version of this code that already works with older igraph versions and on (b) how much time it would take to make it work if it is not working yet.

If it is already working and only two lines need to be changed, we can discuss whether we support older igraph versions. If it would take a couple of hours to experiment with this code until it works with older igraph versions, I would suggest to cease the support of older igraph versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a working version for older igraph versions, but I will try to test a bit and see if it's feasible.

Copy link
Contributor Author

@maxloeffler maxloeffler Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After installing any older igraph version, I get errors because the subgraph_from_edges method (which I used to replace the deprecated subgraph.edges method) is not exported in these versions. Looking at the rigraph repo I found that the method has only been exported recently. I don't think it would be good to still support older versions under this circumstance anyways.

Edited typos

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, subgraph_from_edges has been introduced recently. So, I guess, we should not support older igraph versions any more.
As we want to have a major release of coronet very soon, I guess it is ok to cease the support of older igraph versions. What we could think of is whether we want to provide a bug-fix branch for the previous release where we cherry-pick certain bug fixes to support older igraph versions - but I am not sure whether this is necessary and I am also not sure what's @hechtlC's opinion on that, so let's discuss this tomorrow in our meeting.

Copy link
Collaborator

@bockthom bockthom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While looking over the rest of this PR, I only found a couple of minor issues, please find the individual comments below.

Regarding the uncovered lines reported by codecov: I suggest to ignore them. 2 out of 4 missing lines are part of an untested module anyway. The other two lines refer to very special cases that have been previously untested too.

util-networks.R Outdated Show resolved Hide resolved
util-split.R Show resolved Hide resolved
tests/test-networks-equal-constructions.R Show resolved Hide resolved
util-networks.R Show resolved Hide resolved
tests/test-networks.R Outdated Show resolved Hide resolved
tests/test-networks.R Outdated Show resolved Hide resolved
This change makes splitting by ranges more consistent with time-based
splitting.

This works towards se-sic#273.

Signed-off-by: Maximilian Löffler <[email protected]>
Since igraph version 2.1, when joining networks using
'igraph::disjoint_union', edge attributes of the joining networks
require identical types. As simplifiying networks necessarily converts
types of edge attributes to list when merging edges, attributes now have
to be of type list by default.

Edge attributes that are explicitly considered during simplification
and, therefore, are not converted to lists are excluded from this rule.

This works towards fixing se-sic#271.

Signed-off-by: Maximilian Löffler <[email protected]>
Adjust the tests in accordance to converting edge attributes to list
type in the implementation.

This works towards fixing se-sic#271.

Signed-off-by: Maximilian Löffler <[email protected]>
This is necessary since igraph falsely fills in non-present edge
attributes with NULLs instead of NAs in certain cases when using
'igraph::disjoint_union' and 'igraph::add_edges'.

Signed-off-by: Maximilian Löffler <[email protected]>
'plyr::rbind.fill' uses NULL to fill missing values in lists. As we now
use lists for most edge attributes, we need to handle this case
separately to ensure missing values are filled with NAs instead.

To fix this issue, we need to instantiate missing columns in dataframes
with NAs before calling 'plyr::rbind.fill'. This operation is constant
with respect to the amount of rows and should not impact performance too
much.

This works towards fixing se-sic#271.

Signed-off-by: Maximilian Löffler <[email protected]>
This works towards fixing se-sic#271.

Signed-off-by: Maximilian Löffler <[email protected]>
Signed-off-by: Maximilian Löffler <[email protected]>
@maxloeffler maxloeffler changed the title Miscellaneous fixes and adjustments Change default representation of edge attributes from vectors to lists & other miscellaneous adjustments Dec 16, 2024
Copy link
Collaborator

@bockthom bockthom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've a few comments regarding the news. Please find them below.

- Deprecate support for R version 3.6 (PR #264, c8e6f45111e487fadbe7f0a13c7595eb23f3af6e, fb3f5474259d4a88f4ff545691cca9d1ccde90e3)
- Explicitly add R version 4.4 to the CI test pipeline (c8e6f45111e487fadbe7f0a13c7595eb23f3af6e)
- Refactor function `construct.edge.list.from.key.value.list` to be more readable (PR #263, 05c3bc09cb1d396fd59c34a88030cdca58fd04dd)
- Change the default representation of edge attributes from vectors to lists. This change is necessary for the interplay of coronet networks with certain igraph functionality since igraph version 2.1. (PR #274, 1c35d1fa2548deb297dbfa5e2b07fce31962c5b7, eda30b838369ec46376812298a3ea8159eec5789, 44c7b72e3234cb332bb2713fb408c124e67255d9, 7303eabef6a78198575fe5bdfc02813fde3d3974)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might think about marking this as a **Breaking Change**:, as we also did for other changes in earlier releases.
What do you think, is this a breaking change or not? So, is there a possibility that "old code" that relies on coronet breaks or that reading networks generated with an old version of igraph now breaks, that is, old networks are not any more compatible with how we treat networks now? Actually, I am not sure about that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How I see it is that coronet in itself is consistent. So every action you can perform with it should work just as it did before. While networks created with older coronet versions may not necessarily be compatible with networks created with this version, I don't know how this should ever happen, if not through serializing an old network and now deserializing it. Apart from that, there should be no incompatibilities.


### Changed/Improved

- Change the default value for the `issues.from.source` configuration parameter. Instead of reading JIRA and GitHub issues together, which was the previous default, the new default value causes only GitHub issue data to be read. To restore the previous default behavior and read data from both issue sources, this now needs to be manually configured when needed. (PR #264, 5ff83c364f6bfc1e6ff95e9c5f1087e031c48a5d, 8c8080cb9caf115f19d9f145ad6e6c108b131a67, 8bcbc81db521877908d2e5c2989082ed672f2a3b)
- Replace deprecated `igraph` functions by their preferred alternatives (PR #264, PR #268, 0df9d5bf6bafbb5d440f4c47db4ec901cf11f037, 7ac840d287a862eff61b1a84e194a4cba399f9e5)
- Replace deprecated `igraph` functions by their preferred alternatives (PR #264, PR #268, PR #274 0df9d5bf6bafbb5d440f4c47db4ec901cf11f037, 7ac840d287a862eff61b1a84e194a4cba399f9e5, e3617b8c6b21fb4242c1d392124813501069ca84, 4b0d5221dd56bb3c9ddf196f67719d4f503d9b61)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comma after the inserted PR number.

Copy link
Collaborator

@bockthom bockthom Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And: Please also add commit 4b0d5221dd56bb3c9ddf196f67719d4f503d9b61 (in addition to the current mention) to the fixes section to state that the functions metrics.scale.freeness and metrics.is.scale.free now are compatible with old and new igraph versions again (see the commit message as well as the "Announcement" regarding our previous release in our NEWS.md; due to this announcement, we also need to state that this fixed now).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that still the case if we deprecate support for igraph versions below 2.1? I should probably add that somewhere in the NEWS.md I just haven't found a good place for it.


### Fixed

- Fix the creation of edgelists for issue-based artifact-networks by correctly iterating over the issue data (PR #264, 321d85043112971c04998249c14a0677a32c9004)
- Fix a bug in `extract.timestamps` that occurs when the first `data.source` contains empty data and that leads to a return value of type numeric which should be POSIXct (PR #270, 10696e4cf4ae92371917ed8ccaec2b0183da145c, 646c01a42ad8decfbc9040030e790e51cb65cffd)
- Fix `read.commit.interactions` by explicitly considering empty commit interaction data (PR #274, f591528a0f1f11b1a4390949ab770f3f74a766f9)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty commit interaction data ➡️ non-existent commit interactions

(for explanation: the commit-interaction data per se are present, but specific interactions between specific commits are empty, that is, non-existent)

@@ -834,6 +834,11 @@ split.network.time.based.by.ranges = function(network, ranges, remove.isolates =
}
)

# add range information
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While checking the news, I spotted that here the second # of ## is missing. But there is no need to adjust the old commit (as this would need an update of all commit hashes) – you are allowed to fix this in the commit that updates the news.

@@ -15,19 +15,22 @@
- Add commit network as a new type of network. It uses commits as vertices and connects them either via cochange or commit interactions. This includes adding new config parameters and the function `add.vertex.attribute.commit.network` for adding vertex attributes to a commit network (PR #263, ab73271781e8e9a0715f784936df4b371d64c338, ab73271781e8e9a0715f784936df4b371d64c338, cd9a930fcb54ff465c2a5a7c43cfe82ac15c134d)
- Add `remove.duplicate.edges` function that takes a network as input and conflates identical edges (PR #268, d9a4be417b340812b744f59398ba6460ba527e1c, 0c2f47c4fea6f5f2f582c0259f8cf23af985058a, c6e90dd9cb462232563f753f414da14a24b392a3)
- Add `cumulative` as an argument to `construct.ranges` which enables the creation of cumulative ranges from given revisions (PR #268, a135f6bb6f83ccb03ae27c735c2700fccc1ee0c8, 8ec207f1e306ef6a641fb0205a9982fa89c7e0d9)
- Add range information to network-splits when splitting a network using `split.network.time.based.by.ranges` (PR #274, 87911ade231c44b93be194a1d6734f7de043a4af)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this to the ## Fixed section, as this is actually a bug fix. And please also mention that this change also affects the function split.networks.time.based.

@bockthom bockthom added this to the v5.0 milestone Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants