Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update deprecated function calls from other packages #260

Open
bockthom opened this issue Apr 24, 2024 · 15 comments
Open

Update deprecated function calls from other packages #260

bockthom opened this issue Apr 24, 2024 · 15 comments

Comments

@bockthom
Copy link
Collaborator

bockthom commented Apr 24, 2024

Description

Some of the functions that we call from other packages are marked as deprecated.
Especially since igraph release 2.0.0, igraph has marked many functions as deprecated. In most cases, the functions have just been renamed. To provide one example, igraph::get.edgelist(...) has been renamed to igraph::as.edgelist(...).
As I assume that, at some point, these functions will become defunct, we should think about renaming them.

We can use this issue to collect deprecated function calls here, if necessary.

In some rare cases, also our contribution guide could be affected by that (e.g., that we recommend/enforce deprecated function calls in our contribution guide.) Thus, we should also check our contribution guide for that when we address this issue.

@bockthom bockthom added this to the Future milestone Apr 24, 2024
@bockthom bockthom modified the milestones: Future, v4.5 May 15, 2024
@maxloeffler
Copy link
Contributor

maxloeffler commented May 27, 2024

I have conducted an analysis of the warning messages I retrieved (R: 4.1.2 on my Ubuntu VM). In total I classified 66 warnings into 7 classes:

  1. "can't open file" (fixable: mixed)
  2. "data length is not a multiple of split variable" (fixable: yes!) This warning actually stems from a "bug" in the implementation of the add.link <-> referenced.by edge list construction. There, we have a dataframe containing all add.link events called add.links and we want to mclapply over it. I used the split(add.links, ...) function for that and that function usually works, that is, if you supply a second parameter that is a vector from 1 to the length of the first parameter. However, to get this vector I used seq_along(add.links), which does return a vector from 1 to 17 in every case since we have 17 columns in an issue event. Therefore, we get the warning that the length of the second parameter is not in line with the length of the first parameter. I experimented a bit and was able to this by replacing the second parameter with seq_len(nrow(add.links)) or seq.int(nrow(add.links)), alternatively.
  3. "All formats failed to parse. No formats found." (fixable: no, this is triggered by a test with an explicitly empty network. However, having this warning is senseful and we should not suppress it) There is an issue with project.data$get.data.timestamps(outermost = TRUE) in util-split.R:427. Sometimes this call does return a vector where start is Inf and end is -Inf. A consecutive get.date.from.string call then throws this issue, since Inf and -Inf are not in the correct format for conversion.
  4. "no non-missing arguments to min; returning Inf", (fixable: same as 3) This is related to point 3. This warnings is being thrown inside the get.date.timestamps call. There reason for this is that the date.timestamps are empty
    This is because the test these issues come from is a test that operates on an empty network.
  5. "All formats failed to parse. No formats found." 2.0. (fixable: no, this is triggered by tests that explicitly test incorrect bins. These warnings are complicated to remove and serve a purpose) This issue comes from splitting by invalid bins time based (the test splits explicitly wants to split by invalid bins) (these bins are not in correct date format)
  6. "All formats failed to parse. No formats found." 3.0 (fixable: same as 5) Same as above just splitting by bins instead
  7. "igraph deprecation warnings" (fixable: yes) These should be easy to fix

The reason I put fixable: unsure for some warning categories is that it would be technically possible to catch the reasons for these warnings and then throw our own warnings instead but I don't think it is worth it. I will of course address the fixable ones in the PR for this issue :)

Edit: update fixability

@maxloeffler
Copy link
Contributor

maxloeffler commented Jun 11, 2024

I have had a look at all the "cannot open file" warnings. Contrary to your intuition, none of them originate from JIRA stuff. Here is a list of sources:

  1. "nonexistent.file", The test describes this case as "Empty timestamps from invalid file". We could only fix that by creating an "invalid file" that does exist but is empty or malformated. This will then likely cause other warnings.
  2. "reading empty commit interactions", This test is labeled with "Read the empty commit-interactions data.". Same as above.
  3. "no bots", There are some tests that call get.sample.network(). This method builds an example network that includes authors as vertices. When reading authors, we also try to read the bots.list file which does not exist in this case.
  4. "comparison", In one test, we compare two networks, one with the commit.interaction parameter set one without. When setting the parameter to TRUE, we try to read the interactions from a file that does not exist in this case. It is also not needed, we only care about the parameter.

In summary, none of these warnings come from JIRA. None of these warnings are easily fixable using conventional approaches, as they all read from files that are just not present in the demo data. All of these tests could be fitted with suppressWarnings calls, however, im not sure if that is intended.

@maxloeffler
Copy link
Contributor

I also had a look at all warnings across the different R versions we support and found that the warnings are consistent and do not change over versions, i.e., all R versions display the same amount of warnings with the same content. Also I have noticed, that the CI runners produce one additional warning that does not come when running it on my machine, for which the reason is, that the CI runners do not support UI and therefore Tk is not available. I think we can disregard this warning savely.

@bockthom
Copy link
Collaborator Author

bockthom commented Jun 12, 2024

I have had a look at all the "cannot open file" warnings. Contrary to your intuition, none of them originate from JIRA stuff. [...]

Ok, then we are talking about non-existing files that are actually expected to be non-existing in the test. In these cases, we should keep the warnings and not try to avoid them.

In summary, none of these warnings come from JIRA. None of these warnings are easily fixable using conventional approaches, as they all read from files that are just not present in the demo data. All of these tests could be fitted with suppressWarnings calls, however, im not sure if that is intended.

No, we should not suppress any warnings there.
Regarding JIRA: As the test data contain JIRA data, the default configuration does not end up in warnings regarding missing JIRA data... Nevertheless, as mentioned by @hechtlC last week, in practice, we usually have only GitHub or JIRA data but not both of them in parallel. Therefore, we still should change the default to just "GitHub" to avoid tens or hundreds of warnings during practical use of coronet.

@maxloeffler
Copy link
Contributor

Oh okay, then I just missed the point there. I will update the default then 🤓

@maxloeffler
Copy link
Contributor

I have had a look at some of the tests that fail because of the sorting of the c("jira", "github") list. I found that at least in the splitting related tests that the order of sources does actually mess a lot with the order of events. The reason that I see for this is in read.issues. There the sources are being read from file then rbinded together. So when jira data is read first, it is above the github data and vice versa. But then - later - the dataframe is sorted by date attribute which causes jira and github issues to be interwoven and depending on the order of readin, the events will obtain different ids (indexes). In all splitting tests, we do comparison by doing something like

data$issues[rownames(data$issues) %in% c(18, 24:26, 44:47), ],

Therefore, all splitting tests would require thoughtful fixing when we decide to follow that route. Honestly, I think it is not tooooo much work, a couple of hours though. 😅

@bockthom
Copy link
Collaborator Author

Hm, that's bad. If it would just be reordering, it should not be that a big problem. But fixing the indices is a tedious and potentially error-prone task, I certainly understand that.

What would be the alternatives? Do you have any other ideas how we can circumvent this problem? But even if there are alternatives, we should not tweak the tests just to make them align with outdated and wrong design choices. In the end, the tests should mimic how our actual data processing is assumed to work in practice.

So, before we take a final decision, could you please provide us with some details which/how many tests are affected?
I'd guess that all the splitting tests are affected, maybe also many tests of the data module that deal with reading/accessing issue data. But is there anything else which also breaks? What about network construction, for instance? Just to get an overview.

@maxloeffler
Copy link
Contributor

While working on the showcase.R recently, I discovered two more deprecated igraph functions that only occur in util-plot.R. How should be handle those? Thematically, this fits in the (closed and merged) PR for this issue. However, this warning does not annoy anyone using coronet as a library

@bockthom
Copy link
Collaborator Author

bockthom commented Aug 6, 2024

Could you please document which warnings you are talking about?
In general, it does not matter whether they annoy anyone or not (this can only be judged by the end user, anyway) – if the warning points to a deprecated function call, this is a hint that we should update the function call, as deprecated functions usually will become dysfunct at some later point in time.

I suggest to add a dedicated commit to the PR at which you are currently working and reference this issue in the commit message - then the commit is also referenced in this issue - but please avoid the words 'fix(ed)' or 'close(d)' in the commit message, to not automatically close this issue. Even after the next release of coronet, I suggest not to close this issue to keep it open for documenting future deprecation calls, as this issue will still be of ongoing relevance.

@maxloeffler
Copy link
Contributor

Concretely, im talking about igraph::which_loop (formally igraph::is.loop) and igraph::graph_attr_names (formally igraph::list.graph.attributes. Both appear only once in util-plot.R.

Okay, that sounds reasonable, I'll add the commit soon.

maxloeffler added a commit to maxloeffler/coronet that referenced this issue Aug 27, 2024
maxloeffler added a commit to maxloeffler/coronet that referenced this issue Aug 27, 2024
maxloeffler added a commit to maxloeffler/coronet that referenced this issue Sep 2, 2024
@bockthom
Copy link
Collaborator Author

Yesterday, a new igraph version has been released on CRAN. In the new release, again a number of functions has been marked as deprecated. Therefore, we should check if there are new deprecation warnings in coronet and should update the corresponding function calls.

@maxloeffler
Copy link
Contributor

Yes indeed, there are new deprecation warnings. Do you want me to integrate the bump commits to the currently open PR?

@bockthom
Copy link
Collaborator Author

Do you want me to integrate the bump commits to the currently open PR?

It depends. If there are only few changes, we can certainly add them to the currently open PR. If there is quite a number of changes, I would appreciate to create a new PR.

@maxloeffler
Copy link
Contributor

I updated 3 freshly deprecated functions as well as 3 that were deprecated in 2.0.0 that slipped through previous efforts somehow. In total the patch includes changes to 10 files including implementation and testing files. Therefore, I might open a new PR.

@maxloeffler
Copy link
Contributor

Unfortunately, somewhat about their new version breaks in our usage of igraph::disjoint_union (which worked prior to updating igraph). I researched a bit and found that the breaking change was introduced 4 months ago in this commit. I have not yet figured out a way to fix that issue.

Concrete error description

Error Message:

── Error (test-networks.R:404:5): Extraction of sub-networks ───────────────────
<vctrs_error_ptype2/vctrs_error_incompatible_type/vctrs_error_incompatible/vctrs_error/rlang_error/error/condition>
Error in `vctrs::vec_c(attr[[exattr[a]]], ea[[exattr[a]]])`: Can't combine `..1` <list> and `..2` <character>.
Backtrace:

  1. ├─global get.sample.network() at test-networks.R:404:5
  2. │ └─net.builder$get.multi.network() at util-networks.R:2179:5
  3. │   └─igraph::disjoint_union(authors.net, artifacts.net) at util-networks.R:1280:13
  4. │     └─vctrs::vec_c(attr[[exattr[a]]], ea[[exattr[a]]])
  5. └─vctrs (local) `<fn>`()
  6.   └─vctrs::vec_default_ptype2(...)
  7.     ├─base::withRestarts(...)
  8.     │ └─base (local) withOneRestart(expr, restarts[[1L]])
  9.     │   └─base (local) doWithOneRestart(return(expr), restart)
 10.     └─vctrs::stop_incompatible_type(...)
 11.       └─vctrs:::stop_incompatible(...)
 12.         └─vctrs:::stop_vctrs(...)
 13.           └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)

When breaking the variables are instanciated as follows:

Browse[1]> attr
$date
$date[[1]]
[1] "2004-10-09 18:38:13 UTC" "2005-02-09 18:49:49 UTC"

$date[[2]]
[1] "2010-07-12 10:05:36 UTC" "2010-07-12 11:05:35 UTC"

$date[[3]]
[1] "2010-07-12 12:05:34 UTC" "2010-07-12 12:05:40 UTC"

$date[[4]]
[1] "2010-07-12 12:05:41 UTC" "2010-07-12 12:05:42 UTC"


$artifact.type
$artifact.type[[1]]
[1] "Mail" "Mail"

$artifact.type[[2]]
[1] "Mail" "Mail"

$artifact.type[[3]]
[1] "Mail" "Mail"

$artifact.type[[4]]
[1] "Mail" "Mail"


$message.id
$message.id[[1]]
[1] "<[email protected]>" "<[email protected]>"

$message.id[[2]]
[1] "<[email protected]>" "<[email protected]>"

$message.id[[3]]
[1] "<[email protected]>" "<[email protected]>"

$message.id[[4]]
[1] "<[email protected]>" "<[email protected]>"


$thread
$thread[[1]]
[1] "<thread-1>" "<thread-1>"

$thread[[2]]
[1] "<thread-2>" "<thread-2>"

$thread[[3]]
[1] "<thread-3>" "<thread-3>"

$thread[[4]]
[1] "<thread-4>" "<thread-4>"


$weight
[1] 2 2 2 2 1 1 1 1 1

$type
[1] "Unipartite" "Unipartite" "Unipartite" "Unipartite"

$relation
[1] "mail" "mail" "mail" "mail"

Browse[1]> ea
$weight
[1] 1 1 1 1 1

$artifact.type
[1] "Feature" "Feature" "Feature" "Feature" "Feature"

$type
[1] "Unipartite" "Unipartite" "Unipartite" "Unipartite" "Unipartite"

$relation
[1] "callgraph" "callgraph" "callgraph" "callgraph" "callgraph"

Browse[1]> a
[1] 2

Leo-Send pushed a commit to Leo-Send/coronet that referenced this issue Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants