Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for splitting with equal sized windows, restructure tests #222

Merged
merged 15 commits into from
May 7, 2022

Conversation

joba00002
Copy link
Contributor

@joba00002 joba00002 commented Mar 14, 2022

Addresses issue #141
Additionally, test-split.R has been restructured

Signed-off-by: Jonathan Baumann [email protected]

Prerequisites

  • I adhere to the coding conventions (described here) in my code.
  • I have updated the copyright headers of the files I have modified.
  • I have written appropriate commit messages, i.e., I have recorded the goal, the need, the needed changes, and the location of my code modifications for each commit. This includes also, e.g., referencing to relevant issues.
  • I have put signed-off tags in all commits.
  • I have updated the changelog file NEWS.md appropriately.
  • I have checked whether I need to adjust the showcase file showcase.R with respect to my changes.
  • The pull request is opened against the branch dev.

Description

Changelog

@joba00002 joba00002 changed the title Add test-split-equal-sized-windows.R Add tests for splitting with equal sized windows, restructure tests Apr 14, 2022
@joba00002
Copy link
Contributor Author

The length in the splitting info used to be set incorrectly when splitting based on a number of windows, I fixed this a few weeks ago.

@bockthom We said that stating the length in seconds, as is the default (and current) behaviour, is not ideal and we would rather have an approximation that is easier to read.
However, I have not been able to find any function that would achieve this.

@joba00002
Copy link
Contributor Author

I also stumbled across this in test-split.R:

##
## TODO
##

## - net.conf$update.values(list(pasta = TRUE, synchronicity = TRUE))

It seems to me that this indicates something that still needs to be tested. Is there a corresponding issue?

@bockthom
Copy link
Collaborator

@bockthom We said that stating the length in seconds, as is the default (and current) behaviour, is not ideal and we would rather have an approximation that is easier to read. However, I have not been able to find any function that would achieve this.

Just a short answer for documentation reasons:
We would like to have a duration format which is both easily human readable and also reusable (e.g., passing the splitting length to another function which takes a duration as input parameter). This can be achieved by calling as.period on a duration object (since the string representation of the duration object itself cannot be reused as treating it as a duration ends up in NA).

@bockthom
Copy link
Collaborator

I also stumbled across this in test-split.R:

##
## TODO
##

## - net.conf$update.values(list(pasta = TRUE, synchronicity = TRUE))

It seems to me that this indicates something that still needs to be tested. Is there a corresponding issue?

No, there is no corresponding issue. But, this indeed still needs to be tested. Should be easily testable via parameterized test for tests which already check for pasta and synchronicity data (hopefully...).

Copy link
Collaborator

@bockthom bockthom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new structure of the tests looks good, it is indeed more clear than it was before, thanks for restructuring @joba00002!

However, I found a few inconsistencies, please have a look at my individual comments below.

In addition, I am wondering if we have lost some of the tests?
If I look at the current dev branch and grep for test_that(, we have 18 tests in test-split and 11 tests in test-split-sliding-window, in sum 29 tests. However, if I grep on the four new test-split- files, I get 33 tests. However, there are 5 new tests on "equal sized windows" and 3 new tests due to separating "number window" and "activity amount" in 'split-network-activity-based', if I have counted correctly. So, in sum, we get 33 - 8 = 25 < 29. So, I am wondering whether we miss 4 tests or whether there is something wrong in my counting method. We definitively should make sure that we still have all the tests we had before.

tests/test-split-data-activity-based.R Outdated Show resolved Hide resolved
tests/test-split-data-time-based.R Outdated Show resolved Hide resolved
tests/test-split-data-time-based.R Outdated Show resolved Hide resolved
tests/test-split-network-time-based.R Outdated Show resolved Hide resolved
tests/test-split-data-activity-based.R Outdated Show resolved Hide resolved
tests/test-split-data-activity-based.R Outdated Show resolved Hide resolved
tests/test-split-data-time-based.R Outdated Show resolved Hide resolved
@joba00002
Copy link
Contributor Author

So, I am wondering whether we miss 4 tests

No, I don't think so. There are 4 tests still in test-split.R, because they did not make sense to move to any other files (although we could consider renaming this file).
If I understand you correctly, you did not count these 4.

@bockthom
Copy link
Collaborator

There are 4 tests still in test-split.R, because they did not make sense to move to any other files (although we could consider renaming this file). If I understand you correctly, you did not count these 4.

You are right. I assumed that test-split.R is empty now, but it actually is not. I would appreciate it if we could find another name for this file. At least, 3 of the 4 tests deal with data and networks (if have identified that correctly), so we could think about something like test-split-data-and-networks.R or something similar. However, I am not sure if the 4th test also is in line with such a name. In addition, the comment at the beginning of test-split about "splitting raw data" also seems to be outdated or somehow misleading, maybe we can also rephrase that one.

@bockthom
Copy link
Collaborator

@joba00002 I did not have a look at your most recent commits yet, but I just saw that your pull request seems to be based on the wrong branch, as there are 14 merge commits appearing in this PR which originate from version releases on the master branch but do not appear on the dev branch. So, please base your changes onto the dev branch, not onto the master branch. Thanks!

Add tests for split.data.time.based, split.network.time.based and
split.networks.time.based with the number.windows parameter.

Signed-off-by: Jonathan Baumann <[email protected]>
Move time period calculation to new function get.time.period.by.amount,
which is called by generate.date.sequence for splitting and by
split.data.time.based to set the splitting information correctly.
Adjust tests accordingly.

Signed-off-by: Jonathan Baumann <[email protected]>
In test-split-equal-sized-windows.R, add explicit checks of the
splitting info in the test for split.networks.time.based.

Signed-off-by: Jonathan Baumann <[email protected]>
In test-split-equal-sized-windows.R, use get.date.from.string from
util-misc instead of calling as.POSIXct directly for consistency.

Signed-off-by: Jonathan Baumann <[email protected]>
Move tests for split.data.time.based, split.data.activity.based,
split.network.time.based and split.network.activity.based to new,
respective files for each.
In the process, move all tests from test-split-sliding-windows.R and
test-split-equal-sized-windows.R to the proper files also, to have a
consistent structure throughout.

Signed-off-by: Jonathan Baumann <[email protected]>
Splitting by activity amount and splitting by number of windows are now
seperate tests instead of one large one.

Signed-off-by: Jonathan Baumann <[email protected]>
Converting from duration to time period yields a format that is human
readable, exact and can be converted back to a duration.

Signed-off-by: Jonathan Baummann <[email protected]>
Fix inconsistencies and improve comments and test names in test-split.R,
test-split-data-activity-based.R, test-split-data-time-based.R,
test-split-network-activity-based.R and test-split-network-time-based.R.

Signed-off-by: Jonathan Baumann <[email protected]>
Affects tests for splitting data activity based with number.windows.
Previously, tests for ignoring sliding.window in this case were included
in the tests for sliding.window, instead, they are now parameterized
tests in the number.windows category.

Signed-off-by: Jonathan Baumann <[email protected]>
Now that many tests have been moved to new files, update the copyright
information so that each developer is listed in the files with functions
they worked on.

Signed-off-by: Jonathan Baumann <[email protected]>
@joba00002
Copy link
Contributor Author

@bockthom I must have picked the wrong branch to rebase onto. Should be fixed now.

@bockthom
Copy link
Collaborator

bockthom commented Apr 29, 2022

Should be fixed now.

Yes, thanks!

Is this PR already complete or are you planning to add further commits?
If it is complete (or you think that it may be complete 😉), then we will review it. But as it is still marked as a "Draft PR", we will not automatically review it unless you ask us to do so or convert to a non-draft PR.

@joba00002
Copy link
Contributor Author

I know that the changelog file still needs to be updated, but I'll do after a review.

I originally also wanted to add tests for the TODO item I found (pasta and synchronicity data), but I think it makes more sense to do that in another PR. I will mark this one as ready now.

@joba00002 joba00002 marked this pull request as ready for review April 29, 2022 13:53
hechtlC
hechtlC previously approved these changes May 2, 2022
Copy link
Contributor

@hechtlC hechtlC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thank you for the nice additions and cleanups @joba00002!

As for renaming the test-split.R file: I would suggest something like test-split-general.R or test-split-misc.R. These are not the best but the tests in here can not sensibly be grouped, except for the miscellaneous group. What do you think @bockthom and @joba00002?

Other than that the pull request can be merged once @bockthom approves.

Since most tests are now in other files, this file being called
test-split.R was somewhat confusing.

Signed-off-by: Jonathan Baumann <[email protected]>
List changes to util-split.R under "Fix".

Signed-off-by: Jonathan Baumann <[email protected]>
Copy link
Collaborator

@bockthom bockthom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for renaming, I agree with the the new name of the former test-split.R 😄

I started reviewing this PR, but unfortunately, I was not able to finish my review yet.
But I already stumbled over two issue sI would already like to mention:

(1) The test pipeline fails in two of your new tests on equally sized windows (for R version 3.4) as you can see in the drone CI pipeline. According to the logs, there is used a expect_identical where the compared objects are not identical but equal. I was not able to locate which tests and particular checks are meant, but could you please check in the drone logs and check whether replacing expect_identical by expect_equal at the two particular lines would make sense and potentially fix that problem? Maybe we can also discuss this tomorrow. If it does not make sense to replace these two statements, we might just ignore it and maybe we will stop testing R 3.4... but let's see...

(2) I spotted one code snippet where the indentation looks a bit weird. Please have a look at the comment below.

I try to continue my review tomorrow before our meeting, but I don't expect to spot additional issues. But we'll see...

util-split.R Outdated Show resolved Hide resolved
joba00002 added 2 commits May 3, 2022 21:01
Fix indentation in util-split.R

Signed-off-by: Jonathan Baumann <[email protected]>
Fix copyright headers in test-split-*.R

Signed-off-by: Jonathan Baumann <[email protected]>
@joba00002 joba00002 force-pushed the eq_windows_tests branch from efabf57 to 0070c10 Compare May 6, 2022 20:30
"Split a list of networks time-based with equal-sized windows" fails on
R3.4 when using expect_identical.
Therefore, use expect_equal instead.

Signed-off-by: Jonathan Baumann <[email protected]>
@joba00002
Copy link
Contributor Author

expect_equal seems to work.
I suspect the issue is something with comparing POSIXct values, but I don't understand what the difference is, since the documentation of test_that does not go into detail.

@bockthom
Copy link
Collaborator

bockthom commented May 7, 2022

expect_equal seems to work. I suspect the issue is something with comparing POSIXct values, but I don't understand what the difference is, since the documentation of test_that does not go into detail.

Thanks for investigating this problem. I also had a look at the test logs (where you printed the structure of the objects to be compared) and further search for potential causes for the test failure, but I also couldn't make out anything. I also checked the R release notes for changes between R 3.4 and R 3.5 to find changes related to our problem. Between R 3.4 and R 3.5, they have indeed changed something regarding POSIXct and also regarding functions of S4 classes which are potentially used by testthat. So, it could be the case that expect_identical works on more recent R versions due to these changes.

Anyway, we are fine with expect_equal here. So I will merge right away (and if we will decide to not support R 3.4 anymore somewhere in the future, we can easily find and adjust the one statement due to your comment mentioning R 3.4 😄 )

@bockthom bockthom merged commit 38e815e into se-sic:dev May 7, 2022
@joba00002 joba00002 deleted the eq_windows_tests branch May 7, 2022 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants