Add sub-unf02 and sub-unf03 datasets #7
i'm not a big fan of duplicating those datasets under names XX02 and XX03, because according to the BIDS convention, which the lab follows, this implies 02 and 03 are two other subjects, whereas in reality they are the same subject (duplicated). So, if we decide to explicitly store unf-02 and unf-03, i would instead upload the actual unf-02 and unf-03 datasets (which actually exist)
@jcohenadad Oh I see, I didn't realize that. If I wanted to use the actual |
https://github.com/spine-generic/data-multi-subject#spine-generic-public-database-multi-subject 😊 |
@jcohenadad Okay, I have used the data for |
are you sure you committed the physical file (not the symlink from the annexed file: /annex/objects/SHA256E-s139692--53de14288ff8d1211cd55c63f426e27f7122a155e451c930a7f9a233e4ebe036.nii.gz)?
Oh thanks, I missed the derivatives folder! Fixed here: 92ed86c |
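For anyone hitting the same symlink question, here is a minimal sketch of one way to list files that are still symlinks (for example, pointers into an annex object store) rather than physical files. The helper name `find_symlinks` is hypothetical and is not part of this repository or of any annex tooling; it only uses the Python standard library.

```python
import os


def find_symlinks(root):
    """Return sorted paths (relative to root) of every symlink under root.

    Broken links are included too, since a pointer to a missing annex
    object is exactly the case worth catching before committing.
    """
    links = []
    # os.walk does not descend into symlinked directories by default,
    # so they show up in dirnames and are reported like any other link.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                links.append(os.path.relpath(path, root))
    return sorted(links)
```

An empty list for the dataset folder (derivatives included) would suggest that only physical files are present.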
ah... we're hitting another problem here: the size. Data for unf-01 were modified by reducing their size, see details here #2
the reason is to have a dataset as light as possible, because it gets downloaded each time we need to run CI. Ideally the testing dataset should only be a few MB...
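To keep an eye on the "few MB" target, the size of a dataset folder can be checked before pushing. The following is only a sketch using the standard library; `dataset_size_mb` is a hypothetical helper name, and skipping symlinks is an assumption made so that annex pointers are not counted.

```python
import os


def dataset_size_mb(root):
    """Return the total size of regular files under root, in megabytes (1 MB = 1e6 bytes)."""
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks: they are pointers, not the data itself.
            if not os.path.islink(path):
                total_bytes += os.path.getsize(path)
    return total_bytes / 1e6
```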
@ahill187 about e4063c3, why not 1mm iso as specified in #2 (comment)? |
@jcohenadad No reason in particular, I looked at the file sizes in
Will it make a difference? Sorry I'm not so familiar with these files yet :) |
I think the first question is: what do we need these extra data for?
- if we need them only for "crash test", then the content doesn't really matter, and we can squeeze the size even more
- if content matters, then the other question is: how many data and what data do we need for integrity testing?
- if we only need one subject (unf01), then we can squeeze unf2 and unf3 much more
- however if we need "information" in more than unf01, then we should keep some content
@lrouhier @andreanne-lemay @ivadomed/editors could you pls advise? thanks
I believe some of the tests rely on the extra data, although to what extent I am unsure. In
The content of the subject doesn't matter (for now) for the test we are doing. We need at least 3 subjects to have one for each set: testing, validation, and training. I personally would go with adding 2 subjects in the testing data with minimal size. This way we don't need a test to duplicate the subjects which is not very clean. |
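To illustrate why three subjects is the minimum, here is a rough sketch of a subject-level split in which every set gets at least one subject. This is not ivadomed's actual splitting code; `split_subjects` and its `seed` parameter are hypothetical names used only for illustration.

```python
import random


def split_subjects(subjects, seed=0):
    """Assign subjects to train/validation/test so that no set is empty.

    One subject goes to validation, one to test, and the rest to train.
    With fewer than three subjects at least one set would be empty.
    """
    if len(subjects) < 3:
        raise ValueError("need at least 3 subjects, one per set")
    shuffled = sorted(subjects)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    return {
        "validation": [shuffled[0]],
        "test": [shuffled[1]],
        "train": shuffled[2:],
    }


splits = split_subjects(["sub-unf01", "sub-unf02", "sub-unf03"])
```

With only `sub-unf01` available the function raises, which is the situation the duplicated copies were working around.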
Okay cool! I think that is what we have implemented in this pull request currently. @jcohenadad what are your thoughts? |
Could you please try running ivadomed's testing framework on this testing-data branch to see if it passes? |
Yes, I've run it locally in Python 3.8 and it passes, so long as we remove the |
Thank you for your contribution @ahill187 🙏
Please make sure to click "squash and merge" (to make sure we don't accumulate big files on the main branch)
Currently, during testing of the `ivadomed` package, the script called `test_script.py` makes two copies of the `sub-unf01` test data. However this raises two issues:
`test_script.py` script hasn't run yet.
This pull request saves these copies in the testing data so they do not have to be recreated.