Add sub-unf02 and sub-unf03 datasets #7

ahill187 · 2021-01-05T00:28:38Z

Currently, during testing of the ivadomed package, the script called test_script.py makes two copies of the sub-unf01 test data. However this raises two issues:

That script isn't always run first, so some of the tests are reliant on these additional datasets and fail if the test_script.py script hasn't run yet.
Simplicity. I think the tradeoff for downloading a bit of extra data is probably better than making copies of test data during testing; it makes the code simpler to follow, and minimizes room for error.

This pull request saves these copies in the testing data so they do not have to be recreated.

Note: The last merge into master of ivadomed is failing (https://github.com/ivadomed/ivadomed/actions/runs/461790903) due to this issue

jcohenadad

i'm not a big fan of duplicating those datasets under names XX02 and XX03, because according to the BIDS convention, which the lab follows, this implies 02 and 03 are two other subjects, whereas in reality they are the same subject (duplicated). So, if we decide to explicitly store unf-02 and unf-03, i would instead upload the actual unf-02 and unf-03 datasets (which actually exist)

ahill187 · 2021-01-05T01:36:40Z

@jcohenadad Oh I see, I didn't realize that. If I wanted to use the actual unf-02 and unf-03 datasets, where would I find them?

jcohenadad · 2021-01-05T01:40:01Z

@jcohenadad Oh I see, I didn't realize that. If I wanted to use the actual unf-02 and unf-03 datasets, where would I find them?

https://github.com/spine-generic/data-multi-subject#spine-generic-public-database-multi-subject 😊

ahill187 · 2021-01-05T22:38:36Z

@jcohenadad Okay, I have used the data for sub-unf02 and sub-unf03, from here: https://github.com/spine-generic/data-multi-subject, so they are no longer copies. Should be fixed in this commit: a83212e.

jcohenadad · 2021-01-05T22:45:38Z

@jcohenadad Okay, I have used the data for sub-unf02 and sub-unf03, from here: https://github.com/spine-generic/data-multi-subject, so they are no longer copies. Should be fixed in this commit: a83212e.

are you sure you committed the physical file? (not the symlink from the annexed file: /annex/objects/SHA256E-s139692--53de14288ff8d1211cd55c63f426e27f7122a155e451c930a7f9a233e4ebe036.nii.gz) ?

ahill187 · 2021-01-05T23:03:01Z

are you sure you committed the physical file? (not the symlink from the annexed file: /annex/objects/SHA256E-s139692--53de14288ff8d1211cd55c63f426e27f7122a155e451c930a7f9a233e4ebe036.nii.gz) ?

Oh thanks, I missed the derivatives folder! Fixed here: 92ed86c

df_ref.csv

jcohenadad

ah... we're hitting another problem here: the size. Data for unf-01 were modified by reducing their size, see details here #2

the reason is to have a dataset as light as possible, because it gets downloaded each time we need to run CI. Ideally the testing dataset should only be a few MB...

jcohenadad · 2021-01-07T21:55:07Z

@ahill187 about e4063c3, why not 1mm iso as specified in #2 (comment)?

ahill187 · 2021-01-08T17:27:41Z

@jcohenadad No reason in particular, I looked at the file sizes in sub-unf02 and sub-unf03 and compared them with sub-unf01, and determined that if I downsampled by a factor of 1/10 I would achieve about the same file size.
I used the command:

sct_resample -i sub-unf03_T2w_labels-disc-manual.nii.gz -f 0.1x0.1x0.1

Will it make a difference? Sorry I'm not so familiar with these files yet :)

jcohenadad · 2021-01-08T19:54:15Z

I think the first question is: what do we need these extra data for-- if we need them only for "crash test", then the content doesn't really matter, and we can squeeze the size even more-- if content matters, then the other question is: how many data and what data do we need for integrity testing? if we only need one subject (unf01), then we can squeeze unf2 and unf3 much more-- however if we need "information" in more than unf01, then we should keep some content-- @lrouhier @andreanne-lemay @ivadomed/editors could you pls advise? thanks

ahill187 · 2021-01-08T21:34:41Z

I believe some of the tests rely on the extra data, although to what extent I am unsure. In test_script.py, these extra copies are created, presumably to be used for some of the tests? Perhaps someone else might have a better idea though.

def test_creation_dataset():
    # Add new file as needed (no empty test/validation)
    # create empty directory for our new files
    os.makedirs("testing_data/sub-test002/anat/", exist_ok=True)
    os.makedirs("testing_data/sub-test003/anat/", exist_ok=True)
    os.makedirs("testing_data/derivatives/labels/sub-test002/anat/", exist_ok=True)
    os.makedirs("testing_data/derivatives/labels/sub-test003/anat/", exist_ok=True)

    # sub-test002 and sub-test003 will just be copy of our only real testing subject
    command = "cp testing_data/sub-unf01/anat/sub-unf01_T2w.nii.gz testing_data/sub-test002/anat/sub-test002" + \
              "_T2w.nii.gz"
    subprocess.check_output(command, shell=True)

    command = "cp testing_data/sub-unf01/anat/sub-unf01_T2w.nii.gz testing_data/sub-test003/anat/sub-test003" + \
              "_T2w.nii.gz"
    subprocess.check_output(command, shell=True)

    # populate derivatives for sub-test002
    derivatives = "testing_data/derivatives/labels/"
    command = "cp " + derivatives + "sub-unf01/anat/sub-unf01_T2w_seg-manual.nii.gz " + \
              derivatives + "sub-test002/anat/sub-test002" + \
              "_T2w_seg-manual.nii.gz"
    subprocess.check_output(command, shell=True)

    command = "cp " + derivatives + "sub-unf01/anat/sub-unf01_T2w_lesion-manual.nii.gz " + \
              derivatives + "sub-test002/anat/sub-test002" + \
              "_T2w_lesion-manual.nii.gz"
    subprocess.check_output(command, shell=True)

    # populate derivatives for sub-test003
    command = "cp " + derivatives + "sub-unf01/anat/sub-unf01_T2w_seg-manual.nii.gz " + \
              derivatives + "sub-test003/anat/sub-test003" + \
              "_T2w_seg-manual.nii.gz"
    subprocess.check_output(command, shell=True)

    command = "cp " + derivatives + "sub-unf01/anat/sub-unf01_T2w_lesion-manual.nii.gz " + \
              derivatives + "sub-test003/anat/sub-test003" + \
              "_T2w_lesion-manual.nii.gz"
    subprocess.check_output(command, shell=True)

    # Model needs to be inside the log_directory since we use a config file.
    command = "cp testing_data/model_unet_test.pt testing_script/best_model.pt"
    subprocess.check_output(command, shell=True)

    list1 = ["sub-test002", "-"]
    list2 = ["sub-test003", "-"]

    # add subjects to participants.tsv
    append_list_as_row("testing_data/participants.tsv", list1)
    append_list_as_row("testing_data/participants.tsv", list2)

andreanne-lemay · 2021-01-10T21:52:28Z

The content of the subject doesn't matter (for now) for the test we are doing. We need at least 3 subjects to have one for each set: testing, validation, and training. I personally would go with adding 2 subjects in the testing data with minimal size. This way we don't need a test to duplicate the subjects which is not very clean.

ahill187 · 2021-01-12T18:40:44Z

Okay cool! I think that is what we have implemented in this pull request currently. @jcohenadad what are your thoughts?

jcohenadad · 2021-01-12T20:34:14Z

Okay cool! I think that is what we have implemented in this pull request currently. @jcohenadad what are your thoughts?

Could you please try running ivadomed's testing framework on this testing-data branch to see if it passes?

ahill187 · 2021-01-12T23:38:02Z

Could you please try running ivadomed's testing framework on this testing-data branch to see if it passes?

Yes, I've run it locally in Python 3.8 and it passes, so long as we remove the test_script.py file. That file is the one that was creating all these duplicated files in the first place, and it also runs CLI tests. The former functionality is made redundant by this pull request, and the latter functionality will be made redundant by the funtional tests PR. I've created a PR here to remove the file:
ivadomed/ivadomed#634

jcohenadad

Thank you for your contribution @ahill187 🙏
Please make sure to click "squash and merge" (to make sure we don't accumulate big files on the main branch)

Duplicate sub-unf01 data as sub-unf02 and sub-unf03

0155e84

ahill187 requested review from andreanne-lemay and lrouhier January 5, 2021 00:28

update df_ref to include 3 datasets

a5e059d

jcohenadad requested changes Jan 5, 2021

View reviewed changes

ahill187 mentioned this pull request Jan 5, 2021

Error with tests ivadomed/ivadomed#599

Closed

ahill187 added 2 commits January 5, 2021 14:19

Add gitignore file

98f3343

update sub-unf02 and sub-unf03 to be unique subjects instead of copies

a83212e

Commit actual files (not git-annex)

92ed86c

jcohenadad reviewed Jan 5, 2021

View reviewed changes

df_ref.csv Show resolved Hide resolved

jcohenadad requested changes Jan 5, 2021

View reviewed changes

resample sub-unf02 and sub-unf03 to 0.1x0.1x0.1 using sct_resample

e4063c3

ahill187 added 2 commits January 7, 2021 13:58

resample sub-unf02 and sub-unf03 derivative files to 0.1x0.1x0.1

2285846

update df_ref file

9c8a907

ahill187 requested a review from jcohenadad January 12, 2021 18:41

ahill187 mentioned this pull request Jan 12, 2021

Remove test_script and ivado_functional_test files ivadomed/ivadomed#634

Merged

jcohenadad approved these changes Jan 13, 2021

View reviewed changes

ahill187 merged commit 8a29a0d into master Jan 13, 2021

ahill187 deleted the ah/update-data branch January 13, 2021 16:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add sub-unf02 and sub-unf03 datasets #7

Add sub-unf02 and sub-unf03 datasets #7

ahill187 commented Jan 5, 2021 •

edited

Loading

jcohenadad left a comment

ahill187 commented Jan 5, 2021

jcohenadad commented Jan 5, 2021

ahill187 commented Jan 5, 2021

jcohenadad commented Jan 5, 2021

ahill187 commented Jan 5, 2021

jcohenadad left a comment

jcohenadad commented Jan 7, 2021

ahill187 commented Jan 8, 2021

jcohenadad commented Jan 8, 2021

ahill187 commented Jan 8, 2021

andreanne-lemay commented Jan 10, 2021

ahill187 commented Jan 12, 2021

jcohenadad commented Jan 12, 2021

ahill187 commented Jan 12, 2021

jcohenadad left a comment

Add sub-unf02 and sub-unf03 datasets #7

Add sub-unf02 and sub-unf03 datasets #7

Conversation

ahill187 commented Jan 5, 2021 • edited Loading

jcohenadad left a comment

Choose a reason for hiding this comment

ahill187 commented Jan 5, 2021

jcohenadad commented Jan 5, 2021

ahill187 commented Jan 5, 2021

jcohenadad commented Jan 5, 2021

ahill187 commented Jan 5, 2021

jcohenadad left a comment

Choose a reason for hiding this comment

jcohenadad commented Jan 7, 2021

ahill187 commented Jan 8, 2021

jcohenadad commented Jan 8, 2021

ahill187 commented Jan 8, 2021

andreanne-lemay commented Jan 10, 2021

ahill187 commented Jan 12, 2021

jcohenadad commented Jan 12, 2021

ahill187 commented Jan 12, 2021

jcohenadad left a comment

Choose a reason for hiding this comment

ahill187 commented Jan 5, 2021 •

edited

Loading