[datasets] fetching fail if corrupted file #10

Open
AlexandreAbraham opened this issue Jun 22, 2012 · 13 comments

@AlexandreAbraham
Owner

If a file has not been downloaded properly, fetching fails instead of trying to download it again.

@AlexandreAbraham
Owner Author

Should we put MD5 hashes in the code to ensure integrity?
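
For illustration, a minimal sketch of what such a check could look like, using only the standard library. The names `md5_of_file` and `is_intact` are hypothetical and not part of the project's downloader; the expected hash would have to be stored alongside each dataset URL.

```python
import hashlib


def md5_of_file(path, chunk_size=8192):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large archives do not fill memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected_md5.lower()
```

A downloader could call `is_intact` right after fetching a file and delete the file when the check fails, so that the next call downloads it again.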

@AlexandreAbraham
Owner Author

@Gael, for your problem with the NYU Test-Retest dataset, my guess is that you have a corrupted zip file that was downloaded with a previous version of the downloader (now, if the download fails, it cleans everything up).
Three solutions:

  • you delete the file by hand
  • you pass the parameter "force_download=True"
  • I change the downloader so that it systematically erases any previously downloaded file if it cannot load the dataset.

@GaelVaroquaux
Collaborator

Fourth solution: you change the downloader so that in such a situation it cleans up by itself. It should never fail: it should catch exceptions and land back on its feet.
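
A rough sketch of that self-healing behavior, assuming the dataset is a tar archive. The function name `fetch_archive` and its `retries` and `download` parameters are hypothetical, chosen for illustration; this is not the project's actual downloader.

```python
import os
import tarfile
import urllib.request


def fetch_archive(url, dest, retries=1, download=urllib.request.urlretrieve):
    """Return `dest` once it holds a readable tar archive.

    If an existing file cannot be read, it is deleted and downloaded again,
    up to `retries` extra times.
    """
    for _ in range(retries + 1):
        if not os.path.exists(dest):
            download(url, dest)
        try:
            with tarfile.open(dest) as archive:
                archive.getmembers()  # walk the whole archive to detect truncation
            return dest
        except (tarfile.TarError, EOFError, OSError):
            # Corrupted or truncated file: remove it so the next pass re-downloads.
            os.remove(dest)
    raise IOError("could not fetch a valid archive from %s" % url)
```

The `download` argument exists only so that the function can be exercised in tests without network access.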

@GaelVaroquaux
Collaborator

I cannot afford to wait any longer. I am on this issue and will fix it.

@AlexandreAbraham
Owner Author

I have made a patch, but it is not fully tested yet; I am testing it right now. I will push it.

@GaelVaroquaux
Collaborator

> I have made a patch, but it is not fully tested yet; I am testing it right now. I will push it.

OK, so you are saying that I just lost an hour for nothing :$.

When do you expect to have the patch ready? I have 24 hours to get the
tutorial going. I am starting to be really nervous.

@AlexandreAbraham
Owner Author

I have pushed it. For basic behavior (download from scratch, download with existing corrupted files) it works. I hope that it will work on your computer this time. I am testing it under Windows right now...

@GaelVaroquaux
Collaborator

> I have pushed it. For basic behavior (download from scratch, download with existing corrupted files) it works. I hope that it will work on your computer this time. I am testing it under Windows right now...

I have pushed a fix that works for me (I tested it manually) on a
nyu_trt_fix branch on my GitHub. I am too tired to do anything more
tonight.

G

@AlexandreAbraham
Owner Author

I think I'll make a phony dataset for testing purposes because I also find it hard to test with a real dataset.
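
One possible shape for such a phony dataset, sketched with the standard library only. The helper name `make_phony_archive`, the archive layout and the `corrupt` flag are assumptions made for illustration; truncating the file is meant to mimic an interrupted download.

```python
import io
import os
import tarfile


def make_phony_archive(path, corrupt=False):
    """Write a tiny .tar.gz at `path`; optionally truncate it to mimic a bad download."""
    payload = b"phony data\n"
    info = tarfile.TarInfo(name="phony/data.txt")
    info.size = len(payload)
    with tarfile.open(path, "w:gz") as archive:
        archive.addfile(info, io.BytesIO(payload))
    if corrupt:
        # Keep only the first half of the compressed bytes.
        size = os.path.getsize(path)
        with open(path, "r+b") as f:
            f.truncate(size // 2)
    return path
```

A test can then point the fetching code at a corrupted archive and check that it recovers, without touching a real dataset or the network.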

@GaelVaroquaux
Collaborator

> I have pushed it. For basic behavior (download from scratch, download with existing corrupted files) it works.

I don't understand: where is the branch in which the nyu_trt download is
supposed to work? From what I can see, you pushed the fix in your master,
in which the nyu_trt downloader does not exist (that's good, don't change
that). I tried creating a temporary branch to merge the changes that you
had done in your master into your nyu_trt branch, but I get
non-functioning code.

Which code should I use to have a working set of examples with the NYU
dataset?

FYI, I believe that the problem with the nyu_trt is the URL used for
downloading: it tries the following:

In [2]: nyu = datasets.fetch_nyu_rest()
Downloading data from http://www.nitrc.org/frs/download.php/1071NYU_TRT_session1a.tar.gz ...

The URL above does not exist (try it). However, the following URL exists:
http://www.nitrc.org/frs/download.php/1071/NYU_TRT_session1a.tar.gz ...

I pushed a fix for this problem in my nyu_trt_fix branch, and it does
work. By the way, if you are going to use this fix, please do merge my
branch, so that we have somewhat meaningful history.
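
The missing slash above is the kind of bug that plain string concatenation tends to produce. A small sketch of a safer join follows; `join_url` is a hypothetical helper, and splitting the URL into a base, "1071" and the file name is an assumption about how the downloader builds it, not something confirmed in this thread.

```python
def join_url(base, *parts):
    """Join URL path segments with exactly one slash between them."""
    pieces = [base.rstrip("/")] + [part.strip("/") for part in parts]
    return "/".join(pieces)


BASE_URL = "http://www.nitrc.org/frs/download.php"
url = join_url(BASE_URL, "1071", "NYU_TRT_session1a.tar.gz")
print(url)
# http://www.nitrc.org/frs/download.php/1071/NYU_TRT_session1a.tar.gz
```

Note that `urllib.parse.urljoin` would not help here: when the base does not end with a slash, it replaces the last path segment instead of appending to it.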

@AlexandreAbraham
Owner Author

OK, I did not realize that there was also a problem with the URL. I'll merge your branch.

@GaelVaroquaux
Collaborator

AFAICT, I definitively fixed this issue in origin/master. I am closing it.

@GaelVaroquaux
Collaborator

OK, I don't have permission to close it. @AlexandreAbraham you should do it.
