Fix filenames in GridFS #11

Open
flavioamieiro opened this issue Apr 29, 2013 · 1 comment
@flavioamieiro
Member

Some titles that are unique in our original dataset are no longer unique once we save them in GridFS. This happens because what is valid as a filename in GridFS is a subset of what is valid as the title of an article in Wikipedia, so distinct titles can end up with the same filename. There is an issue to fix this for new uploads (NAMD/pypln.web#89), but we should also write a script that fixes the existing filenames, so we don't have to upload everything again.

@turicas

turicas commented May 3, 2013

I think that, as part of this issue, we should also check for and delete duplicated filenames, since the script that fixes the problem will have to iterate over the set of duplicated filenames anyway.
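
For illustration, the check for duplicated filenames could look roughly like the sketch below. It is not the project's script: the function name `find_duplicated_filenames` and the sample records are made up, and it only assumes that each file record has an `_id` and a `filename` field, as the documents in the GridFS files collection do. With pymongo and the default `fs` prefix, the records would probably come from something like `db.fs.files.find()`.

```python
from collections import defaultdict


def find_duplicated_filenames(file_records):
    """Group file ids by filename and keep only names used more than once.

    `file_records` is any iterable of dicts with "_id" and "filename" keys
    (hypothetical helper, not part of pypln or pymongo).
    """
    ids_by_name = defaultdict(list)
    for record in file_records:
        ids_by_name[record["filename"]].append(record["_id"])
    # A filename is duplicated when more than one stored file uses it.
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


# Made-up sample records, for illustration only.
sample = [
    {"_id": 1, "filename": "Foo"},
    {"_id": 2, "filename": "Bar"},
    {"_id": 3, "filename": "Foo"},
]
duplicates = find_duplicated_filenames(sample)
print(duplicates)  # {'Foo': [1, 3]}
```

The result maps each duplicated filename to the ids of the files that share it, which is the set a fix-up script would iterate over. What to do with each group (rename or delete) is left open here, as it is in the discussion above.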
