Big Dataset Examples #163
base: master
Conversation
I actually haven't figured out yet how HF allows using a custom download of The Pile dataset, but I plan to add another example with that dataset.
Wouldn't we want to extract the archives into SLURM_TMPDIR in the …
Yes, I was also thinking about that, and the current strategy in …
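For context on this exchange, here is a minimal sketch of what extracting dataset archives into SLURM_TMPDIR could look like in a data-preparation script. The archive path and directory names below are hypothetical placeholders, not the actual files used in this PR.

```python
"""Sketch: extract a dataset archive into the node-local SLURM_TMPDIR before training."""
import os
import tarfile
from pathlib import Path

# SLURM sets SLURM_TMPDIR to a fast, node-local scratch directory for the job.
slurm_tmpdir = Path(os.environ["SLURM_TMPDIR"])

# Hypothetical archive on the shared filesystem; replace with the real dataset path.
archive = Path("/network/datasets/some_dataset/some_dataset.tar.gz")

extract_dir = slurm_tmpdir / "data"
extract_dir.mkdir(parents=True, exist_ok=True)

# Extract once per job: reading the extracted files from local storage is much faster
# than streaming many small files from the shared filesystem during training.
with tarfile.open(archive) as tar:
    tar.extractall(path=extract_dir)

print(f"Dataset extracted to {extract_dir}")
```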
Force-pushed from ab3e057 to f861529.
Force-pushed from dd09537 to 41094f9.
Waiting for merge of #161.
Force-pushed from 94be372 to 1572fd8.
docs/Minimal_examples.rst (outdated)

    @@ -5,4 +5,5 @@

     .. include:: examples/frameworks/index.rst
     .. include:: examples/distributed/index.rst
    +.. include:: examples/data/index.rst
This might fit nicely in good_practices, what do you think?
**job.sh**
.. literalinclude:: examples/data/torchvision/job.sh.diff |
Suggested change:

    - .. literalinclude:: examples/data/torchvision/job.sh.diff
    + .. literalinclude:: job.sh.diff
**main.py**
.. literalinclude:: examples/data/torchvision/main.py.diff |
Suggested change:

    - .. literalinclude:: examples/data/torchvision/main.py.diff
    + .. literalinclude:: main.py.diff
**data.py**
.. literalinclude:: examples/data/torchvision/data.py |
Suggested change:

    - .. literalinclude:: examples/data/torchvision/data.py
    + .. literalinclude:: data.py
Force-pushed from 3792d9d to c159806.
@lebrice did you have time to check the recent updates to this PR?
Not fully, but at a glance my comment here doesn't seem to have been addressed: #163 (comment). Edit: Okay, I've looked at it now; my previous comments about the content are still relevant (for the most part).
Sorry, same comment (this is the third time I'm making it): #163 (comment).
Let me know what you think.
Co-authored-by: Fabrice Normandin <[email protected]>
Force-pushed from 5744f32 to e40566a.
So I think the only issues remaining were the …
Force-pushed from e40566a to 2e81ced.
Force-pushed from 2e81ced to c71bfc7.
Let me clarify the comment #163 (comment): what I'm saying is that I don't really see the value in having the main.py file included in this example, or in showing a diff with respect to the single-GPU job's main.py (you did address this part by removing the diff, thanks!). In my opinion, the main "body" of the example is data.py, and showing how to use … What do you think?
To be clear, if you feel like you want to merge this, then sure, it's fine as-is. I was just hoping that we could perhaps re-focus the example a bit so it doesn't dilute or mix up the important part of the content with what's already in the GPU job example. One other thing: why do we allow customizing the number of workers for data preparation? Is there a context in which we wouldn't want the number of data-preparation workers to equal the number of CPUs per node?
Nah, not on the cluster; people will use all available CPUs. This is mostly a leftover from the scripts I personally use to preprocess datasets (at least the bash version). We're also showing a very good practice, which is to not override environment variables if they already exist, but I'm okay with removing it. I agree about the …
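For reference, a small sketch of the worker-count pattern being discussed: default to the CPUs SLURM allocates to the task, but don't override a variable that is already set. The DATA_PREP_WORKERS name is an illustrative assumption, not necessarily what the example uses.

```python
import os

# Respect a value that was already set rather than overriding it; otherwise fall back
# to the CPUs SLURM allocated to this task, and finally to the machine's CPU count.
# DATA_PREP_WORKERS is a hypothetical variable name used only for illustration.
num_workers = int(
    os.environ.get("DATA_PREP_WORKERS")
    or os.environ.get("SLURM_CPUS_PER_TASK")
    or os.cpu_count()
)
print(f"Using {num_workers} workers for data preparation")
```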