I noticed that you trained the NLP emulator on the first 30 chunks of the Pile dataset. How large are these 30 chunks? In other words, how many chunks does the Pile consist of in total? The original Pile dataset is over 800 GB, which is too large for most labs to work with...
Besides, did you try smaller datasets such as Wikitext? How does the method perform when trained on these smaller datasets?
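For reference, here is a minimal sketch of how a smaller corpus like Wikitext-103 could be dropped in place of the Pile chunks, assuming the emulator training loop consumes tokenized text via HuggingFace `datasets`/`transformers` (the tokenizer name and sequence length below are just placeholders, not your actual setup):

```python
# Sketch only: swap Wikitext-103 (~500 MB raw text) in for the Pile chunks,
# assuming the distillation dataloader takes tokenized text blocks.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer

# Wikitext-103 is orders of magnitude smaller than the Pile's 800+ GB.
raw = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

def tokenize(batch):
    # Truncate to a fixed block size; 1024 is a placeholder, not your config.
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
# `tokenized` could then be fed to the same emulator-distillation dataloader
# that currently consumes the Pile chunks.
```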
Thanks
May I ask whether you were able to train on a smaller dataset for emulator distillation? If so, how did the method perform when distilled on smaller datasets? Any insights would be helpful for better understanding the proposed algorithm.