Could we add OpenHermes 2.5 dataset? #2
Comments
It seems the dataset is 404'ing on Hugging Face. Looking at the original OpenHermes, however, it includes:
I've already included airoboros and code alpaca, but I can look into the others. Is there a particular functionality you find lacking in the model, or do you just want broader dataset coverage in general?
Thank you for sharing, @jondurbin. I would like to build a better Mistral Instruct 0.2 model from the Mistral base, and I'm looking for high-quality datasets with good coverage. Regarding the previous question, I think having datasets with broad coverage is important. I'm also looking for good synthetic datasets.
I don't have the resources to deeply evaluate every item within each dataset, so I somewhat rely on the dataset creators/curators to know what they are doing, plus a bit of intuition on my part. In airoboros I have a There are other tools as well, like distilabel, which are handy for annotation. For the DPO datasets, however, I try to use only the highest-quality items, which are either human annotated or GPT-4 annotated, and I tend to filter down to a subset of those. A bit of noise in the SFT phase isn't too much of a problem, but it can cause havoc in the DPO phase.
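For anyone wanting to try something similar, here is a rough sketch of that kind of DPO filtering. It is only an illustration, not the actual pipeline used in this repo: the field names (`annotator`, `chosen_score`, `rejected_score`), the trusted annotator labels, and the `keep_fraction` cutoff are all assumptions, and real datasets will use different schemas.

```python
def filter_dpo_pairs(pairs, trusted_annotators=("human", "gpt-4"), keep_fraction=0.5):
    """Keep only trusted-annotator preference pairs with the largest score margin.

    All field names here are hypothetical; adapt them to the dataset's schema.
    """
    # Step 1: drop pairs that were not annotated by a trusted source.
    trusted = [p for p in pairs if p["annotator"] in trusted_annotators]

    # Step 2: rank by how clearly "chosen" beats "rejected".
    trusted.sort(
        key=lambda p: p["chosen_score"] - p["rejected_score"],
        reverse=True,
    )

    # Step 3: keep only the top fraction of the ranked pairs.
    keep = int(len(trusted) * keep_fraction)
    return trusted[:keep]


if __name__ == "__main__":
    example = [
        {"id": "a", "annotator": "human", "chosen_score": 9, "rejected_score": 4},
        {"id": "b", "annotator": "gpt-4", "chosen_score": 6, "rejected_score": 5},
        {"id": "c", "annotator": "unknown", "chosen_score": 10, "rejected_score": 1},
    ]
    for pair in filter_dpo_pairs(example):
        print(pair["id"])
```

The margin-based ranking is just one possible heuristic; the point is that the DPO set is filtered much more aggressively than the SFT data.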
Thanks!