HuggingFaceDatasets.jl #2

Open
CarloLucibello opened this issue May 15, 2022 · 2 comments

Comments

@CarloLucibello
Member

CarloLucibello commented May 15, 2022

Hi @chengchingwen,
just wanted to let you know that I've started https://github.com/CarloLucibello/HuggingFaceDatasets.jl. For the time being it depends on the Python datasets package through PythonCall.jl, but if there is enough interest in the future, it could become a Julia-only package that relies on HuggingFaceApi.jl.

Best,
Carlo

@chengchingwen
Member

@CarloLucibello This is great! But I doubt it's possible for it to become a Julia-only package.

I investigated this a long time ago when making HuggingFaceApi.jl and realized that the way they handle datasets is quite different from the model/config parts. The datasets part is more code-dependent. They don't store the Arrow-format file on their hub; instead, they have both the raw dataset file AND PYTHON CODE on the hub. The Arrow-format dataset is generated right after those files are downloaded from the hub.

So the question is what people expect when they use Hugging Face datasets. For now, we could use HuggingFaceApi.jl to download (and cache) the raw dataset file (and the Python code, though that is useless to us at the moment). However, if people rely on Hugging Face for the data processing (the Arrow file), then the only way we can get that is by calling a Python interpreter.
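
To make the "raw file plus Python code" point concrete, here is a minimal sketch in plain Python. It is not the actual datasets library code: `RAW_CSV`, `generate_examples`, and `build_columns` are hypothetical names made up for illustration, and a plain dict of lists stands in for the Arrow table. It only shows that the columnar data is the output of running the per-dataset script on the raw file, which is why the raw file alone is not enough.

```python
import csv
import io

# Hypothetical raw dataset file, standing in for what is stored on the hub.
RAW_CSV = "text,label\ngood movie,1\nbad movie,0\n"


def generate_examples(raw_text):
    """Hypothetical stand-in for a dataset's loading script.

    Parses the raw file and yields (index, example) pairs, one per row.
    """
    reader = csv.DictReader(io.StringIO(raw_text))
    for idx, row in enumerate(reader):
        yield idx, {"text": row["text"], "label": int(row["label"])}


def build_columns(examples):
    """Collect row-wise examples into columns (a dict of lists).

    This mimics the columnar layout of an Arrow table without using Arrow.
    """
    columns = {}
    for _, example in examples:
        for key, value in example.items():
            columns.setdefault(key, []).append(value)
    return columns


# The columnar data only exists after the script has been executed.
table = build_columns(generate_examples(RAW_CSV))
print(table)
```

Since the parsing logic lives in a script that differs per dataset, a Julia-only package would have to either reimplement each script or restrict itself to datasets whose raw files can be read directly.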

@zsz00

zsz00 commented Jun 8, 2022

I found a new HuggingFaceApi repo:
https://github.com/cjdoris/HuggingFaceHub.jl
