Hi @chengchingwen,
I just wanted to let you know that I've started https://github.com/CarloLucibello/HuggingFaceDatasets.jl. For the time being it depends on the `datasets` Python package through PythonCall.jl, but if there is enough interest in the future, it could become a Julia-only package and rely on HuggingFaceApi.jl.
Best,
Carlo
@CarloLucibello This is great! But I doubt it's possible for it to become a Julia-only package.
I looked into this a long time ago when making HuggingFaceApi.jl and realized that the way they handle datasets is quite different from the model/config parts. The datasets part is more code-dependent. They don't store the Arrow-format file on their hub; instead, they have both the raw dataset files AND PYTHON CODE on the hub. The Arrow-format dataset is generated right after those files are downloaded from the hub.
So the question is what people expect when they use the Hugging Face datasets. For now, we could use HuggingFaceApi.jl to download (and cache) the raw dataset files (and the Python code, though that is useless to us at the moment). However, if people rely on Hugging Face for their data processing (the Arrow file), then the only way we can get that is by calling a Python interpreter.
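To make the "raw file" part concrete, here is a minimal Python sketch of how the download URL of a single raw file in a dataset repo is formed, assuming the hub's usual `/datasets/<repo_id>/resolve/<revision>/<filename>` layout. This is not HuggingFaceApi.jl's actual code: `raw_dataset_file_url` is a hypothetical helper, and the repo id and file name are placeholders.

```python
from urllib.parse import quote

HUB_ENDPOINT = "https://huggingface.co"


def raw_dataset_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the download URL of one raw file stored in a dataset repo on the hub.

    Hypothetical helper for illustration; it only builds the URL and does not
    download anything or produce an Arrow file.
    """
    # Keep "/" unescaped so namespaced repos and nested file paths stay intact.
    repo = quote(repo_id, safe="/")
    path = quote(filename, safe="/")
    # A revision is a single path segment, so any "/" in it is escaped.
    rev = quote(revision, safe="")
    return f"{HUB_ENDPOINT}/datasets/{repo}/resolve/{rev}/{path}"


# Placeholder repo id and file name, not a real dataset.
print(raw_dataset_file_url("some-user/some-dataset", "data/train.csv"))
```

Fetching that URL only gives the raw file. Turning it into the Arrow dataset still requires running the Python loading code stored next to it.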