Releases: huggingface/huggingface_hub
v0.8.1: lazy loading, git-aware cache file layout, new create_commit
Git-aware cache file layout
v0.8.1 introduces a new way of caching files from the Hugging Face Hub for two methods: snapshot_download and hf_hub_download.
The new approach is extensively documented in the Documenting files guide and we recommend checking it out to get a better understanding of how caching works.
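As a rough illustration of the two cached download methods, the sketch below wraps them in one helper. The helper name `fetch_from_hub` is hypothetical, and the repository id and filename are whatever the caller supplies. The import sits inside the function so that defining the helper needs neither the library nor network access.

```python
def fetch_from_hub(repo_id: str, filename: str) -> dict:
    """Download one file and a full snapshot; return their local cache paths."""
    # Imported lazily so that defining this helper does not require the library.
    from huggingface_hub import hf_hub_download, snapshot_download

    # Single file: returns the path of the cached copy.
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    # Whole repository: returns the path of the cached snapshot folder.
    folder_path = snapshot_download(repo_id=repo_id)
    return {"file": file_path, "snapshot": folder_path}
```

Calling the helper twice with the same arguments should reuse the cached files rather than download them again.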
New create_commit API
A new create_commit API allows users to upload and delete several files at once using HTTP-based methods. You can read more about it in this guide. The following convenience methods were also introduced:
- upload_folder: allows uploading a local directory to a repo.
- delete_file: allows deleting a single file from a repo.
upload_file now uses create_commit under the hood.
create_commit also allows creating pull requests with a create_pr=True flag.
None of the methods rely on Git locally.
- New create_commit API by @SBrandeis in #888
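A minimal sketch of a multi-file commit follows. It assumes `CommitOperationAdd` and `CommitOperationDelete` are importable from the package root. The helper name `push_changes`, the file paths and the commit message are placeholders chosen for illustration.

```python
def push_changes(repo_id: str, token: str, open_pr: bool = False):
    """Add one file and delete another in a single HTTP-based commit."""
    # Imported lazily so that defining this helper does not require the library.
    from huggingface_hub import CommitOperationAdd, CommitOperationDelete, HfApi

    operations = [
        # Placeholder content and paths; replace with real files.
        CommitOperationAdd(path_in_repo="README.md", path_or_fileobj=b"# Demo\n"),
        CommitOperationDelete(path_in_repo="old_weights.bin"),
    ]
    return HfApi().create_commit(
        repo_id=repo_id,
        operations=operations,
        commit_message="Update README and remove old weights",
        token=token,
        create_pr=open_pr,  # True opens a pull request instead of committing directly
    )
```

Both operations land in the same commit, and no local Git checkout is involved.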
Lazy loading
All modules will now be lazy-loaded. This should drastically reduce the time it takes to import huggingface_hub as it will no longer load all soft dependencies.
- ENH lazy load modules in the root init by @adrinjalali in #874
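The general idea of deferring imports until first attribute access can be sketched as follows. This is not the implementation from #874, only an illustration of the technique; `LazyNamespace` and the attribute map are hypothetical names.

```python
import importlib
import types


class LazyNamespace(types.ModuleType):
    """Module-like object that imports its attributes on first access."""

    def __init__(self, name, attr_to_module):
        super().__init__(name)
        # Maps a public attribute name to the module that defines it.
        self._attr_to_module = attr_to_module

    def __getattr__(self, attr):
        # Only called when normal lookup fails, i.e. before the first import.
        try:
            module_name = self._attr_to_module[attr]
        except KeyError:
            raise AttributeError(
                f"module {self.__name__!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(module_name), attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


# Nothing is imported here; "json" and "math" load only when first used.
lazy = LazyNamespace("lazy_demo", {"dumps": "json", "sqrt": "math"})
```

Accessing `lazy.sqrt` triggers the import of `math` and caches the function on the namespace. Attributes that are never used never pay their import cost.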
Improvements and bugfixes
- Add request ID to all requests by @LysandreJik in #909
- Remove deprecations by @LysandreJik in #910
- FIX Avoid creating repository when it exists on remote by @merveenoyan in #900
- 🏗 Use hub-ci for tests by @SBrandeis in #898
- Refine 404 errors by @LysandreJik in #878
- Fix typo by @lsb in #902
- FIX metadata_update: work on a copy of the upstream file, to not mess up the cache by @julien-c in #891
- ENH Removed history writing in Keras model card by @merveenoyan in #876
- CI enable codecov by @adrinjalali in #893
- MNT deprecate imports from snapshot_download by @adrinjalali in #880
- Pushback deprecation for v0.7 release by @LysandreJik in #882
- FIX make import machinary private by @adrinjalali in #879
- ENH Keras Use table instead of dictionary for hyperparameters in model card by @merveenoyan in #877
- Invert deprecation for create_repo in #912
- Constant was accidentally removed during deprecation transition in #913
v0.7.0: Repocard metadata
Repocard metadata
This PR adds a metadata_update function that allows the user to update the metadata in a repository on the hub. The function accepts a dict with metadata (following the same pattern as the YAML in the README) and behaves as follows for all top level fields except model-index.
Examples:
Starting from
existing_results = [{
'dataset': {'name': 'IMDb', 'type': 'imdb'},
'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
'task': {'name': 'Text Classification', 'type': 'text-classification'}
}]
1. Overwrite existing metric value in existing result
new_results = deepcopy(existing_results)
new_results[0]["metrics"][0]["value"] = 0.999
_update_metadata_model_index(existing_results, new_results, overwrite=True)
[{'dataset': {'name': 'IMDb', 'type': 'imdb'},
'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.999}],
'task': {'name': 'Text Classification', 'type': 'text-classification'}}]
2. Add new metric to existing result
new_results = deepcopy(existing_results)
new_results[0]["metrics"][0]["name"] = "Recall"
new_results[0]["metrics"][0]["type"] = "recall"
_update_metadata_model_index(existing_results, new_results, overwrite=False)
[{'dataset': {'name': 'IMDb', 'type': 'imdb'},
'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995},
{'name': 'Recall', 'type': 'recall', 'value': 0.995}],
'task': {'name': 'Text Classification', 'type': 'text-classification'}}]
3. Add new result
new_results = deepcopy(existing_results)
new_results[0]["dataset"] = {'name': 'IMDb-2', 'type': 'imdb_2'}
_update_metadata_model_index(existing_results, new_results, overwrite=False)
[{'dataset': {'name': 'IMDb', 'type': 'imdb'},
'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
'task': {'name': 'Text Classification', 'type': 'text-classification'}},
{'dataset': {'name': 'IMDb-2', 'type': 'imdb_2'},
'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
'task': {'name': 'Text Classification', 'type': 'text-classification'}}]
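The merge behaviour shown in the three cases above can be approximated with the sketch below. It is a reconstruction from those examples, not the library's implementation. The function name `merge_model_index` is hypothetical, and raising `ValueError` on a conflicting value when `overwrite` is false is an assumption.

```python
from copy import deepcopy


def merge_model_index(existing, new, overwrite=False):
    """Merge `new` results into a copy of `existing` (illustrative only)."""
    merged = deepcopy(existing)
    for new_result in new:
        # Assumption: a result is identified by its dataset and task.
        match = next(
            (r for r in merged
             if r["dataset"] == new_result["dataset"]
             and r["task"] == new_result["task"]),
            None,
        )
        if match is None:
            merged.append(deepcopy(new_result))  # case 3: new result
            continue
        for new_metric in new_result["metrics"]:
            # Assumption: a metric is identified by its name and type.
            same = next(
                (m for m in match["metrics"]
                 if m["name"] == new_metric["name"]
                 and m["type"] == new_metric["type"]),
                None,
            )
            if same is None:
                match["metrics"].append(deepcopy(new_metric))  # case 2
            elif same["value"] != new_metric["value"]:
                if not overwrite:
                    # Assumption: conflicting values are rejected by default.
                    raise ValueError("metric exists; pass overwrite=True")
                same["value"] = new_metric["value"]  # case 1
    return merged


existing_results = [{
    "dataset": {"name": "IMDb", "type": "imdb"},
    "metrics": [{"name": "Accuracy", "type": "accuracy", "value": 0.995}],
    "task": {"name": "Text Classification", "type": "text-classification"},
}]

# Case 2 from above: a metric with a new name/type is appended.
with_recall = deepcopy(existing_results)
with_recall[0]["metrics"][0].update(name="Recall", type="recall")
merged = merge_model_index(existing_results, with_recall)
```

Working on a deep copy leaves `existing_results` untouched, which makes the three cases easy to compare side by side.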
Improvements and bug fixes
- Keras: Saving history in a JSON file by @merveenoyan in #861
- space after uri by @leondz in #866
v0.6.0: fastai support, binary file support, skip LFS files when pushing to the hub
Disclaimer: This release was initially announced with support for #844. That change was not included in this release and will ship in v0.7.
fastai support
v0.6.0 introduces downstream (download) and upstream (upload) support for the fastai libraries. It supports fastai versions 2.4 and above.
The integration is detailed in the following blog.
- Add fastai upstream and downstream capacities for fastai>=2.4 and fastcore>=1.3.27 versions by @omarespejel in #678
Automatic binary file tracking in Repository
Binary files are now rejected by default by the Hub. v0.6.0 introduces automatic binary file tracking through the auto_lfs_track argument of the Repository.git_add method. It also introduces the Repository.auto_track_binary_files method which can be used independently of other methods.
- ENH Auto track binary files in Repository by @LysandreJik in #828
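To give an intuition for what "automatic binary file tracking" has to decide, here is a small sketch that flags files containing a NUL byte near their start. This heuristic is an assumption for illustration and is not claimed to be the check used by Repository. The helper names `looks_binary` and `files_to_track` are hypothetical.

```python
import os
import tempfile


def looks_binary(path, sample_size=1024):
    """Return True if the first bytes of the file contain a NUL byte."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)


def files_to_track(folder):
    """List files under `folder` (relative paths) that look binary."""
    found = []
    for root, _dirs, names in os.walk(folder):
        for name in names:
            full = os.path.join(root, name)
            if looks_binary(full):
                found.append(os.path.relpath(full, folder))
    return sorted(found)


# Demo on a throwaway folder with one text file and one binary file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "README.md"), "w") as f:
        f.write("# demo\n")
    with open(os.path.join(tmp, "weights.bin"), "wb") as f:
        f.write(b"\x00\x01\x02")
    tracked = files_to_track(tmp)
```

In a real workflow the resulting list would be handed to Git LFS tracking before the files are added.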
skip_lfs_files is now added to mixins
The parameter skip_lfs_files is now added to the different mixins. This will enable pushing files to the hub without first downloading the files above 10MB. This should dramatically reduce the time needed when updating a modelcard, a configuration file, and others.
Keras support improvement
Support for Keras models is greatly improved through several additions:
- The save_pretrained_keras method now accepts a list of tags that will automatically be added to the repository.
- Download statistics are now available on Keras models
- Introducing list of tags to Keras model card by @merveenoyan in #806
- Enable keras download stats by @merveenoyan in #860
Bugfixes and improvements
- FIX don't raise if name/organizaiton are passed postionally by @adrinjalali in #822
- ENH Use provided token from HUGGING_FACE_HUB_TOKEN env variable if available by @FrancescoSaverioZuppichini in #794
- tests(hf_api): remove infectionTypes field by @McPatate in #834
- Remove docs, tasks and inference API from huggingface_hub by @osanseviero in #833
- FEAT Uniformize hf_api a bit and add support for Spaces by @julien-c in #792
- Add a bug report template by @osanseviero in #832
- clean up formatting by @stevhliu in #839
- Release guide by @LysandreJik in #820
- Fix keras test by @osanseviero in #855
- DOC Add quick start guide by @stevhliu in #850
- MNT refactor: subprocess.run -> run_subprocess by @LysandreJik in #352
- MNT enable preview on black by @adrinjalali in #849
- Update how to guides by @stevhliu in #840
- Update contribution guide for merging PRs by @stevhliu in #856
- DOC Update landing page by @stevhliu in #854
- space after uri by @leondz in #866
v0.5.1: Patch release
This is a patch release fixing a breaking backward compatibility issue.
Linked PR: #822
v0.5.0: Reference documentation, Keras improvements, stabilizing the API
Documentation
Version v0.5.0 is the first version which features an API reference. It is still a work in progress with features lacking, some images not rendering, and a documentation reorg coming up, but should already provide significantly simpler access to the huggingface_hub API.
The documentation is visible here.
- API reference documentation by @LysandreJik in #782
- [API Reference docs] Remove git references from GitHub Action templates by @LysandreJik in #813
- DOC API docstring improvements by @adrinjalali in #731
Model & datasets list improvements
The list_models and list_datasets methods have been improved in several ways.
List private models
These two methods now accept the token keyword to specify your token. Specifying the token will include your private models and datasets in the returned list.
- Support list_models and list_datasets with token arg by @muellerzr in #638
Modelcard metadata
These two methods now accept the cardData boolean argument. If set to True, the modelcard metadata will also be returned when using these two methods.
- Include cardData in list_models and list_datasets by @muellerzr in #639
Filtering by carbon emissions
The list_models method now also accepts an emissions_thresholds parameter to filter by carbon emissions.
- Enable filtering by carbon emission by @muellerzr in #668
Keras improvements
The Keras serialization and upload methods have been worked on to provide better support for models:
- All parameters are now included in the saved model when using push_to_hub_keras
- log_dir parameter for TensorBoard logs, which will automatically spawn a TensorBoard instance on the Hub.
- Automatic model card
- Introduce include_optimizer parameter to push_to_hub_keras() by @merveenoyan in #616
- Add TensorBoard for Keras models by @merveenoyan in #651
- Create Automatic Keras model card by @merveenoyan in #679
- Allow TensorBoard Override for same Repository by @merveenoyan in #709
- Add tempfile for tensorboard logs in tensorboard tests in test_keras_integration.py by @merveenoyan in #761
Contributing guide
A contributing guide is now available for the huggingface_hub repository. For any and all information related to contributing to the repository, please check it out!
Read more about it here: CONTRIBUTING.md.
Pre-commit hooks
The huggingface_hub GitHub repository has several checks to ensure that the code respects code quality standards. Opt-in pre-commit hooks have been added in order to make it simpler for contributors to leverage them.
Read more about it in the aforementioned CONTRIBUTING guide.
- MNT Add pre-commit hooks by @adrinjalali in #807
Renaming and transferring repositories
Repositories can now be renamed and transferred programmatically using move_repo.
- Allow renaming and transferring repos programmatically by @osanseviero in #704
Breaking changes & deprecation
⛔ The following methods have now been removed following a deprecation cycle
list_repos_objs
The list_repos_objs method and the accompanying CLI utility huggingface-cli repo ls-files have been removed.
The same can be done using the model_info and dataset_info methods.
Python 3.6
Python 3.6 support is now dropped, as it has reached end of life. Installing huggingface_hub on Python 3.6 will result in version v0.4.0 being installed.
- CI support python 3.7-3.10 - remove 3.6 support by @adrinjalali in #790
- API deprecate positional args in file_download and hf_api by @adrinjalali in #745
- MNT deprecate name and organization in favor of repo_id by @adrinjalali in #733
What's Changed
- Include "model" in repo_type to keep consistency by @muellerzr in #620
- Hotfix for repo_type by @muellerzr in #623
- fix: typo in docstring by @ariG23498 in #647
- {upload|delete}_file: Remove client-side filename validation by @SBrandeis in #669
- Ensure post_method is only executed once by @sgugger in #676
- Remove paying subscription mention from docstring by @cakiki in #653
- Improve tests and logging by @muellerzr in #682
- docs(links): Update settings/token to settings/tokens by @ronvoluted in #699
- Add support for private hub by @juliensimon in #703
- Add retry_endpoint for test stability by @osanseviero in #719
- FIX fix a bug in _filter_emissions to accept numbers w/o decimal and dict emissions by @adrinjalali in #753
- Logging fix for hf_api, logging documentation by @LysandreJik in #748
- Contributing guide & code of conduct by @LysandreJik in #692
- Fix pytorch and tensorflow python matrix by @osanseviero in #760
- MNT add links to related projects and the forum on issue template by @adrinjalali in #773
- Note on the README by @LysandreJik in #772
- Remove autoreviewers by @muellerzr in #793
- CI Error on FutureWarning by @adrinjalali in #787
- MNT more informative message on error in Hf.Api.delete_repo by @adrinjalali in #783
- Add security status by @McPatate in #654
- Remove redundant part of security test by @osanseviero in #802
- Changed test repository names to fix tests by @merveenoyan in #803
- TST calling delete_repo under tempfile for fixing the test by @merveenoyan in #804
- Disable logging in with organization token by @merveenoyan in #780
- MNT change dev version to 0.5, 0.4 is already released by @adrinjalali in #810
- 👨‍💻 Configure HF Hub URL with environment variable by @SBrandeis in #815
- MNT support oder requests versions by @adrinjalali in #817
- Rename the env variable HF_ENDPOINT. by @Narsil in #819
New Contributors
- @McPatate made their first contribution in #583
- @FremyCompany made their first contribution in #606
- @simoninithomas made their first contribution in #633
- @mlonaws made their first contribution in #630
- @ariG23498 made their first contribution in #647
- @J-Petiot made their first contribution in #660
- @ronvoluted made their first contribution in #699
- @juliensimon made their first contribution in #703
- @allendorf made their first contribution in #742
- @frgfm made their first contribution in #747
- @hbredin made their first contribution in #688
Full Changelog: v0.4.0...v0.5.0
v0.4.0: Tag listing, Namespace Objects, Model Filter
Tag listing
- Introduce Tag Listing by @muellerzr in #537
This PR introduces the ability to fetch all available tags for models or datasets and returns them as a nested namespace object, for example:
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> tags = api.get_model_tags()
>>> print(tags)
Available Attributes:
* benchmark
* language_creators
* languages
* licenses
* multilinguality
* size_categories
* task_categories
* task_ids
>>> print(tags.benchmark)
Available Attributes:
* raft
* superb
* test
Namespace objects
- Namespace Objects for Search Parameters by @muellerzr in #556
With a goal of adding more tab-completion to the library, this PR introduces two objects:
- DatasetSearchArguments
- ModelSearchArguments
These two AttributeDictionary objects contain all the valid information we can extract from a model as tab-complete parameters. We also include the author_or_organization and dataset_name (or model_name) fields, obtained through careful string splitting.
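How an attribute-style dictionary can support tab completion is sketched below. This is a simplified stand-in, not the library's AttributeDictionary. The class name `AttrDict` and the sample keys are hypothetical.

```python
class AttrDict(dict):
    """Dictionary whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # Expose keys so interactive shells can offer them for tab completion.
        return sorted(set(super().__dir__()) | set(self.keys()))


# Hypothetical nested namespace, similar in shape to the search arguments.
tags = AttrDict(
    library=AttrDict(pytorch="pytorch", TensorFlow="tf"),
    pipeline_tag=AttrDict(Summarization="summarization"),
)
```

With this in place, `tags.library.pytorch` reads like an attribute chain, and `dir(tags)` lists the keys that a shell can complete.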
Model Filter
- Implement a Model Filter class by @muellerzr in #553
This PR introduces a new way to search the hub: the ModelFilter class.
At first glance it works like a simple Enum, letting users specify what they want to search for, such as:
f = ModelFilter(author="microsoft", model_name="wavlm-base-sd", framework="pytorch")
From there, they can pass this filter to the list_models function in HfApi to search through it:
models = api.list_models(filter=f)
The API may then be used for complex queries:
args = ModelSearchArguments()
f = ModelFilter(framework=[args.library.pytorch, args.library.TensorFlow], model_name="bert", tasks=[args.pipeline_tag.Summarization, args.pipeline_tag.TokenClassification])
api.list_models(filter=f)
Ignoring filenames in snapshot_download
This PR introduces a way to limit the files that will be fetched by snapshot_download. This is useful when you want to download and cache an entire repository without using git, and you want to skip files according to their filenames.
- [Snapshot download] allow some filenames to be ignored by @patrickvonplaten in #566
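The filtering idea can be sketched independently of the library. The function name `filter_filenames`, the parameter name `ignore_patterns` and the use of glob-style matching are assumptions for illustration, not the signature of snapshot_download.

```python
from fnmatch import fnmatch


def filter_filenames(filenames, ignore_patterns=()):
    """Keep only the names that match none of the glob-style patterns."""
    return [
        name for name in filenames
        if not any(fnmatch(name, pattern) for pattern in ignore_patterns)
    ]


# Hypothetical repository listing: skip the large weight files.
repo_files = ["config.json", "pytorch_model.bin", "tf_model.h5", "vocab.txt"]
kept = filter_filenames(repo_files, ignore_patterns=["*.bin", "*.h5"])
```

Only the files left in `kept` would then be downloaded and cached.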
What's Changed
- [Hotfix][API] card_data => cardData on /api/datasets by @julien-c in #530
- Fix the progress bars when cloning a repository by @LysandreJik in #517
- Update Hugging Face Hub documentation README and Endpoints by @muellerzr in #527
- Convert string functions to f-string by @muellerzr in #536
- Fixing FS for espnet. by @Narsil in #542
- [snapshot_download] upgrade to canonical separator by @julien-c in #545
- Add test directions by @muellerzr in #547
- [HOTFIX] Change test for missing_input to reflect back-end redirect changes by @muellerzr in #552
- Bring consistency to download and upload APIs by @muellerzr in #574
- Search by authors and string by @FrancescoSaverioZuppichini in #531
- Quick typo by @muellerzr in #575
New Contributors
- @kahne made their first contribution in #569
- @FrancescoSaverioZuppichini made their first contribution in #531
Full Changelog: v0.2.1...v0.4.0
v0.2.1: Patch release
This is a patch release fixing an issue with the notebook login.
5e2da9b#diff-fb1696cbcf008dd89dde5e8c1da9d4be5a8f7d809bc32f07d4453caba40df15f
v0.2.0: Access tokens, skip large files, local files only
Access tokens
Version v0.2.0 introduces access token compatibility with the Hub. Access tokens are now the main login handler; it is still possible to log in with username/password by pressing [Ctrl/CMD]+C on the login prompt.
The notebook login is adapted to work with the access tokens.
Skipping large files
The Repository class now has an additional parameter, skip_lfs_files, which allows cloning the repository while skipping the large file download.
Local files only for snapshot_download
The snapshot_download method can now take local_files_only as a parameter to enable leveraging previously downloaded files.
v0.1.2: Patch release
What's Changed
- clean_ok should be True by default by @LysandreJik in #462
Full Changelog: v0.1.1...v0.1.2
v0.1.1: Patch release
What's Changed
- Fix typing-extensions minimum version by @lhoestq in #453
- Fix argument order in create_repo for Repository.clone_from by @sgugger in #459
Full Changelog: v0.1.0...v0.1.1