
Releases: huggingface/huggingface_hub

v0.8.1: lazy loading, git-aware cache file layout, new create_commit

15 Jun 15:53

Git-aware cache file layout

v0.8.1 introduces a new way of caching files from the Hugging Face Hub, used by two methods: snapshot_download and hf_hub_download.
The new approach is extensively documented in the caching guide, and we recommend checking it out to get a better understanding of how caching works.
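As an illustrative sketch (gpt2 is an arbitrary public model), fetching a single file with hf_hub_download lands it in the new cache layout; snapshot_download behaves the same way for full repositories:

```python
# Download one file from a public repo into the git-aware cache.
# The returned path points inside the cache, under a snapshot directory
# named after the revision's commit hash.
from huggingface_hub import hf_hub_download

config_path = hf_hub_download(repo_id="gpt2", filename="config.json")
print(config_path)  # e.g. ~/.cache/huggingface/hub/models--gpt2/snapshots/<hash>/config.json
```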

New create_commit API

A new create_commit API allows users to upload and delete several files at once using HTTP-based methods. You can read more about it in this guide. The following convenience methods were also introduced:

  • upload_folder: upload a local directory to a repo.
  • delete_file: delete a single file from a repo.

upload_file now uses create_commit under the hood.

create_commit also allows creating pull requests with a create_pr=True flag.

None of the methods rely on Git locally.
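A minimal sketch of the batched flow (the repo id and file names are illustrative, and the call itself is commented out since it needs write access):

```python
from huggingface_hub import HfApi, CommitOperationAdd, CommitOperationDelete

# One commit that adds a file from in-memory bytes and deletes another,
# opened as a pull request instead of pushing straight to main.
operations = [
    CommitOperationAdd(path_in_repo="weights.bin", path_or_fileobj=b"..."),
    CommitOperationDelete(path_in_repo="old-weights.bin"),
]

api = HfApi()
# api.create_commit(
#     repo_id="user/my-model",
#     operations=operations,
#     commit_message="Swap weight files",
#     create_pr=True,
# )
```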

Lazy loading

All modules will now be lazy-loaded. This should drastically reduce the time it takes to import huggingface_hub as it will no longer load all soft dependencies.

Improvements and bugfixes

v0.7.0: Repocard metadata

30 May 12:18

Repocard metadata

This PR adds a metadata_update function that allows the user to update the metadata in a repository on the Hub. The function accepts a dict of metadata (following the same pattern as the YAML block in the README) and behaves as shown below for all top-level fields except model-index.
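As a hedged sketch (the repo id and values are illustrative, and the call is commented out because it needs write access):

```python
from huggingface_hub import metadata_update

# Metadata follows the same shape as the YAML block at the top of the README.
metadata = {"license": "mit", "tags": ["text-classification"]}

# metadata_update("user/my-model", metadata, overwrite=True)
```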

Examples:

Starting from

existing_results = [{
    'dataset': {'name': 'IMDb', 'type': 'imdb'},
    'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
    'task': {'name': 'Text Classification', 'type': 'text-classification'}
}]

1. Overwrite existing metric value in existing result

from copy import deepcopy

new_results = deepcopy(existing_results)
new_results[0]["metrics"][0]["value"] = 0.999
_update_metadata_model_index(existing_results, new_results, overwrite=True)
[{'dataset': {'name': 'IMDb', 'type': 'imdb'},
  'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.999}],
  'task': {'name': 'Text Classification', 'type': 'text-classification'}}]

2. Add new metric to existing result

new_results = deepcopy(existing_results)
new_results[0]["metrics"][0]["name"] = "Recall"
new_results[0]["metrics"][0]["type"] = "recall"
_update_metadata_model_index(existing_results, new_results)
[{'dataset': {'name': 'IMDb', 'type': 'imdb'},
  'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995},
              {'name': 'Recall', 'type': 'recall', 'value': 0.995}],
  'task': {'name': 'Text Classification', 'type': 'text-classification'}}]

3. Add new result

new_results = deepcopy(existing_results)
new_results[0]["dataset"] = {'name': 'IMDb-2', 'type': 'imdb_2'}
_update_metadata_model_index(existing_results, new_results)
[{'dataset': {'name': 'IMDb', 'type': 'imdb'},
  'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
  'task': {'name': 'Text Classification', 'type': 'text-classification'}},
 {'dataset': {'name': 'IMDb-2', 'type': 'imdb_2'},
  'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
  'task': {'name': 'Text Classification', 'type': 'text-classification'}}]

Improvements and bug fixes

v0.6.0: fastai support, binary file support, skip LFS files when pushing to the hub

09 May 20:11

Disclaimer: this release initially advertised support for #844. That feature did not make it into this release and will ship in v0.7.

fastai support

v0.6.0 introduces downstream (download) and upstream (upload) support for the fastai library, for fastai versions 2.4 and above.
The integration is detailed in the following blog post.

  • Add fastai upstream and downstream capacities for fastai>=2.4 and fastcore>=1.3.27 versions by @omarespejel in #678

Automatic binary file tracking in Repository

Binary files are now rejected by default by the Hub. v0.6.0 introduces automatic binary file tracking through the auto_lfs_track argument of the Repository.git_add method. It also introduces the Repository.auto_track_binary_files method which can be used independently of other methods.
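A sketch of the flow with illustrative repo names (cloning and pushing require credentials, so the calls are commented out):

```python
from huggingface_hub import Repository

# repo = Repository(local_dir="my-model", clone_from="user/my-model")
# repo.git_add(auto_lfs_track=True)   # adds an LFS rule for binary files before staging
# repo.auto_track_binary_files()      # or run the tracking step on its own
# repo.git_commit("Add binary assets")
# repo.git_push()
```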

skip_lfs_files is now added to mixins

The skip_lfs_files parameter is now added to the different mixins. It enables pushing files to the Hub without first downloading the files above 10MB, which should dramatically reduce the time needed to update a model card, a configuration file, and other small files.

  • ✨ add skip_lfs_files to mixins' push_to_hub by @nateraw in #858
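A sketch using the base mixin (the class and repo names are illustrative, and the push is commented out because it needs a token):

```python
from huggingface_hub import ModelHubMixin

class MyModel(ModelHubMixin):
    """Toy model class; real subclasses implement the save/load hooks."""

# model = MyModel()
# model.push_to_hub("my-model", skip_lfs_files=True)  # big LFS files are not re-downloaded
```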

Keras support improvement

Support for Keras models is greatly improved through several additions:

  • The save_pretrained_keras method now accepts a list of tags that will automatically be added to the repository.
  • Download statistics are now available on Keras models.

Bugfixes and improvements

v0.5.1: Patch release

07 Apr 19:10

This is a patch release fixing a breaking backward compatibility issue.

Linked PR: #822

v0.5.0: Reference documentation, Keras improvements, stabilizing the API

07 Apr 19:09

Documentation

Version v0.5.0 is the first to feature an API reference. It is still a work in progress, with some features missing, some images not rendering, and a documentation reorganization coming up, but it should already provide significantly simpler access to the huggingface_hub API.

The documentation is visible here.

Model & datasets list improvements

The list_models and list_datasets methods have been improved in several ways.

List private models

These two methods now accept a token keyword argument. Passing your token will include your private models and datasets in the returned list.

  • Support list_models and list_datasets with token arg by @muellerzr in #638

Modelcard metadata

These two methods now accept a cardData boolean argument. If set to True, the model card metadata will also be returned.

  • Include cardData in list_models and list_datasets by @muellerzr in #639

Filtering by carbon emissions

The list_models method now also accepts an emissions_thresholds parameter to filter models by their carbon emissions.
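Taken together, sketches of the three listing improvements above (the token and threshold values are illustrative, so the calls are commented out):

```python
from huggingface_hub import HfApi

api = HfApi()
# models = api.list_models(token="hf_xxx")                    # include private repos
# models = api.list_models(cardData=True)                     # attach model card metadata
# models = api.list_models(emissions_thresholds=(None, 100))  # at most 100g of CO2eq
```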

Keras improvements

The Keras serialization and upload methods have been worked on to provide better support for models:

  • All parameters are now included in the saved model when using push_to_hub_keras
  • A log_dir parameter for TensorBoard logs, which automatically spawns a TensorBoard instance on the Hub.
  • Automatic model card

Contributing guide

A contributing guide is now available for the huggingface_hub repository. For any and all information related to contributing to the repository, please check it out!

Read more about it here: CONTRIBUTING.md.

Pre-commit hooks

The huggingface_hub GitHub repository has several checks to ensure that the code respects code quality standards. Opt-in pre-commit hooks have been added in order to make it simpler for contributors to leverage them.

Read more about it in the aforementioned CONTRIBUTING guide.

Renaming and transferring repositories

Repositories can now be renamed and transferred programmatically using move_repo.

  • Allow renaming and transferring repos programmatically by @osanseviero in #704
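A sketch with illustrative repo ids (the calls need a valid token, so they are commented out):

```python
from huggingface_hub import HfApi

api = HfApi()
# Rename within the same namespace, or transfer to another one you administer.
# api.move_repo(from_id="user/old-name", to_id="user/new-name")
# api.move_repo(from_id="user/my-model", to_id="my-org/my-model")
```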

Breaking changes & deprecation

⛔ The following methods have now been removed following a deprecation cycle

list_repos_objs

The list_repos_objs method and the accompanying CLI utility huggingface-cli repo ls-files have been removed.
The same information can be retrieved using the model_info and dataset_info methods.

  • Remove deprecated list_repos_objs and huggingface-cli repo ls-files by @julien-c in #702
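For instance, the file list of a public repo can be recovered through model_info (gpt2 is an arbitrary public model):

```python
from huggingface_hub import HfApi

# Each entry in .siblings describes one file in the repo; rfilename is its path.
api = HfApi()
files = [s.rfilename for s in api.model_info("gpt2").siblings]
print("config.json" in files)
```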

Python 3.6

Python 3.6 support is now dropped, as it has reached end of life. Installing huggingface_hub on Python 3.6 will result in version v0.4.0 being installed.

⚠️ Items below are now deprecated and will be removed in a future version

  • API deprecate positional args in file_download and hf_api by @adrinjalali in #745
  • MNT deprecate name and organization in favor of repo_id by @adrinjalali in #733

What's Changed

New Contributors

Full Changelog: v0.4.0...v0.5.0

v0.4.0: Tag listing, Namespace Objects, Model Filter

26 Jan 18:30

Tag listing

This PR introduces the ability to fetch all available tags for models or datasets, returning them as a nested namespace object. For example:

>>> from huggingface_hub import HfApi

>>> api = HfApi() 
>>> tags = api.get_model_tags()
>>> print(tags)
Available Attributes:
 * benchmark
 * language_creators
 * languages
 * licenses
 * multilinguality
 * size_categories
 * task_categories
 * task_ids

>>> print(tags.benchmark)
Available Attributes:
 * raft
 * superb
 * test

Namespace objects

With the goal of adding more tab-completion to the library, this PR introduces two objects:

  • DatasetSearchArguments
  • ModelSearchArguments

These two AttributeDictionary objects contain all the valid information we can extract from a model as tab-completable parameters. They also expose the author_or_organization and the dataset_name (or model_name), obtained through careful string splitting.

Model Filter

This PR introduces a new way to search the hub: the ModelFilter class.

To the user it initially looks like a simple Enum, allowing them to specify what they want to search for, such as:

f = ModelFilter(author="microsoft", model_name="wavlm-base-sd", framework="pytorch")

From there, they can pass in this filter to the new list_models_by_filter function in HfApi to search through it:

models = api.list_models_by_filter(f)

The API may then be used for complex queries:

args = ModelSearchArguments()
f = ModelFilter(framework=[args.library.pytorch, args.library.TensorFlow], model_name="bert", tasks=[args.pipeline_tag.Summarization, args.pipeline_tag.TokenClassification])

api.list_models_by_filter(f)

Ignoring filenames in snapshot_download

This PR introduces a way to limit which files are fetched by snapshot_download. This is useful when you want to download and cache an entire repository without using git, while skipping files according to their filenames.
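A hedged sketch, assuming the regex-style filtering argument this release introduced (later versions renamed these filters; the repo id and pattern are illustrative, so the call is commented out):

```python
from huggingface_hub import snapshot_download

# Cache the whole repository over HTTP, skipping files whose names match the pattern.
# path = snapshot_download("user/my-model", ignore_regex=r".*\.msgpack")
```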

What's Changed

New Contributors

Full Changelog: v0.2.1...v0.4.0

v0.2.1: Patch release

26 Jan 18:18

This is a patch release fixing an issue with the notebook login.

Linked commit: 5e2da9b

v0.2.0: Access tokens, skip large files, local files only

26 Jan 18:17

Access tokens

Version v0.2.0 introduces access token support for the Hub. Access tokens are now the main login handler, while it remains possible to log in with username/password by pressing [Ctrl/CMD]+C at the login prompt.


The notebook login is adapted to work with the access tokens.

Skipping large files

The Repository class now has an additional parameter, skip_lfs_files, which allows cloning the repository while skipping the large file download.

#472
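A sketch with illustrative names (cloning needs network access, so the call is commented out):

```python
from huggingface_hub import Repository

# Clone everything except the LFS payloads; large files stay as pointer files.
# repo = Repository(
#     local_dir="my-model",
#     clone_from="user/my-model",
#     skip_lfs_files=True,
# )
```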

Local files only for snapshot_download

The snapshot_download method can now take local_files_only as a parameter to enable leveraging previously downloaded files.

#505
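A sketch (the repo id is illustrative; the call is commented out since it raises if the repo was never cached):

```python
from huggingface_hub import snapshot_download

# Resolve entirely from the local cache; no network request is made.
# path = snapshot_download("user/my-model", local_files_only=True)
```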

v0.1.2: Patch release

09 Nov 17:46

What's Changed

Full Changelog: v0.1.1...v0.1.2

v0.1.1: Patch release

05 Nov 18:39

What's Changed

  • Fix typing-extensions minimum version by @lhoestq in #453
  • Fix argument order in create_repo for Repository.clone_from by @sgugger in #459

Full Changelog: v0.1.0...v0.1.1