v0.19.0: Inference Endpoints and robustness!
(Discuss the release in our Community Tab. Feedback welcome!! 🤗)
🚀 Inference Endpoints API
Inference Endpoints provides a secure solution to easily deploy models hosted on the Hub on production-ready infrastructure managed by Hugging Face. With `huggingface_hub>=0.19.0` integration, you can now manage your Inference Endpoints programmatically. Combined with the `InferenceClient`, this becomes the go-to solution to deploy models and run jobs in production, either sequentially or in batches!

Here is an example of how to get an inference endpoint, wake it up, wait for initialization, run jobs in batches, and pause the endpoint again, all in a few lines of code! For more details, please check out our dedicated guide.
```py
>>> import asyncio
>>> from huggingface_hub import get_inference_endpoint

# Get endpoint + wait until initialized
>>> endpoint = get_inference_endpoint("batch-endpoint").resume().wait()

# Run inference jobs concurrently and await their results
>>> async_client = endpoint.async_client
>>> results = await asyncio.gather(*[async_client.text_generation(...) for job in jobs])

# Pause endpoint
>>> endpoint.pause()
```
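The batching step above is plain `asyncio` fan-out/fan-in. Here is a minimal, self-contained sketch of the same pattern, with a dummy coroutine standing in for `async_client.text_generation` (the coroutine name and prompts are illustrative only):

```python
import asyncio

async def fake_text_generation(prompt: str) -> str:
    # Stand-in for async_client.text_generation: pretend to call the endpoint.
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"completion for: {prompt}"

async def run_batch(prompts):
    # Fan out all jobs concurrently, then collect results in input order.
    return await asyncio.gather(*[fake_text_generation(p) for p in prompts])

results = asyncio.run(run_batch(["hello", "world"]))
print(results)  # → ['completion for: hello', 'completion for: world']
```

`asyncio.gather` preserves input order, so each result lines up with its prompt even though the calls complete concurrently.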
- Implement API for Inference Endpoints by @Wauplin in #1779
- Fix inference endpoints docs by @Wauplin in #1785
⏬ Improved download experience
`huggingface_hub` is a library primarily used to transfer (huge!) files to and from the Hugging Face Hub. Our goal is to keep improving the experience for this core part of the library. In this release, we introduce a more robust download mechanism for slow/limited connections, while also improving the UX for users with high bandwidth available!
More robust downloads
Getting a connection error in the middle of a download is frustrating. That's why we've implemented a retry mechanism that automatically reconnects if a connection gets closed or a `ReadTimeout` error is raised. The download restarts exactly where it stopped, without having to redownload any bytes.
- Retry on ConnectionError/ReadTimeout when streaming file from server by @Wauplin in #1766
- Reset nb_retries if data has been received from the server by @Wauplin in #1784
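The mechanism can be pictured as a loop that tracks how many bytes have already landed and, on a dropped connection, re-requests only the remaining range; the retry budget is reset whenever fresh data arrives (the behavior added in #1784). A minimal sketch with a flaky fake byte source standing in for the HTTP stream (all names here are illustrative, not the library's internals):

```python
def download_with_resume(fetch_from, total_size, max_retries=3):
    """Collect `total_size` bytes from `fetch_from(offset)`, resuming on errors.

    `fetch_from` yields chunks starting at `offset` and may raise ConnectionError
    mid-stream; we resume from the last byte received instead of starting over.
    """
    data = bytearray()
    retries_left = max_retries
    while len(data) < total_size:
        try:
            for chunk in fetch_from(len(data)):  # resume at current offset
                data.extend(chunk)
                retries_left = max_retries  # fresh data: reset the retry budget
        except ConnectionError:
            if retries_left == 0:
                raise
            retries_left -= 1  # reconnect and continue from len(data)
    return bytes(data)

# A flaky source: drops the connection after every 4-byte chunk.
PAYLOAD = b"abcdefghij"
def flaky_source(offset):
    yield PAYLOAD[offset:offset + 4]
    if offset + 4 < len(PAYLOAD):
        raise ConnectionError("connection closed")

print(download_with_resume(flaky_source, len(PAYLOAD)))  # → b'abcdefghij'
```

Even though the connection drops twice here, no byte is fetched more than once, and resetting the budget on progress means a slow-but-alive connection never exhausts its retries.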
In addition to this, it is possible to configure `huggingface_hub` with higher timeouts, thanks to @Shahafgo. This should help get around some issues on slower connections.
- Adding the ability to configure the timeout of get request by @Shahafgo in #1720
- Fix a bug to respect the HF_HUB_ETAG_TIMEOUT. by @Shahafgo in #1728
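The timeouts are read from environment variables when `huggingface_hub` is imported; for example, assuming the `HF_HUB_ETAG_TIMEOUT` and `HF_HUB_DOWNLOAD_TIMEOUT` variable names from the package reference:

```python
import os

# Raise timeouts (in seconds) for slow connections.
# Set these before `import huggingface_hub`, which reads them at import time.
os.environ["HF_HUB_ETAG_TIMEOUT"] = "30"      # metadata (etag) request
os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "30"  # file download request

# from huggingface_hub import hf_hub_download  # now uses the longer timeouts
print(os.environ["HF_HUB_ETAG_TIMEOUT"])  # → 30
```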
Progress bars while using `hf_transfer`
`hf_transfer` is a Rust-based library focused on improving upload and download speed on machines with high bandwidth available. Once installed (`pip install -U hf_transfer`), it can be used transparently with `huggingface_hub` simply by setting the `HF_HUB_ENABLE_HF_TRANSFER=1` environment variable. The counterpart of higher performance is the lack of some user-friendly features, such as better error handling or a retry mechanism, meaning it is recommended for power users only. In this release we still ship a new feature to improve the UX: progress bars. No need to update any existing code; a simple library upgrade is enough.
- `hf-transfer` progress bar by @cbensimon in #1792
- Add support for progress bars in `hf_transfer` uploads by @Wauplin in #1804
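Enabling the backend is a one-line opt-in; a minimal sketch (the download call is commented out since it requires `hf_transfer` installed and network access):

```python
import os

# Opt in to the Rust-based transfer backend before importing huggingface_hub.
# Requires `pip install -U hf_transfer`; recommended for high-bandwidth machines.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

# from huggingface_hub import hf_hub_download
# hf_hub_download(repo_id="gpt2", filename="config.json")  # transferred by hf_transfer
```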
📚 Documentation
`huggingface-cli` guide
`huggingface-cli` is the CLI tool shipped with `huggingface_hub`. It recently got some nice improvements, especially with commands to download and upload files directly from the terminal. All of this needed a guide, so here it is!
Environment variables
Environment variables are useful to configure how `huggingface_hub` should work. Historically, we had some inconsistencies in how those variables were named. This is now improved, with a backward-compatible approach. Please check the package reference for more details. The goal is to propagate those changes to the whole HF ecosystem, making configuration easier for everyone.
- Harmonize environment variables by @Wauplin in #1786
- Ensure backward compatibility for HUGGING_FACE_HUB_TOKEN env variable by @Wauplin in #1795
- Do not promote `HF_ENDPOINT` environment variable by @Wauplin in #1799
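The backward-compatible approach can be sketched as a lookup that prefers the harmonized name and falls back to the legacy one, as with `HF_TOKEN` and the older `HUGGING_FACE_HUB_TOKEN` (the helper below is illustrative, not the library's code):

```python
import os

def get_token_from_env():
    # Prefer the harmonized name; fall back to the legacy one for compatibility.
    return os.environ.get("HF_TOKEN") or os.environ.get("HUGGING_FACE_HUB_TOKEN")

os.environ.pop("HF_TOKEN", None)
os.environ["HUGGING_FACE_HUB_TOKEN"] = "hf_legacy"
print(get_token_from_env())  # → hf_legacy  (legacy name still honored)

os.environ["HF_TOKEN"] = "hf_new"
print(get_token_from_env())  # → hf_new  (harmonized name wins when both are set)
```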
Hindi translation
Hindi documentation landed on the Hub thanks to @aneeshd27! Check out the Hindi version of the quickstart guide here.
- Added translation of 3 files as mentioned in issue by @aneeshd27 in #1772
Minor docs fixes
- Added `[[autodoc]]` for `ModelStatus` by @jamesbraza in #1758
- Expanded docstrings on `post` and `ModelStatus` by @jamesbraza in #1740
- Fix document link for manage-cache by @liuxueyang in #1774
- Minor doc fixes by @pcuenca in #1775
💔 Breaking changes
The legacy `ModelSearchArguments` and `DatasetSearchArguments` have been completely removed from `huggingface_hub`. This shouldn't cause problems, as they were already unused (and unusable in practice).
- Removed GeneralTags, ModelTags and DatasetTags by @VictorHugoPilled in #1761
Classes containing details about a repo (`ModelInfo`, `DatasetInfo`, and `SpaceInfo`) have been refactored by @mariosasko to be more Pythonic and aligned with the other classes in `huggingface_hub`. In particular, those objects are now based on the `dataclasses` module instead of a custom `ReprMixin` class. Every change is meant to be backward compatible, meaning no breaking changes are expected. However, if you detect any inconsistency, please let us know and we will fix it as soon as possible.
- Replace `ReprMixin` with dataclasses by @mariosasko in #1788
- Fix SpaceInfo initialization + add test by @Wauplin in #1802
The legacy `Repository` and `InferenceAPI` classes are now deprecated but will not be removed before the next major release (`v1.0`).
Instead of the git-based `Repository`, we advise using the HTTP-based `HfApi`. Check out this guide explaining the reasons behind the change. For `InferenceAPI`, we recommend switching to `InferenceClient`, which is much more feature-complete and will keep getting improved.
⚙️ Miscellaneous improvements, fixes and maintenance
InferenceClient
- Adding `InferenceClient.get_recommended_model` by @jamesbraza in #1770
- Fix `InferenceClient.text_generation` when pydantic is not installed by @Wauplin in #1793
- Supporting `pydantic<3` by @jamesbraza in #1727
HfFileSystem
- [hffs] Raise `NotImplementedError` on transaction commits by @Wauplin in #1736
- Fix huggingface filesystem repo_type not forwarded by @Wauplin in #1791
- Fix `HfFileSystemFile` when init fails + improve error message by @Wauplin in #1805
FIPS compliance
Misc fixes
- Fix UnboundLocalError when using commit context manager by @hahunavth in #1722
- Fixed improperly configured 'every' leading to test_sync_and_squash_history failure by @jamesbraza in #1731
- Testing `WEBHOOK_PAYLOAD_EXAMPLE` deserialization by @jamesbraza in #1732
- Keep lock files in a `/locks` folder to prevent rare concurrency issue by @beeender in #1659
- Fix Space runtime on static Space by @Wauplin in #1754
- Clearer error message on unprocessable entity. by @Wauplin in #1755
- Do not warn in ModelHubMixin on missing config file by @Wauplin in #1776
- Update SpaceHardware enum by @Wauplin in #1798
- change prop name by @julien-c in #1803
Internal
- Bump version to 0.19 by @Wauplin in #1723
- Make `@retry_endpoint` a default for all tests by @Wauplin in #1725
- Retry test on 502 Bad Gateway by @Wauplin in #1737
- Consolidated mypy type ignores in `InferenceClient.post` by @jamesbraza in #1742
- fix: remove useless token by @rtrompier in #1765
- Fix CI (typing-extensions minimal requirement) by @Wauplin in #1781
- remove black formatter to use only ruff by @Wauplin in #1783
- Separate test and prod cache (+ ruff formatter) by @Wauplin in #1789
- fix 3.8 tensorflow in ci by @Wauplin (direct commit on main)
🤗 Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @VictorHugoPilled
- Removed GeneralTags, ModelTags and DatasetTags (#1761)
- @aneeshd27
- Added translation of 3 files as mentioned in issue (#1772)