Releases: ruanchaves/hashformers
Hashformers v2.0.0: Enhanced Compatibility with Broad Range of Transformer Models including Large Language Models (LLMs)
The hashformers v2.0.0 release is a significant upgrade to our hashtag segmentation library. Earlier versions were compatible only with BERT and GPT-2 models. With v2.0.0, you can segment hashtags with nearly any transformer model, including Large Language Models (LLMs) such as Dolly, GPT-J, and Alpaca-LoRA. We expect this wider choice of models to yield results that surpass our earlier benchmarks.
Key improvements in this release include:
- Expanded Support: Accommodates various Seq2Seq models, Masked Language Models, and autoregressive Transformer models available on the Hugging Face Model Hub. This includes, but is not limited to, FLAN-T5, DeBERTa, and XLNet.
- Greater Language Coverage: Thanks to the increase in supported models, we can now segment hashtags in a broader range of languages. This includes improved support for large models pretrained in low-resource languages.
- Enhanced Documentation: We have updated our documentation and wiki, providing in-depth explanations of each library feature. This will enable users to customize their usage of our library more effectively according to their specific needs.
v1.3.0: Deprecating Features and Enhancing the Documentation
In this new version, we're refining our library by narrowing our focus onto the essential components. Here are the key updates in this release:
- Deprecation of Experimental Features: To sharpen our focus on the core functionality, we've decided to discontinue support for the experimental features, including the Word Segmenter Cascades and the Unigram word segmenter. This change will not impact any functions illustrated in our Google Colab tutorial.
- Enhanced Code Documentation: We've upgraded our documentation with comprehensive docstrings to ensure the clarity and readability of the code. Given our small codebase, we have removed external documentation from the repository.
- Disclaimer Notice Addition: A disclaimer notice has been included in the Google Colab notebook due to the discontinued support for `mxnet-cu110`. For future releases, we plan to eliminate this dependency to facilitate easier library updates.
v1.2.2: Word Segmenter Cascades, Unigram word segmenter
Features:
- Introduces word segmenter cascades that allow us to chain rerankers (ad infinitum).
- Replaces ekphrasis with a unigram segmenter based on `wordfreq`. It can run on all languages supported by the wordfreq library.
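To illustrate the general idea behind a unigram segmenter, here is a minimal sketch. It is not the implementation shipped in this release: the `WORD_FREQ` table, the unknown-word penalty, and the `segment` function are all hypothetical, and a real segmenter would look frequencies up in a resource such as wordfreq rather than a hard-coded dictionary. The sketch picks the split whose words have the highest combined log-probability.

```python
import math
from functools import lru_cache
from typing import List, Tuple

# Hypothetical toy frequency table (made-up values for illustration only).
WORD_FREQ = {
    "ice": 5e-5,
    "cold": 8e-5,
    "we": 3e-3,
    "need": 6e-4,
    "a": 2e-2,
    "national": 2e-4,
    "park": 1e-4,
}

# Assumed penalty applied per character of any word missing from the table.
UNKNOWN_CHAR_LOG_PROB = math.log(1e-12)


def word_log_prob(word: str) -> float:
    """Log-probability of a word; unseen words pay a per-character penalty."""
    freq = WORD_FREQ.get(word)
    if freq is None:
        return UNKNOWN_CHAR_LOG_PROB * len(word)
    return math.log(freq)


def segment(hashtag: str, max_word_len: int = 20) -> List[str]:
    """Return the split that maximizes the sum of unigram log-probabilities."""
    text = hashtag.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def best(start: int) -> Tuple[float, Tuple[str, ...]]:
        # Best (score, words) for the suffix of `text` beginning at `start`.
        if start == len(text):
            return 0.0, ()
        candidates = []
        last_end = min(len(text), start + max_word_len)
        for end in range(start + 1, last_end + 1):
            word = text[start:end]
            rest_score, rest_words = best(end)
            candidates.append((word_log_prob(word) + rest_score, (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])


if __name__ == "__main__":
    print(segment("#icecold"))
    print(segment("#weneedanationalpark"))
```

Because each word is scored independently of its neighbors, this kind of segmenter only needs a word frequency list for the target language, which is why coverage follows the languages of the underlying frequency resource.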
Breaking changes:
`WordSegmenter` has been renamed to `TransformerWordSegmenter`.
v1.1.0: Bug fixes, unit tests, extra segmenters
Features added to prepare for integration with pysentimiento.
- GPT-2 batch size bug fix
- More word segmenters (regex, ekphrasis)
- A tweet segmenter
- Unit tests
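As a rough illustration of what a regex word segmenter and a tweet segmenter can look like, here is a minimal sketch. It does not reproduce the code in this release: the function names `segment_hashtag` and `segment_tweet` and both regular expressions are assumptions made for the example. The regex splits hashtags at case and letter/digit boundaries, and the tweet helper applies that split to every hashtag found in a text.

```python
import re

# Zero-width boundaries: lowercase followed by uppercase, an acronym followed
# by a capitalized word, and any switch between letters and digits.
_BOUNDARY = re.compile(
    r"(?<=[a-z])(?=[A-Z])"
    r"|(?<=[A-Z])(?=[A-Z][a-z])"
    r"|(?<=[A-Za-z])(?=[0-9])"
    r"|(?<=[0-9])(?=[A-Za-z])"
)

# A hashtag is a '#' followed by one or more word characters.
_HASHTAG = re.compile(r"#(\w+)")


def segment_hashtag(hashtag: str) -> str:
    """Insert a space at every boundary found by the regex."""
    body = hashtag.lstrip("#")
    return _BOUNDARY.sub(" ", body)


def segment_tweet(tweet: str) -> str:
    """Replace each hashtag in the tweet with its segmented form."""
    return _HASHTAG.sub(lambda match: segment_hashtag(match.group(1)), tweet)


if __name__ == "__main__":
    print(segment_hashtag("#NASALaunch2024"))
    print(segment_tweet("Loving the #IceCold weather in #NewYork"))
```

A regex of this kind leaves all-lowercase hashtags such as #icecold unsplit, since they contain no case or digit boundary. Those are the cases where a language-model-based segmenter is needed.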
v1.0.0: Clean-up, tutorial, packaging
First release of the hashformers library.
- General clean-up of the codebase.
- Step-by-step tutorial for usage, evaluation, and speed optimization.
- PyPI package.