Releases · ruanchaves/hashformers

03 Jun 15:52

v2.0.0

144b8f8

Hashformers v2.0.0: Enhanced Compatibility with Broad Range of Transformer Models including Large Language Models (LLMs)

The hashformers v2.0.0 release is a significant upgrade to our hashtag segmentation library. Earlier versions were compatible only with BERT and GPT-2 models. Now, hashformers v2.0.0 allows you to segment hashtags with nearly any transformer model, including Large Language Models (LLMs) like Dolly, GPT-J, and Alpaca-LoRA. We expect this broader model support to yield segmentation results that surpass our earlier benchmarks.

Key improvements in this release include:

Expanded Support: The library now accommodates various Seq2Seq models, Masked Language Models, and autoregressive Transformer models available on the Hugging Face Model Hub. Examples include FLAN-T5, DeBERTa, and XLNet.
Greater Language Coverage: Thanks to the increase in supported models, we can now segment hashtags in a broader range of languages. This includes improved support for large models pretrained in low-resource languages.
Enhanced Documentation: We have updated our documentation and wiki, providing in-depth explanations of each library feature. This will enable users to customize their usage of our library more effectively according to their specific needs.

22 May 03:45

ruanchaves

v1.3.0

8649376

v1.3.0: Deprecating Features and Enhancing the Documentation

In this new version, we're refining our library by narrowing our focus to the essential components. Here are the key updates in this release:

Deprecation of Experimental Features: To sharpen our focus on the core functionality, we've decided to discontinue support for the experimental features including the Word Segmenter Cascades and the Unigram word segmenter. This change will not impact any functions illustrated in our Google Colab tutorial.
Enhanced Code Documentation: We've upgraded our documentation with comprehensive docstrings to ensure the clarity and readability of the code. Given our small codebase, we have removed external documentation from the repository.
Disclaimer Notice Addition: A disclaimer notice has been included in the Google Colab notebook due to the discontinued support for mxnet-cu110. For future releases, we plan to eliminate this dependency to facilitate easier library updates.

12 Feb 10:15

ruanchaves

v1.2.2

a49e144

v1.2.2: Word Segmenter Cascades, Unigram word segmenter

Features:

Introduces word segmenter cascades that allow us to chain rerankers (ad infinitum).
Replaces ekphrasis with a unigram segmenter based on wordfreq. It can run on all languages supported by the wordfreq library.
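
To make the idea of a unigram segmenter concrete, here is a minimal sketch in plain Python. It is not the hashformers implementation: the function names, the toy frequency table `TOY_FREQ`, and the unknown-word penalty are all assumptions made for illustration, and a real segmenter would look up frequencies in wordfreq rather than in a hard-coded dictionary. The sketch picks the split whose words have the highest total log frequency, using dynamic programming over split points.

```python
import math

# Hypothetical toy frequency table (illustrative values only).
# A real unigram segmenter would query a resource such as wordfreq.
TOY_FREQ = {
    "we": 0.006, "need": 0.0008, "a": 0.02, "an": 0.003, "on": 0.006,
    "national": 0.0002, "park": 0.0001, "ice": 0.00005, "cold": 0.00006,
}
UNKNOWN_CHAR_FREQ = 1e-10  # assumed penalty applied per character of an unknown word


def word_logprob(word):
    """Log frequency of a word; unknown words are penalized by their length."""
    freq = TOY_FREQ.get(word)
    if freq is None:
        return math.log(UNKNOWN_CHAR_FREQ) * len(word)
    return math.log(freq)


def segment(hashtag, max_word_len=20):
    """Return the highest-scoring split of a hashtag under the unigram model."""
    text = hashtag.lstrip("#").lower()
    n = len(text)
    # best[i] = (best score for text[:i], start index of the last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            score = best[start][0] + word_logprob(text[start:end])
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers from the end of the string to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return " ".join(reversed(words))


print(segment("#weneedanationalpark"))  # we need a national park
print(segment("#icecold"))              # ice cold
```

A unigram model scores each word independently, so it cannot use context to choose between plausible splits. That limitation is one reason a segmenter like this is typically paired with a reranker that rescores its candidates.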

Breaking changes: