Trimming the embeddings of large language models can be very helpful when a large part of the vocabulary goes unused and consumes unnecessary memory during training or inference. By eliminating these unused tokens, we can free up valuable space, which can then be used for larger batch sizes or to load models that would otherwise be too large to fit in memory under the given constraints. One of the best examples of this is multilingual models, which are often used for only a subset of the languages they have been pretrained on.
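The sketch below illustrates the general idea with a Hugging Face `transformers` model: collect the token ids that actually appear in your data, keep only those rows of the embedding matrix, and swap the smaller matrix into the model. The model name and corpus here are placeholders, and the tokenizer remapping and output-projection trimming are model-specific details that are only hinted at in comments; treat this as a rough outline rather than a complete implementation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder model and corpus -- substitute your own.
model_name = "facebook/mbart-large-50"
corpus = ["Sentences covering only the languages you actually need."]

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# 1. Collect the token ids that occur in the corpus, plus the
#    tokenizer's special tokens (pad, bos, eos, unk, ...).
keep_ids = set(tokenizer.all_special_ids)
for text in corpus:
    keep_ids.update(tokenizer(text)["input_ids"])
keep_ids = sorted(keep_ids)

# 2. Build a smaller embedding matrix containing only the kept rows.
old_embeddings = model.get_input_embeddings()  # nn.Embedding(vocab_size, d_model)
new_embeddings = torch.nn.Embedding(len(keep_ids), old_embeddings.embedding_dim)
new_embeddings.weight.data = old_embeddings.weight.data[keep_ids].clone()

# 3. Swap in the trimmed matrix. Note that the tokenizer must also be
#    remapped (old id -> new row index), and any untied output projection
#    (lm_head) trimmed the same way; both steps depend on the model and
#    are omitted here.
model.set_input_embeddings(new_embeddings)
model.config.vocab_size = len(keep_ids)
```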
Pruning generally comes at a slight cost to performance. I have not extensively tested this myself; however, a drop in BLEU score of under 1 point has been reported. I plan to do some additional testing and will add those results here.