Releases: VikParuchuri/surya
Faster text detection + layout
Switched model architecture for the text detection and layout models:
- 30% faster on GPU
- 4x faster on CPU
- 12x faster on MPS (M-series Macs)
Accuracy should be about the same or slightly better, based on my benchmarks.
v0.4.14: Merge pull request #141 from VikParuchuri/dev
A new transformers version added a kwarg to the donut embeddings. Surya now handles and ignores that kwarg, and is slightly future-proofed in case this happens again.
Minor bugfixes
- Fix rotation and copy bugs
Fix image bugs
- Fix bugs with RGBA images
- Fix assert bug
- Add back in thumbnail method for resizing
- Slightly optimize segformer code
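The RGBA fix above presumably amounts to flattening the alpha channel before inference, since a model expecting 3-channel RGB input can misbehave on 4-channel data. A minimal stdlib sketch of that conversion (the white-background default is my assumption, not necessarily what surya does):

```python
def rgba_to_rgb(pixel, background=(255, 255, 255)):
    """Composite one RGBA pixel onto an opaque background.

    `background` defaults to white; the background surya actually
    composites onto is an assumption here, not from the release notes.
    """
    r, g, b, a = pixel
    alpha = a / 255.0
    return tuple(
        round(alpha * channel + (1.0 - alpha) * bg)
        for channel, bg in zip((r, g, b), background)
    )

# Fully opaque pixels pass through unchanged.
assert rgba_to_rgb((10, 20, 30, 255)) == (10, 20, 30)
# Fully transparent pixels collapse to the background color.
assert rgba_to_rgb((10, 20, 30, 0)) == (255, 255, 255)
```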
Change image resize
- Switch image resizing from cv2 to PIL - cv2 caused benchmark regressions
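The backend swap matters because different libraries use different resampling kernels by default, so identical inputs can produce different pixels after downscaling, which in turn moves model benchmark numbers. A stdlib 1-D sketch of that effect (toy filters, not the actual cv2/PIL code paths):

```python
def resize_nearest(row, new_len):
    """Nearest-neighbor downsample of a 1-D pixel row."""
    scale = len(row) / new_len
    return [row[int(i * scale)] for i in range(new_len)]

def resize_box(row, new_len):
    """Box-filter (averaging) downsample of the same row."""
    scale = len(row) / new_len
    out = []
    for i in range(new_len):
        start, stop = int(i * scale), int((i + 1) * scale)
        chunk = row[start:stop]
        out.append(round(sum(chunk) / len(chunk)))
    return out

row = [0, 255, 0, 255, 0, 255, 0, 255]
# The two filters disagree on the downsampled pixels, which is why
# swapping resize backends can shift benchmark results.
assert resize_nearest(row, 4) == [0, 0, 0, 0]
assert resize_box(row, 4) == [128, 128, 128, 128]
```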
OCR speedups
- Speed up base OCR model ~15-20%, and reduce memory usage by ~25% (can do higher batch sizes)
- Add static cache for compilation - torch.compile will result in another 15% speedup
- Other optimizations, like faster image resizing
- Bugfixes, like enabling different length language inputs for OCR (batching different docs with different languages together)
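The static cache mentioned above refers to preallocating the decoder's key/value cache at a fixed maximum length, so tensor shapes stay constant across decode steps and torch.compile can specialize the step once instead of recompiling as the cache grows. A torch-free sketch of the idea (the class and sizes are illustrative, not surya's implementation):

```python
class StaticKVCache:
    """Fixed-capacity cache: its shape never changes, only a write cursor.

    Constant shapes are what let a tracing compiler (e.g. torch.compile)
    compile the decode step a single time. Sizes here are illustrative.
    """

    def __init__(self, max_len, dim):
        self.keys = [[0.0] * dim for _ in range(max_len)]  # preallocated
        self.length = 0  # write cursor; capacity never changes

    def append(self, key_vec):
        if self.length >= len(self.keys):
            raise IndexError("static cache capacity exceeded")
        self.keys[self.length] = key_vec  # in-place write, no reallocation
        self.length += 1

    def valid(self):
        """Entries written so far (what attention would actually read)."""
        return self.keys[: self.length]

cache = StaticKVCache(max_len=4, dim=2)
cache.append([1.0, 2.0])
cache.append([3.0, 4.0])
assert len(cache.keys) == 4  # capacity fixed up front
assert cache.valid() == [[1.0, 2.0], [3.0, 4.0]]
```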
Processor improvements
- Remove unneeded format conversions
- Fix bug where only one color channel was used for OCR - results should be better now
- Speed up layout/text detection a bit
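The one-channel bug above matters because a single channel can erase contrast entirely: for example, pure red text on a white background is invisible in the red channel alone. A stdlib sketch of why using all channels helps (the luminance combine is illustrative; the actual fix may simply pass full RGB through):

```python
def r_channel_only(pixel):
    """Buggy path: keep only the red channel (the kind of thing the fix removed)."""
    return pixel[0]

def luminance(pixel):
    """Fixed path: combine all three channels (ITU-R BT.601 weights)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

white = (255, 255, 255)
red_text = (255, 0, 0)
# In the red channel alone, red text is indistinguishable from a white
# background; combining all channels preserves the contrast.
assert r_channel_only(red_text) == r_channel_only(white)
assert luminance(red_text) != luminance(white)
```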
OCR speedup
Cut OCR time in half. Combined with the previous release's ~15-20% speedup, OCR should now take about 40% of the time it did before.
Significant speedup for layout, line detection
- Improve CPU postprocessing for line detection and layout - cut postprocessing time to 1/3 of original
- Unpin transformers version after investigating model performance
This should result in a ~2x speedup for layout and text detection, most noticeable on GPU. I haven't fully benchmarked it, though.
Bug fixes
- Fix memory leak with layout and text detection models and large batch sizes
- Improve ordering model generation slightly