Finetune smaller model on computer screenshots ? #113

Open
apirrone opened this issue May 26, 2024 · 2 comments

@apirrone
Hi!

First, great work. From what I tested, it seems to work really well, congrats!

I have a use case where I need to perform OCR, layout analysis, etc. on computer screenshots. surya actually works really well for such images, but I wonder how a smaller model trained only on such images would perform. In my use case, each screenshot would need to be fully processed quite fast (ideally in under 2 seconds) and without using too much memory or CPU/GPU.

Maybe I am wrong, but the problem seems simpler than training a general model that works on any kind of document, as surya does. Do you think a small model could do the job?

Thanks!

@metatrot
I'm also looking for something that handles the screenshot use case. Most OCR tools seem geared toward photos, handwriting, or PDFs, and they don't do well on ordinary GUI text.

@yechens
yechens commented Jan 8, 2025

Perhaps you could try Baidu's PP-OCR models (from PaddleOCR), which are fast, accurate, lightweight, and easy to fine-tune on your own dataset.
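Whichever model you end up testing, it may help to check it against the budget from the original post (under 2 seconds per screenshot). Below is a minimal sketch of a timing harness using only the Python standard library. The names `measure_latency` and `fake_ocr` are hypothetical and not part of surya or any other OCR library; `fake_ocr` is a placeholder to swap for the real model call. It measures wall-clock time only, not memory or GPU usage.

```python
import statistics
import time
from typing import Callable, Iterable


def measure_latency(run_ocr: Callable[[str], object],
                    image_paths: Iterable[str],
                    budget_s: float = 2.0) -> dict:
    """Time run_ocr on each image and compare the slowest run to a budget.

    run_ocr is any callable that takes an image path; it is an assumption
    of this sketch, not the API of a specific OCR library.
    """
    timings = []
    for path in image_paths:
        start = time.perf_counter()
        run_ocr(path)
        timings.append(time.perf_counter() - start)
    if not timings:
        raise ValueError("image_paths is empty")
    return {
        "count": len(timings),
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
        # The budget is checked against the worst case, not the average.
        "within_budget": max(timings) <= budget_s,
    }


def fake_ocr(path: str) -> str:
    # Hypothetical stand-in for a real model call; replace with your own.
    time.sleep(0.01)
    return ""


if __name__ == "__main__":
    print(measure_latency(fake_ocr, ["a.png", "b.png"]))
```

Running the model once before timing is usually worthwhile, since the first call often includes model loading and would skew the numbers.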
