bugfixes for Hindi text generation #333

Open
wants to merge 10 commits into
base: master

prathameshza

Bugfix for Hindi text generation

Bug 1: Hindi text does not render in the official trdg PyPI package 1.8.0

Reproduce with:

trdg -l hi -c 10 -w 5

image

गरमटयए जतवन टकटकत फफकर बरत_0

text

गरमटयए जतवन टकटकत फफकर बरत

Using official repository

Bug 2: Hindi text labels are missing their matras, and matras in the image are separated from their characters

Reproduce with:

git clone https://github.com/Belval/TextRecognitionDataGenerator.git

cd TextRecognitionDataGenerator/trdg/

python3 run.py -l hi -c 10 -w 5

Before bug fix

image

एकशरयत ठठकन ललत-षषठ बखन तर_0

text

एकशरयत ठठकन ललत-षषठ बखन तर

After bug fix

image

कलपाओ घिघियाता हँसोहीं मकी डंडा-डोलिओं_0

text

कलपाओ घिघियाता हँसोहीं मकी डंडा-डोलिओं
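
The "before" and "after" labels above differ in whether the matras (Devanagari vowel signs and other combining marks) survive. As a minimal sketch, assuming the original sanitizer used a `[^\w\s-]`-style regular expression (the PR text does not show the code, so this is an assumption), the snippet below shows why such a filter drops matras and one possible way to keep them. The names `strip_non_word` and `sanitize_keep_marks`, and the default of 200 characters, are hypothetical and are not taken from trdg.

```python
import re
import unicodedata


def strip_non_word(value: str) -> str:
    """Assumed shape of the old filter: keep only word characters, whitespace and '-'."""
    # In Python's re module, \w does not match combining marks
    # (Unicode categories Mn/Mc), so matras are removed with the punctuation.
    return re.sub(r"[^\w\s-]", "", value)


def sanitize_keep_marks(value: str, max_length: int = 200) -> str:
    """Hypothetical replacement: also keep combining marks so matras survive."""
    kept = [
        ch
        for ch in value
        if ch.isalnum()
        or ch in "_-"
        or ch.isspace()
        or unicodedata.category(ch).startswith("M")  # Mn, Mc, Me
    ]
    # Truncate to a maximum number of characters (assumed default).
    return "".join(kept)[:max_length]
```

Applied to a word from the "after" sample, `strip_non_word` returns only the base consonants, while `sanitize_keep_marks` returns the word unchanged and still removes characters such as `/` or `?`.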

Changes made

  • Modified the make_filename_valid function in utils.py so that it no longer removes Hindi matras
  • Wrapped the image save in data_generator.py in a try/except block to handle OSError 22
  • Replaced the Hindi font Lohit-Devanagari with Gargi to fix the separated Hindi matras
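
The second change above, skipping an image when the save fails with OSError 22, could look roughly like the sketch below. It assumes errno 22 corresponds to `errno.EINVAL`, which some platforms raise when a path contains characters the filesystem rejects. `save_or_skip` and its `save` argument are hypothetical names, not the code in data_generator.py.

```python
import errno
from typing import Callable


def save_or_skip(save: Callable[[str], None], path: str) -> bool:
    """Call save(path); return False instead of raising when errno is EINVAL (22)."""
    try:
        save(path)
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            # The OS rejected the file name: skip this image.
            return False
        raise  # Any other I/O error is still reported to the caller.
    return True
```

With Pillow, `save` could be the bound method `image.save`, so a call might read `save_or_skip(image.save, out_path)`. Re-raising other errors keeps problems such as a missing output directory visible instead of silently dropping images.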

Note: Changing the font also changes the number of images generated per second.

Below are the tested fonts and their speeds for Hindi image generation:

Font              Speed
Lohit-Devanagari  15-16 it/s
Gargi             17-18 it/s
Sura unicode      11-12 it/s
akshra unicode    4-5 it/s
Kurti dev 010     50-55 it/s
aakar regular     50-55 it/s
freesansbold      9-10 it/s
Nakula            8-9 it/s
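
To compare fonts in a repeatable way, a rough timing helper like the one below could be used. It is only a sketch: `iterations_per_second` is a hypothetical name, `render` stands for whatever callable generates one image, and the figures in the table were reported by the PR author, not produced by this code.

```python
import time
from typing import Callable


def iterations_per_second(render: Callable[[], object], count: int = 100) -> float:
    """Call render() count times and return the average calls per second."""
    if count <= 0:
        raise ValueError("count must be positive")
    start = time.perf_counter()
    for _ in range(count):
        render()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a very coarse clock.
    return count / elapsed if elapsed > 0 else float("inf")
```

Running the helper once per font with the same text and image size would give numbers that are comparable with each other, though not necessarily with the it/s values above.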

I tested on Linux Mint 21.3 "Virginia" Cinnamon Edition with:

  • Python version 3.10.12
  • Pillow version 9.5.0

I have also tested other languages with these changes, and they work fine 👍

prathameshza and others added 10 commits February 8, 2024 21:53
Deleting old Hindi font file
Modified the function to prevent it from removing Hindi matras and also to remove invalid characters for image name
Try catch block skips the image if OSError 22 occurred
It shows the Hindi text generation and sample of Hindi image
Deleted the file as it was creating OSError
@abhi-glitchhg

Thanks for this work.

@abhi-glitchhg

However, there are still problems with the generated images.

image

If you look at the second and last words, they do not match, and some invalid text is generated.

2 participants