Handwritten Text Full-Page OCR

This project improves OCR for handwritten text by extending TrOCR, which reads one line at a time, to full-page structured text (paragraphs and essays). The repository includes the scripts and notebooks used throughout development, from the initial experiments to the final pipeline.

Inference Example

How to Run the Project

Run the Application with UI
Launch the application with a graphical interface using the command:
```
python ui.py
```

Generate Synthetic Data
Generate labeled synthetic data with the provided notebook:
- Open synthetic_data_generation_final.ipynb in Jupyter Notebook (or your preferred notebook environment) and run all cells.

Methodology Summary

The project trains a detection model to predict bounding boxes of high-quality text patches, using labels that are generated automatically rather than annotated by hand:

Dataset Creation:
- Divided each page image into overlapping, full-width patches of fixed height (about 2× the font size).
- Ran TrOCR on every patch to find the patches that contain a readable text line.
- Kept patches with high TrOCR confidence scores and removed duplicates by comparing their transcriptions with BLEU scores.
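
The dataset-creation steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the 50% overlap, and both thresholds are assumptions, the TrOCR text and confidence for each patch are assumed to be computed elsewhere, and `simple_bleu` is a simplified BLEU (up to bigrams, single reference) standing in for whichever BLEU implementation the project uses.

```python
import math
from collections import Counter


def patch_boxes(page_height, page_width, font_size, overlap=0.5):
    """Return (left, top, right, bottom) boxes for full-width, overlapping patches.

    Patch height is about 2x the font size; `overlap` (assumed 0.5 here)
    is the fraction of each patch shared with the next one.
    """
    patch_height = min(page_height, max(1, round(2 * font_size)))
    stride = max(1, round(patch_height * (1 - overlap)))
    tops = list(range(0, page_height - patch_height + 1, stride))
    if tops[-1] != page_height - patch_height:
        tops.append(page_height - patch_height)  # make sure the bottom edge is covered
    return [(0, top, page_width, top + patch_height) for top in tops]


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU with clipped n-gram precision and brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    top_order = min(max_n, len(cand), len(ref))
    log_precisions = []
    for n in range(1, top_order + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / sum(cand_counts.values())))
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / len(log_precisions))


def select_patches(records, min_confidence=0.9, max_bleu=0.8):
    """Keep confident patches and drop near-duplicate transcriptions.

    `records` holds (box, text, confidence) tuples. Both thresholds are
    illustrative assumptions, not values taken from the project.
    """
    kept = []
    # Visit the most confident patches first so the best copy of a duplicate survives.
    for box, text, confidence in sorted(records, key=lambda r: -r[2]):
        if confidence < min_confidence:
            continue
        if any(simple_bleu(text, other) > max_bleu for _, other, _ in kept):
            continue
        kept.append((box, text, confidence))
    return kept
```

Each box returned by `patch_boxes` can be passed to an image library's crop function, and the surviving boxes from `select_patches` become the bounding-box labels for the detector.
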
Model Training:
- Trained a YOLO model (initialized with pretrained YOLO11 weights) on approximately 1,100 labeled examples.
- Trained for 100 epochs, reaching a final validation loss below 1.
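
A training run like the one described above might look as follows with the Ultralytics package. This is a sketch under assumptions: the README does not name the training library or the model size, so the use of Ultralytics, the `yolo11n.pt` checkpoint, the `dataset.yaml` path, and the 640-pixel image size are all hypothetical choices. Only the 100 epochs come from the text.

```python
def build_train_args(data_yaml="dataset.yaml", epochs=100, image_size=640):
    """Collect keyword arguments for Ultralytics' YOLO.train().

    `data_yaml` is a hypothetical path to a dataset description file;
    `image_size` is an assumed value.
    """
    return {"data": data_yaml, "epochs": epochs, "imgsz": image_size}


def train_detector(weights="yolo11n.pt", **overrides):
    """Fine-tune a pretrained YOLO checkpoint (model size is an assumption)."""
    # Imported lazily so build_train_args() works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)  # start from pretrained weights
    return model.train(**build_train_args(**overrides))
```

Calling `train_detector()` starts training with these defaults, and `train_detector(epochs=10)` gives a shorter trial run.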

Key Features

Enhanced OCR for Handwritten Text: Targets structured layouts such as paragraphs and essays; complex graphs and unstructured content are out of scope.
Synthetic Data Generation: Automated label generation using a brute-force patching approach and TrOCR.
Efficient Text Detection: Optimized bounding box detection using a YOLO model to streamline the pipeline.
Full-Page OCR Pipeline: Handles full-page text detection and recognition for structured text.
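
The full-page pipeline can be pictured as detection followed by line-by-line recognition. The sketch below is illustrative only: all function names are hypothetical, sorting boxes top to bottom is an assumed reading-order rule for single-column text, and the `microsoft/trocr-base-handwritten` checkpoint is an assumption, since the README does not say which TrOCR weights are used.

```python
def reading_order(boxes):
    """Sort (x1, y1, x2, y2) line boxes top to bottom, then left to right."""
    return sorted(boxes, key=lambda box: (box[1], box[0]))


def transcribe_page(image, detect_lines, recognize_line):
    """Detect line boxes, crop each line, recognize it, and join the results.

    `image` is any object with a PIL-style crop(box) method. `detect_lines`
    and `recognize_line` are callables, so the models can be swapped freely.
    """
    lines = []
    for box in reading_order(detect_lines(image)):
        text = recognize_line(image.crop(box)).strip()
        if text:
            lines.append(text)
    return "\n".join(lines)


def make_yolo_detector(weights_path):
    """Wrap an Ultralytics YOLO model as a detect_lines callable."""
    from ultralytics import YOLO  # lazy import: only needed for real inference

    model = YOLO(weights_path)

    def detect(image):
        result = model(image)[0]  # one Results object per input image
        return [tuple(int(v) for v in xyxy) for xyxy in result.boxes.xyxy.tolist()]

    return detect


def make_trocr_recognizer(checkpoint="microsoft/trocr-base-handwritten"):
    """Wrap a TrOCR checkpoint (assumed name) as a recognize_line callable."""
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

    def recognize(line_image):
        inputs = processor(images=line_image.convert("RGB"), return_tensors="pt")
        token_ids = model.generate(inputs.pixel_values)
        return processor.batch_decode(token_ids, skip_special_tokens=True)[0]

    return recognize
```

With a PIL image `page`, a call such as `transcribe_page(page, make_yolo_detector(weights_path), make_trocr_recognizer())` returns the page text with one recognized line per output line, where `weights_path` points to the trained detector weights.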

Limitations

Out-of-Vocabulary Characters: Some characters, like the Greek letter sigma, are mapped to visually or semantically similar known characters due to model limitations.

amaljoe/OCR_with_LLMs

Folders and files

Latest commit

History

Repository files navigation

Handwritten Text Full-Page OCR

Inference Example

How to Run the Project

Methodology Summary

Key Features

Limitations

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages