vishalj0501/transcribe_net

Transcribe Net

Text Recognition with Transformer Models

This project focuses on applying hybrid end-to-end Transformer-based models to recognize text in Spanish printed sources from the seventeenth century.

Architecture

This model combines ResNet-101 for feature extraction with a Transformer architecture for sequence modeling. Initially, ResNet-101 extracts visual features, which are then passed through a 1x1 convolutional layer to adapt their dimensionality. The Transformer processes these features along with positional encodings, capturing spatial and sequential information. It predicts token probabilities through linear layers, facilitating Optical Character Recognition tasks.

image

Deployment -> Hugging Face Spaces

Loss Function

KLDivLoss

[KLDivLoss formula figure]
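For reference, PyTorch's `nn.KLDivLoss` expects log-probabilities as input and probabilities as target. The snippet below is a generic usage sketch with dummy tensors, not the project's training loop; the vocabulary size of 100 is an arbitrary assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# KLDivLoss expects log-probabilities as input and probabilities as target.
criterion = nn.KLDivLoss(reduction="batchmean")

logits = torch.randn(2, 5, 100)                     # dummy (batch, seq, vocab) predictions
log_probs = F.log_softmax(logits, dim=-1)
target = F.softmax(torch.randn(2, 5, 100), dim=-1)  # dummy target distribution

loss = criterion(log_probs.view(-1, 100), target.view(-1, 100))
```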

General Dataset Used:

Bentham Dataset -> Download Link

To preprocess the data, run the main() function in data_preprocess/bentham_transform.py or download the preprocessed dataset here: Bentham Preprocessed Data
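The actual preprocessing steps live in `data_preprocess/bentham_transform.py`; the sketch below only illustrates a typical line-image pipeline for this kind of model (grayscale, resize to a fixed height, normalize, pad to a fixed width). The function name and target dimensions are assumptions, not taken from the repository.

```python
import numpy as np
from PIL import Image

def preprocess_line(img: Image.Image, target_h=64, target_w=1024):
    # Hypothetical preprocessing sketch; the real steps are in bentham_transform.py.
    img = img.convert("L")                                    # grayscale
    w, h = img.size
    new_w = min(target_w, max(1, int(w * target_h / h)))      # keep aspect ratio
    img = img.resize((new_w, target_h))
    arr = np.asarray(img, dtype=np.float32) / 255.0           # scale to [0, 1]
    canvas = np.ones((target_h, target_w), dtype=np.float32)  # pad right with white
    canvas[:, :new_w] = arr
    return canvas
```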

Epochs : 200

Pretrained weights can be downloaded here -> Pretrain Weights

Test Result:

Evaluation Metrics Used

CER (Character Error Rate), WER (Word Error Rate), SER (Sequence Error Rate)
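These three metrics are all edit-distance based: CER divides the character-level Levenshtein distance by the reference length, WER does the same at the word level, and SER is the fraction of sequences that are not exact matches. A self-contained sketch (function names are mine, not the repository's):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance; works on strings or word lists.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    # Character Error Rate: edits per reference character.
    return levenshtein(ref, hyp) / max(1, len(ref))

def wer(ref, hyp):
    # Word Error Rate: edits per reference word.
    return levenshtein(ref.split(), hyp.split()) / max(1, len(ref.split()))

def ser(refs, hyps):
    # Sequence Error Rate: fraction of sequences with any error.
    return sum(r != h for r, h in zip(refs, hyps)) / max(1, len(refs))
```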

[Bentham test results figure]

Specific Dataset:

The Transformer model pre-trained on the Bentham dataset was fine-tuned on the specific dataset (Spanish literature). Pytesseract was used to segment entire pages into individual lines, which were then preprocessed prior to fine-tuning.
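One common way to get line segments from Pytesseract is to request word-level boxes via `pytesseract.image_to_data(img, output_type=Output.DICT)` and merge words that share a line index into one bounding box per line. The helper below sketches that merge step; the function name is mine, but the dictionary keys follow pytesseract's output schema. Whether the repository segments lines this way is an assumption.

```python
from collections import defaultdict

def group_words_into_lines(data):
    """Merge word boxes from pytesseract.image_to_data(..., output_type=Output.DICT)
    into one (left, top, width, height) bounding box per text line."""
    lines = defaultdict(list)
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue  # skip empty entries (structural rows in the output)
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append(i)
    boxes = []
    for key in sorted(lines):
        idxs = lines[key]
        left = min(data["left"][i] for i in idxs)
        top = min(data["top"][i] for i in idxs)
        right = max(data["left"][i] + data["width"][i] for i in idxs)
        bottom = max(data["top"][i] + data["height"][i] for i in idxs)
        boxes.append((left, top, right - left, bottom - top))
    return boxes
```

Each returned box can then be cropped from the page image and fed through the same line preprocessing used for training.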

Epochs: 150

Fine-tuned weights can be downloaded here -> Fine Tune Weights

Loss Graph:

[Training loss curve]

Test Results over all images:

[Test results figure]

Test PDF Page 15 P2: Visual Result:

[Recognition result, PDF page 15]

PDF Page 16 P2:

[Recognition result, PDF page 16]
