This project focused on applying hybrid end-to-end Transformer-based models to recognize text in Spanish printed sources from the seventeenth century.
The model combines a ResNet-101 backbone for visual feature extraction with a Transformer for sequence modeling. ResNet-101 first extracts visual features, which are passed through a 1x1 convolutional layer to adapt their dimensionality. The Transformer then processes these features together with positional encodings, capturing both spatial and sequential information, and predicts token probabilities through linear layers for the Optical Character Recognition task.
Deployment -> Hugging Face Spaces
Bentham Dataset -> Download Link
To preprocess the data, run the main() function in data_preprocess/bentham_transform.py
or download the preprocessed dataset here: Bentham Preprocessed Data
Epochs: 200
Pretrained weights can be downloaded here -> Pretrain Weights
Test Result:
Evaluation Metrics Used
CER (Character Error Rate), WER (Word Error Rate), SER (Sequence Error Rate)
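The three metrics can be sketched as follows: CER and WER are edit distances normalized by reference length (over characters and whitespace-split words respectively), and SER is the fraction of lines with any error. The exact tokenization used by the project may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def cer(refs, hyps):
    """Character errors divided by total reference characters."""
    return sum(edit_distance(r, h) for r, h in zip(refs, hyps)) / sum(len(r) for r in refs)

def wer(refs, hyps):
    """Word errors divided by total reference words."""
    return (sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
            / sum(len(r.split()) for r in refs))

def ser(refs, hyps):
    """Fraction of sequences (lines) that are not recognized exactly."""
    return sum(r != h for r, h in zip(refs, hyps)) / len(refs)
```

For example, if one of two lines has a single substituted character, SER is 0.5 while CER stays near the per-character error rate.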
The Transformer model, pre-trained on the Bentham dataset, was then fine-tuned on the target dataset (Spanish Literature). Pytesseract was used to segment entire pages into individual lines, which were preprocessed before training the model.
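The line segmentation step can be sketched as grouping pytesseract word boxes into line bounding boxes. This is a minimal illustration, not the project's actual preprocessing code; in the real pipeline the `data` dict would come from `pytesseract.image_to_data(page_image, output_type=pytesseract.Output.DICT)`, whose keys the grouping below assumes.

```python
def group_into_lines(data):
    """Merge word boxes sharing (block_num, par_num, line_num) into line boxes (l, t, r, b)."""
    lines = {}
    for i, word in enumerate(data["text"]):
        if not word.strip():  # skip the empty tokens tesseract emits for structural levels
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        l, t = data["left"][i], data["top"][i]
        r, b = l + data["width"][i], t + data["height"][i]
        if key in lines:
            x0, y0, x1, y1 = lines[key]
            lines[key] = (min(x0, l), min(y0, t), max(x1, r), max(y1, b))
        else:
            lines[key] = (l, t, r, b)
    return [lines[k] for k in sorted(lines)]  # boxes in reading order
```

Each returned box can then be used to crop one text line from the page image before preprocessing.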
Epochs: 150
Fine Tuned weights can be downloaded here -> Fine Tune Weights
Loss Graph:
Test Results over all images:
Test PDF Page 15 P2 - Visual Result:
PDF Page 16 P2