We capture many scanned images of documents of various types, some taken on handheld devices and some using scanners. It therefore becomes increasingly important to organize these scanned documents, which requires reliable, high-quality classification of the scanned document images into several categories such as letter, form, etc.
This is part of the IndoML22 (Indian Symposium on Machine Learning, 2022) Datathon Challenge.
The training and validation data provided in the Datathon is a subset of the RVL-CDIP dataset: 16,000 grayscale images, with 1,000 images in each of the 16 categories into which the images are classified. The competition and the data were released as a Kaggle competition.
Images from the training set span 16 different categories (with their corresponding labels), as shown below:
A discussion of the data, with a few more images from both the training and validation sets, can be found in the data overview notebook.
The task is to build a model that classifies each image into its respective category; performance is evaluated using the Mean F1-Score. The F1 score, commonly used in information retrieval, measures accuracy using the statistics precision and recall.

Precision is the ratio of true positives to all predicted positives (true positives plus false positives); recall is the ratio of true positives to all actual positives (true positives plus false negatives).
The F1 metric weights recall and precision equally, and a good retrieval algorithm will maximize both precision and recall simultaneously. Thus, moderately good performance on both will be favored over extremely good performance on one and poor performance on the other.
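The metric above can be sketched in a few lines of NumPy. This is an illustrative implementation of a macro-averaged (mean) F1, not the competition's official scoring code; the function name `mean_f1` and the toy labels are assumptions.

```python
import numpy as np

def mean_f1(y_true, y_pred, n_classes=16):
    """Macro-averaged F1: per-class F1 from precision/recall, then the mean."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return float(np.mean(f1s))

# toy 3-class example
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
print(mean_f1(y_true, y_pred, n_classes=3))
```

Because each class contributes equally to the mean, a model cannot score well by excelling on a few frequent classes while failing on the rest.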
Various visual-feature-extraction-based methods were applied. Four of them are listed below; the first two use an EfficientNetV2L model pretrained on ImageNet:
- EfficientNet followed by FFN (EffNet)
- Partitioned Image based EfficientNet followed by FFN (EffNet-4Piece)
- InceptionResNetV2 along with RoI based Vision Transformer Network (IncResNet-RoI-ViT) [Model Report]
- ResNet-VGG-InceptionResNetV2 along with PCA followed by FFN (ResVGGInc-PCA-4Piece) [Model Report]
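To make the "partitioned image" idea in EffNet-4Piece concrete, here is a minimal NumPy sketch that splits a grayscale document image into four quadrants. The function name is an illustrative assumption; the downstream feature extraction with EfficientNetV2L and the FFN head are omitted.

```python
import numpy as np

def partition_into_quadrants(img):
    """Split a grayscale image (H, W) into its four quadrants.
    In an EffNet-4Piece-style setup, each piece would be passed
    through the backbone separately and the features combined."""
    h, w = img.shape
    h2, w2 = h // 2, w // 2
    return [
        img[:h2, :w2],   # top-left
        img[:h2, w2:],   # top-right
        img[h2:, :w2],   # bottom-left
        img[h2:, w2:],   # bottom-right
    ]

doc = np.arange(16).reshape(4, 4)          # toy 4x4 "document"
pieces = partition_into_quadrants(doc)
print([p.shape for p in pieces])           # four 2x2 quadrants
```

Partitioning lets the backbone see each region of a document page at higher effective resolution, which can help with layouts where discriminative content (letterheads, form fields) sits in one corner.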
The results of clustering the learnt penultimate-layer feature vectors for the above models on the training set are shown below:
| EffNet (Mean-F1: 0.6) | EffNet-4Piece (Mean-F1: 0.68) |
| --- | --- |
| IncResNet-RoI-ViT (Mean-F1: 0.755) | ResVGGInc-PCA-4Piece (Mean-F1: 0.785) |
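The clustering step can be sketched as a plain k-means over the penultimate-layer feature vectors. This is a generic illustration, not the exact procedure from the model reports; the `kmeans` function, the farthest-point initialisation, and the toy blob features are all assumptions.

```python
import numpy as np

def kmeans(features, k=16, n_iter=20):
    """Minimal k-means over feature vectors of shape (N, D), with greedy
    farthest-point initialisation. Sketches how penultimate-layer
    embeddings could be grouped into the 16 document categories."""
    centers = [features[0]]
    for _ in range(k - 1):  # farthest-point init: pick the point furthest from all centres
        d = np.min([np.linalg.norm(features - c, axis=1) for c in centers], axis=0)
        centers.append(features[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assign every vector to its nearest centre
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # recompute centres (keep the old centre if a cluster emptied)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = features[labels == c].mean(axis=0)
    return labels

# toy stand-in for real embeddings: two well-separated blobs
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 0.1, (50, 8)),
                   rng.normal(5.0, 0.1, (50, 8))])
labels = kmeans(feats, k=2)
```

Well-separated clusters in this space are what the plots above visualise: the higher the Mean-F1, the more cleanly the learnt features group images of the same category.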
- Refer to the `IndoML22` folder; it contains a `README.txt` file with all the information about how to train the ViT model using `train.ipynb` and how to run inference with the trained model using `test.ipynb`.
- Colab Notebooks: `train.ipynb` and `test.ipynb`. Going through `README.txt` as mentioned above will help in better understanding the directory structure.
- Link to the Pretrained Model: to be updated.