This repo is an implementation of the LayoutLM model, see [1], built from the original source code (I didn't manage to make it work with the HuggingFace implementation) and benchmarked on the CORD dataset, see [2].
I compare the performance of LayoutLM Large, pre-trained on the IIT-CDIP dataset, with BERT Large.
**Validation set**

Model | F1 score | Precision | Recall |
---|---|---|---|
LayoutLM Large | 0.9562 | 0.9577 | 0.9546 |
BERT Large | 0.9474 | 0.9466 | 0.9481 |

**Test set**

Model | F1 score | Precision | Recall |
---|---|---|---|
LayoutLM Large | 0.9843 | 0.9845 | 0.9841 |
BERT Large | 0.9859 | 0.9861 | 0.9856 |
On the validation set, LayoutLM outperformed BERT, but this is not the case on the test set; this needs further investigation.
Nevertheless, BERT took 11 minutes to finish training (4 epochs), while LayoutLM needed only 3 minutes (same environment and setup).
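For reference, entity-level F1, precision and recall on token-classification benchmarks like CORD are commonly computed with `seqeval`; this README does not state which tool was used here, so the following is only a minimal sketch (the label names are illustrative, not the exact CORD tag set):

```python
# Minimal sketch: entity-level metrics with seqeval (an assumed evaluation
# method; the label names below are illustrative, not the CORD tag set).
from seqeval.metrics import f1_score, precision_score, recall_score

# One list of BIO tags per document, aligned token by token.
y_true = [["B-MENU.NM", "I-MENU.NM", "O", "B-TOTAL.PRICE"]]
y_pred = [["B-MENU.NM", "I-MENU.NM", "O", "O"]]

print("F1:       ", f1_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
```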
I am using LayoutLM Large; the pre-trained model files can be found at these links:
OneDrive /
GoogleDrive
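The downloaded checkpoint presumably follows the standard PyTorch layout (a `pytorch_model.bin` weights file next to a `config.json`); if so, a quick sanity check after downloading might look like this (the local directory name is an assumption):

```python
# Sanity-check the downloaded LayoutLM Large checkpoint. The directory name
# is an assumption; point it at wherever you unpacked the files.
import torch

state_dict = torch.load("layoutlm-large-uncased/pytorch_model.bin",
                        map_location="cpu")
print(f"{len(state_dict)} tensors in checkpoint")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```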
Other resources can be found in the original repository: Official LayoutLM
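One detail worth knowing if you rebuild the input pipeline: as described in [1], LayoutLM consumes, alongside the token ids, each token's bounding box normalized to a 0-1000 virtual coordinate space. A minimal sketch of that preprocessing step (the helper name is mine, not from this repo):

```python
# Minimal sketch of LayoutLM's bounding-box preprocessing: the paper [1]
# normalizes each OCR box to a 0-1000 virtual coordinate space.
# The helper name is illustrative, not part of this repo.
def normalize_bbox(bbox, page_width, page_height):
    """Map an (x0, y0, x1, y1) box in pixels to LayoutLM's 0-1000 range."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]

# Example: a box on a 1000x2000 pixel receipt image.
print(normalize_bbox((100, 400, 300, 450), page_width=1000, page_height=2000))
# -> [100, 200, 300, 225]
```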
## TODO
I will soon add a training script; in the meantime, you can check my notebooks.
I will also provide more details about the dataset, the structure of the notebooks, etc.
[1] Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou (2019). LayoutLM: Pre-training of Text and Layout for Document Image Understanding. https://arxiv.org/abs/1912.13318, https://github.com/microsoft/unilm/tree/master/layoutlm
[2] Seunghyun Park, Seung Shin, Bado Lee, Junyeop Lee, Jaeheung Surh, Minjoon Seo, Hwalsuk Lee (2019). CORD: A Consolidated Receipt Dataset for Post-OCR Parsing. Document Intelligence Workshop at NeurIPS. https://github.com/clovaai/cord