EIGEN: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images
Abhishek Singh, Venkatapathy Subramaninan, Ayush Maheshwari, Pradeep Narayan, Devi Prasad Shetty and Ganesh Ramakrishnan
Machine Learning for Health (ML4H) 2023
###Instructions For Training
Create a new virtual environment, navigate to this directory, and follow the steps below:
- Clone the main branch of this repository.
- Download the CORDS receipt dataset into the current directory: https://drive.google.com/drive/folders/1mKrsYBW7xXzfxNLSYwQ02bHayqVfe-94?usp=sharing
- Install all the dependencies:
  pip install -r requirements.txt
- Get all the files required to run SPEAR and CAGE:
  git clone https://github.com/iitb-research-code/spear4HighFidelity.git
- (Optional) Change the labeling functions as needed, e.g. by adding or removing a labeling function, and make the appropriate changes.
- Run the labeling function file:
  python main.py
- The pickle files required for training and the trained model files will be stored in the Paths folder.
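
The optional step of adding or removing a labeling function can be pictured with the minimal sketch below. It is plain Python and deliberately does not use the SPEAR API. The label ids (`ABSTAIN`, `TOTAL`, `PRICE`), the two functions, and the `apply_lfs` helper are all hypothetical names chosen for illustration; the actual labeling functions and label set are the ones defined in this repository.

```python
import re

# Hypothetical label ids; the real label set is defined in the repository code.
ABSTAIN = -1
TOTAL = 0
PRICE = 1


def lf_total_keyword(token: str) -> int:
    """Vote TOTAL when the token is a 'total'-style keyword, otherwise abstain."""
    return TOTAL if token.lower() in {"total", "subtotal"} else ABSTAIN


def lf_price_pattern(token: str) -> int:
    """Vote PRICE when the token is digits with optional '.' or ',' separators."""
    return PRICE if re.fullmatch(r"\d+([.,]\d+)*", token) else ABSTAIN


# Adding or removing a labeling function amounts to editing this list.
LABELING_FUNCTIONS = [lf_total_keyword, lf_price_pattern]


def apply_lfs(tokens, lfs=LABELING_FUNCTIONS):
    """Return one row of votes per token, one column per labeling function."""
    return [[lf(tok) for lf in lfs] for tok in tokens]


votes = apply_lfs(["Total", "12,000", "Latte"])
print(votes)
```

Each row holds the votes for one token, and a row of all `ABSTAIN` values means no labeling function fired on that token. Whatever form the real labeling functions take, changing their number changes the width of this vote matrix, so any code that depends on the number of labeling functions has to be updated with it.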
###Files information
- Cage_cords.ipynb is the file which contains code for running the CAGE model on the CORDS dataset.
- NH_cage.ipynb is the file which contains code for running CAGE model on NH dataset.
- The Paths directory contains all the pickle files needed for training.
- cords_demo.ipynb is the file which contains code for running inference on CORDS data from the stored model.
- nh_demo.ipynb is the file which contains code for running inference on NH data from the stored model.
- train.py contains the code for jointly training the feature model and the CAGE model.
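
To check what the Paths directory holds after a run, a small helper like the one below may be useful. It is only a sketch: the function name `summarize_pickles` is hypothetical, and the `.pkl` extension is an assumption, so adjust the pattern if the files in Paths use a different suffix. Only unpickle files you trust, and note that unpickling objects of custom classes requires those classes to be importable.

```python
import pickle
from pathlib import Path


def summarize_pickles(directory="Paths", pattern="*.pkl"):
    """Return {file name: type name of the unpickled object} for matching files.

    The '*.pkl' pattern is an assumption about how the files are named.
    """
    summary = {}
    for path in sorted(Path(directory).glob(pattern)):
        with path.open("rb") as fh:
            summary[path.name] = type(pickle.load(fh)).__name__
    return summary


if __name__ == "__main__":
    # Print one line per pickle file found in the Paths directory.
    for name, kind in summarize_pickles().items():
        print(f"{name}: {kind}")
```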