ayushayush591/EIGEN-High-Fidelity-Extraction-Document-Images

EIGEN: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images
Abhishek Singh, Venkatapathy Subramanian, Ayush Maheshwari, Pradeep Narayan, Devi Prasad Shetty and Ganesh Ramakrishnan
Machine Learning for Health (ML4H) 2023


### Instructions for Training

Create a new virtual environment, navigate to this directory, and follow these steps:

  1. Clone the main branch of this repository with git clone.
  2. Download the CORDS receipt dataset into the current directory from "https://drive.google.com/drive/folders/1mKrsYBW7xXzfxNLSYwQ02bHayqVfe-94?usp=sharing".
  3. Run pip install -r requirements.txt to install all the dependencies.
  4. Run git clone https://github.com/iitb-research-code/spear4HighFidelity.git to get all the files required to run SPEAR and CAGE.
  5. (Optional) Change the labeling functions as needed, for example by adding or removing labeling functions, and make the corresponding changes.
  6. Run the labeling functions with python main.py.
  7. The pickle files required for training and the trained model files will be stored in the Paths folder.
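
Step 5 asks you to adapt the labeling functions. As a rough illustration of what a labeling function does, here is a minimal plain-Python sketch. It does not use the SPEAR API, and every name in it (`ABSTAIN`, `TOTAL`, `PRICE`, `lf_total_keyword`, `lf_price_pattern`, `apply_lfs`) is hypothetical rather than taken from this repository.

```python
import re

# Hypothetical label ids; the repository defines its own label set.
ABSTAIN = None
TOTAL = 1
PRICE = 2


def lf_total_keyword(token: str):
    """Vote TOTAL when the token looks like a 'total' keyword, else abstain."""
    if re.fullmatch(r"(sub)?total:?", token.strip().lower()):
        return TOTAL
    return ABSTAIN


def lf_price_pattern(token: str):
    """Vote PRICE for tokens shaped like an amount, e.g. '12.50' or '1,200'."""
    if re.fullmatch(r"\d+(,\d{3})*(\.\d+)?", token.strip()):
        return PRICE
    return ABSTAIN


# Adding or removing a labeling function amounts to editing this list.
LABELING_FUNCTIONS = [lf_total_keyword, lf_price_pattern]


def apply_lfs(tokens):
    """Return one row of votes per token, one column per labeling function."""
    return [[lf(tok) for lf in LABELING_FUNCTIONS] for tok in tokens]


if __name__ == "__main__":
    sample = ["Total", "12.50", "Latte"]
    for tok, votes in zip(sample, apply_lfs(sample)):
        print(tok, votes)
```

Each function either votes for a label or abstains, and the resulting vote matrix is the kind of input a label-aggregation model works from. The labeling functions in this repository may use different signatures and decorators, so treat this only as a conceptual guide.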

### File information

  1. Cage_cords.ipynb contains the code for running the CAGE model on the CORDS dataset.
  2. NH_cage.ipynb contains the code for running the CAGE model on the NH dataset.
  3. The Paths directory contains all the pickle files needed for training.
  4. cords_demo.ipynb contains the code for running inference on CORDS data with the stored model.
  5. nh_demo.ipynb contains the code for running inference on NH data with the stored model.
  6. train.py contains the code for jointly training the feature model and the CAGE model.
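
To check what ended up in the Paths directory after a run, a small helper along the following lines may be useful. This is a sketch under assumptions: the `.pkl` and `.pickle` extensions and the helper names `list_pickles` and `describe_pickle` are hypothetical, and the actual files may require the repository's own classes to be importable before they can be unpickled.

```python
import pickle
from pathlib import Path


def list_pickles(folder="Paths"):
    """Return the pickle files in `folder`, sorted by name (empty if missing)."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    # Assumed extensions; adjust to whatever the training run actually writes.
    return sorted(p for p in folder.iterdir() if p.suffix in {".pkl", ".pickle"})


def describe_pickle(path):
    """Load one pickle and return (type name, length or None)."""
    # Only unpickle files you trust: loading a pickle can execute arbitrary code.
    with open(path, "rb") as fh:
        obj = pickle.load(fh)
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


if __name__ == "__main__":
    for path in list_pickles():
        print(path.name, describe_pickle(path))
```

Run it from the repository root so that the relative `Paths` folder resolves correctly.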
