A novel method that disentangles the content and style aspects of input images by jointly optimizing a generative process and a handwritten word recognizer.
Distilling Content from Style for Handwritten Word Recognition
Lei Kang, Pau Riba, Marçal Rusiñol, Alicia Fornés, and Mauricio Villegas
Accepted at ICFHR 2020
- Ubuntu 16.04 x64
- Python 3
- PyTorch 1.0.1
We carry out our experiments on the widely used IAM handwriting dataset.
- Training takes a lot of GPU memory: in our case, 24GB on an RTX 6000 GPU with a batch size of 8. Even with a batch size of 1, it still takes 16GB of GPU memory.
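If you want to check the memory footprint on your own hardware, here is a minimal sketch using only standard torch.cuda calls:

```python
# Minimal sketch for monitoring peak GPU memory while tuning the batch
# size; uses only standard torch.cuda calls.
import torch

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
    peak_gb = torch.cuda.max_memory_allocated(0) / 1024 ** 3
    print('peak allocated: %.1f GB' % peak_gb)  # call after a few training iterations
```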
Once the dataset is prepared, set the correct dataset paths in the file load_data.py; a sketch of what these settings might look like is shown below.
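A minimal sketch, assuming the file exposes plain path variables (the names below are illustrative assumptions, not the actual ones in load_data.py):

```python
# Illustrative sketch only: the variable names below are assumptions and
# must be matched to the actual definitions in load_data.py.
img_base = '/path/to/IAM/words/'        # folder containing the IAM word images
train_gt = '/path/to/IAM/gt/train.txt'  # ground-truth list for training
test_gt  = '/path/to/IAM/gt/test.txt'   # ground-truth list for testing
```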
Then you are ready to go. To run training from scratch:
./run_train_scratch.sh
Or to start with a saved checkpoint:
./run_train_pretrain.sh
Note: the GPU to use and the epoch to start from can be set in this shell script. (The epoch ID corresponds to the weights you want to load from the save_weights folder; see the sketch below.)
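As a sketch of how an epoch ID maps to a checkpoint file (the contran-&lt;epoch&gt;.model naming is an assumption; check save_weights for the actual file names):

```python
# Hedged sketch: how an epoch ID selects a checkpoint in save_weights/.
# The file naming scheme is an assumption, not taken from the repo.
import torch

epoch_id = 5000  # the epoch you want to resume from
state_dict = torch.load('save_weights/contran-%d.model' % epoch_id,
                        map_location='cpu')
```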
To test a trained model:
./run_test.sh
Don't forget to set the epoch ID in this shell script so that the weights of the model corresponding to that epoch are loaded.
After the content distillation model has been properly trained (with early stopping), load the recognizer module ConTranModel.rec and fine-tune it on the IAM training set alone, as sketched below. The Seq2Seq HTR recognizer can be found here.
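A minimal sketch of this fine-tuning step; the import path, constructor arguments, checkpoint name, and recognizer call signature are all assumptions, so adapt them to your training configuration:

```python
# Hedged sketch of fine-tuning the recognizer alone. The import path,
# constructor arguments, checkpoint name and loss interface are all
# assumptions -- adapt them to your own setup.
import torch
from network_tro import ConTranModel  # import path is an assumption

NUM_WRITERS = 500  # placeholder; use the value from your training setup

model = ConTranModel(NUM_WRITERS, 0, True)  # placeholder arguments
model.load_state_dict(torch.load('save_weights/contran-5000.model',
                                 map_location='cpu'))  # hypothetical checkpoint

recognizer = model.rec  # the Seq2Seq HTR recognizer named above
optimizer = torch.optim.Adam(recognizer.parameters(), lr=1e-4)  # assumed LR

# Fine-tune on the IAM training set alone (data loading omitted):
# for imgs, labels in iam_train_loader:
#     loss = recognizer(imgs, labels)  # assumed interface
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```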
If you use the code for your research or application, please cite our paper:
```
@inproceedings{kang2020distilling,
  title     = {Distilling Content from Style for Handwritten Word Recognition},
  author    = {Kang, Lei and Riba, Pau and Rusi{\~n}ol, Mar{\c{c}}al and Forn{\'e}s, Alicia and Villegas, Mauricio},
  booktitle = {International Conference on Frontiers in Handwriting Recognition (ICFHR)},
  year      = {2020}
}
```