A quick script to extract the highlights I make in PDF files using a Kobo Elipsa.
The Kobo Elipsa lets you export highlights made in documents, but only highlights that were not made with the stylus that comes with the eReader. Reading and highlighting PDFs was the whole reason I bought the thing, so I wrote some Python to extract my freehand highlights from all of the papers I read.
To use it:
- Connect your Elipsa to your computer
- Copy the contents of the mounted drive into a new folder
- Plop this script in that folder
- Create a virtualenv and install the requirements
- Run the script with `python highlight_extractor.py`
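The copy step can also be scripted. Below is a minimal sketch, not part of the original script: `copy_device` is a hypothetical helper, and the mount point you pass in (for example `/Volumes/KOBOeReader` on macOS) is an assumption that depends on your operating system.

```python
import shutil
from pathlib import Path


def copy_device(mount_point: str, dest: str) -> Path:
    """Copy everything on the mounted eReader into dest (hypothetical helper).

    Hidden directories on the device are copied too, and an existing
    dest folder is updated in place rather than raising an error.
    """
    src = Path(mount_point)
    if not src.is_dir():
        raise FileNotFoundError(f"Mount point not found: {src}")
    # dirs_exist_ok requires Python 3.8+
    return Path(shutil.copytree(src, dest, dirs_exist_ok=True))
```

For example, `copy_device("/Volumes/KOBOeReader", "elipsa-backup")` would copy the drive into `elipsa-backup`, and you would then drop `highlight_extractor.py` into that folder.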
This creates an `annotations` directory with a sub-directory for each file in which a highlight is detected. Each file directory is then divided by page, and each page holds PNG clips of that page's highlights. There is also an `index.html` file that lists each document title followed by all of its embedded image highlights.
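To illustrate how an index like that can be built from the directory layout, here is a minimal sketch. It is not the script's actual code: it assumes the hypothetical layout `annotations/<document>/<page>/<clip>.png`, and `build_index` and `_natural_key` are illustrative names.

```python
import html
import re
from pathlib import Path
from urllib.parse import quote


def _natural_key(path: Path) -> list:
    # Split out digit runs so "10" sorts after "2" instead of before it
    return [int(t) if t.isdigit() else t for t in re.split(r"(\d+)", path.as_posix())]


def build_index(annotations_dir: str) -> str:
    """Write index.html: each document title followed by its highlight clips.

    Assumes the hypothetical layout annotations/<document>/<page>/<clip>.png.
    """
    root = Path(annotations_dir)
    parts = ["<!DOCTYPE html>", "<html><body>"]
    for doc_dir in sorted((p for p in root.iterdir() if p.is_dir()), key=_natural_key):
        parts.append(f"<h2>{html.escape(doc_dir.name)}</h2>")
        # rglob walks every page sub-directory of this document
        for clip in sorted(doc_dir.rglob("*.png"), key=_natural_key):
            # Relative, percent-encoded src so the index works from inside annotations/
            src = quote(clip.relative_to(root).as_posix())
            parts.append(f'<img src="{src}" alt="{html.escape(clip.stem)}">')
    parts.append("</body></html>")
    page = "\n".join(parts)
    (root / "index.html").write_text(page, encoding="utf-8")
    return page
```

Sorting with a natural key keeps page 10 after page 2, which a plain string sort would not.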
To do:
- Clean up code
- Add tests
- Add a CLI
- Package the project
- Publish on PyPI
- Stitch clips together
- OCR the clips to make highlights indexable and searchable
- Add a GUI