Utility for generating Lucene-indexed datasets from collections of emails. It includes the following components:
- Dataset generation (including indexing)
- Dataset data access
- Lucene searching
- Data export to a variety of formats
The datasets generated by this library are structured simply as a directory containing multiple files and/or subdirectories:
- The `index` directory contains all files used by Apache Lucene for creating and searching over indexes.
- The `database.mv.db` file is the self-contained H2 relational database that contains all emails and any associated tags.
- A `metadata.properties` file contains meta information about the dataset. It currently stores the version number.
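
As a rough illustration of the layout described above, the following sketch checks whether a directory contains the three expected entries and prints whatever is stored in `metadata.properties`. It uses only the Java standard library and is not part of this library's API: the class name `DatasetLayoutCheck` and its methods are hypothetical, and no particular property key is assumed, since the exact key names are not documented here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper, not part of the library: inspects a dataset directory.
public class DatasetLayoutCheck {

    /** Returns true if the directory contains the three entries listed above. */
    static boolean looksLikeDataset(Path datasetDir) {
        return Files.isDirectory(datasetDir.resolve("index"))
                && Files.isRegularFile(datasetDir.resolve("database.mv.db"))
                && Files.isRegularFile(datasetDir.resolve("metadata.properties"));
    }

    /** Loads metadata.properties without assuming any particular key names. */
    static Properties readMetadata(Path datasetDir) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(datasetDir.resolve("metadata.properties"))) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Use the first argument as the dataset directory, or the working directory.
        Path datasetDir = Path.of(args.length > 0 ? args[0] : ".");
        if (!looksLikeDataset(datasetDir)) {
            System.out.println("Not a dataset directory: " + datasetDir.toAbsolutePath());
            return;
        }
        // Print every stored property; the source only says a version number is kept.
        readMetadata(datasetDir).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

A check like this only confirms that the expected files are present. It does not open the Lucene index or the H2 database, so it says nothing about whether their contents are valid.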