Update README.md
andrewlalis authored Mar 25, 2022
1 parent d727ff5 commit 25a3f4a
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions README.md
@@ -11,4 +11,10 @@ To start analyzing some emails, you can either:
- Open an existing dataset, via `File > Open Dataset`. You may open datasets from directories, or from compressed ZIP files; in the latter case, the zip file will be unzipped to a directory.

## Generating a Dataset
![image](https://user-images.githubusercontent.com/9953867/160172140-2c91753d-d1b1-42ee-8907-ba108360cd68.png)

Datasets are generated by processing one or more directories containing `.mbox` files. After clicking `File > Generate Dataset`, you'll see the above dialog, where you can specify one or more directories to process. If you don't yet have any mbox files, you may click `Download Emails` to open a simple dialog where you can download mbox files from Apache's API by specifying the mailing list name, domain, and a directory to download to. You can browse available mail archives on the [Apache Mail Archives site](https://lists.apache.org/).

Finally, select a directory to generate the dataset in, via the `Generate to` field.

Once you click `Generate`, the mbox files will be parsed to generate a dataset consisting of Lucene index files and an H2 database file, in the specified directory.
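
To get a feel for what the input to this step looks like, the following is a minimal sketch in Python (not the application's own code) that locates `.mbox` files under one or more directories and lists the subjects they contain, using the standard library `mailbox` module. The function names `find_mbox_files` and `summarize_mbox` are hypothetical, and the recursive directory search is an assumption; the README does not say whether subdirectories are scanned.

```python
import mailbox
from pathlib import Path


def find_mbox_files(directories):
    """Return all .mbox files under the given directories, sorted by path.

    Assumption: subdirectories are searched recursively. The README does
    not state how the application itself scans directories.
    """
    found = []
    for directory in directories:
        found.extend(Path(directory).rglob("*.mbox"))
    return sorted(found)


def summarize_mbox(path):
    """Return (message_count, subjects) for a single mbox file."""
    # create=False makes a missing file an error instead of creating it.
    box = mailbox.mbox(str(path), create=False)
    try:
        subjects = [message.get("Subject", "") for message in box]
    finally:
        box.close()
    return len(subjects), subjects
```

Calling `summarize_mbox` on each path returned by `find_mbox_files` is a quick way to check that a downloaded archive actually contains messages before generating a dataset from it.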
