This repository contains a Docker container for the CLTK.
First, clone this repository:
$ git clone https://github.com/cltk/cltk_docker.git
$ cd cltk_docker
Build the image:
$ docker build -t cltk .
To run the image:
$ docker run -it cltk
>>> from cltk.corpus.utils.importer import CorpusImporter
>>> c = CorpusImporter('latin')
>>> c.list_corpora
['latin_text_perseus', 'latin_treebank_perseus', 'latin_text_lacus_curtius', 'latin_text_latin_library', 'phi5', 'phi7', 'latin_proper_names_cltk', 'latin_models_cltk', 'latin_pos_lemmata_cltk', 'latin_treebank_index_thomisticus', 'latin_lexica_perseus', 'latin_training_set_sentence_cltk', 'latin_word2vec_cltk', 'latin_text_antique_digiliblt', 'latin_text_corpus_grammaticorum_latinorum']
This Dockerfile
uses three data volumes, which you can use to persist data across runs or map a directory from the Docker host:
/cltk_data
/nltk_data
/data
So if you use e.g. docker volume create cltk_data
, you can then use docker run -ti -v cltk_data:/cltk_data ctlk
, and any corpora installed will persist when you use the same volume. If your Docker host has already installed corpora locally, you could instead use e.g. docker run -ti -v $HOME/cltk_data:/cltk_data cltk
.
This container also comes with a helper script, install_corpora.py
, which can be used to install all corpora:
docker run -ti -v cltk_data:/cltk_data cltk install_corpora.py
Or corpora for specific languages:
docker run -ti -v cltk_data:/cltk_data cltk install_corpora.py greek latin
The Dockerfile.jupyter
file also defines a Jupyter Notebook container with CLTK installed. You can build it with docker build -t cltk-jupyter -f Dockerfile.jupyter .
, and run it with e.g. (also using a mapped data volume as in the example above) docker run -p 8888:8888 -v cltk_data:/cltk_data cltk-jupyter
(see the Jupyter Docker Stacks Quick Start documentation for more examples)
MIT. See LICENSE.txt.