Running the EXSCLAIM Pipeline

Trevor Spreadbury edited this page Apr 12, 2021 · 1 revision

Using the EXSCLAIM Pipeline

Depending on your use case and experience with Python, you can use EXSCLAIM as a Python import, as a command-line tool, or (easiest) through its user interface.

Requirements

  • To use the exsclaim pipeline, you must have it installed.
  • Using EXSCLAIM requires a user-generated query, which tells the pipeline how to run and what to look for. In a query, you specify which keywords to look for, which journals to search, how many articles to look at, and how to log and store results. The full query schema is available in the wiki, and examples can be found in the query directory.
  • Once you have a query, you can choose what tools to run. The options are JournalScraper, CaptionDistributor, and FigureSeparator. For most cases you will want to run all three, which is the default behavior.

Results

The result of running the exsclaim pipeline is a dataset of images from published journal articles labeled with their captions and other extracted metadata. For more information, see Viewing Results.

Methods

Importing EXSCLAIM

You can import EXSCLAIM to run in Python scripts (or modules):

from exsclaim.pipeline import Pipeline

# query is a Python dictionary or the path to a JSON query file
query = "/path/to/query.json"
test_pipeline = Pipeline(query)
results = test_pipeline.run()

query can be either a Python dictionary or the path to a JSON file. In both cases it must define the parameters (keys) required by the Query JSON schema; examples can be found in the query directory.
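Because query may arrive in either form, a script that wants to inspect or adjust it before building the pipeline can first normalize it to a dictionary. The following is a minimal sketch of that idea and is not part of EXSCLAIM: load_query is a hypothetical helper name, and it checks only that the file holds a JSON object, not that the keys satisfy the query schema.

```python
import json
from pathlib import Path


def load_query(query):
    """Return the query as a dictionary, reading it from JSON if a path is given.

    Hypothetical helper for illustration; not part of the exsclaim package.
    """
    if isinstance(query, dict):
        # Shallow copy so later edits do not modify the caller's dictionary.
        return dict(query)
    with open(Path(query), encoding="utf-8") as f:
        loaded = json.load(f)
    if not isinstance(loaded, dict):
        raise ValueError("a query JSON file must contain a single JSON object")
    return loaded
```

The returned dictionary can then be passed to Pipeline in place of the original path.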

If you wish to run only a subset of tools, you can use the keyword arguments like this:

results = test_pipeline.run(figure_separator=True, caption_distributor=True, journal_scraper=True)

setting those you do not wish to run to False.

Command-Line Tool

You can use EXSCLAIM from the command line:

$ exsclaim run /path/to/query.json

To specify which tools to run, use the --tools flag. The default is to run all tools. For example:

$ exsclaim run /path/to/query.json --tools jc

After the --tools flag, provide the first letter of each tool you wish to run (j for JournalScraper, c for CaptionDistributor, f for FigureSeparator). The above command runs the JournalScraper and CaptionDistributor.
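The letter string given to --tools corresponds to the keyword arguments accepted by run() in the Python interface. The sketch below shows one way to translate between the two in your own scripts. It is an illustration, not the command-line tool's actual implementation: TOOL_FLAGS and tools_to_kwargs are hypothetical names, and the letter f for FigureSeparator is inferred from the first-letter rule.

```python
# Hypothetical mapping from --tools letters to Pipeline.run keyword arguments.
TOOL_FLAGS = {
    "j": "journal_scraper",
    "c": "caption_distributor",
    "f": "figure_separator",
}


def tools_to_kwargs(tools="jcf"):
    """Translate a letter string such as "jc" into run() keyword arguments."""
    unknown = sorted(set(tools) - set(TOOL_FLAGS))
    if unknown:
        raise ValueError(f"unknown tool letter(s): {', '.join(unknown)}")
    # Every tool gets an explicit True/False so nothing falls back to a default.
    return {name: letter in tools for letter, name in TOOL_FLAGS.items()}


print(tools_to_kwargs("jc"))
# {'journal_scraper': True, 'caption_distributor': True, 'figure_separator': False}
```

With a pipeline object built as in the import example, the Python call matching the command above would be test_pipeline.run(**tools_to_kwargs("jc")).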

User Interface

To use the UI, you must have PostgreSQL installed. To download it, see the official PostgreSQL installation instructions.

Then, on the command line, type:

$ exsclaim view

Do not close your command line window while using the UI. Navigate to http://127.0.0.1:8000/ to use the UI. From here you can navigate to the query page to submit a query using a simple web form, or to the results page to explore and filter results.
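If the page does not load, it can help to confirm that something is actually listening at that address. The sketch below uses only the Python standard library. ui_is_up is a hypothetical helper, not part of EXSCLAIM, and it reports only whether an HTTP server answers at the URL, not whether that server is the EXSCLAIM UI.

```python
import urllib.error
import urllib.request


def ui_is_up(url="http://127.0.0.1:8000/", timeout=2.0):
    """Return True if an HTTP server answers at url, False otherwise."""
    # An empty ProxyHandler bypasses any system proxy for this local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (e.g. 404).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or otherwise unreachable.
        return False
```

Calling ui_is_up() from a second terminal while exsclaim view is running should return True once the server has started.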