Running the EXSCLAIM Pipeline
Depending on your use case and experience with Python, you can run EXSCLAIM as a Python import, as a command-line tool, or (easiest) through its user interface.
- To use the EXSCLAIM pipeline, you must have it installed.
- Using EXSCLAIM requires a user-generated query, which tells the pipeline how to run and what to look for. In a query, you specify which keywords to look for, which journals to search, how many articles to look at, and how to log and store results. The full query schema is available in the wiki and examples can be found in the query directory.
- Once you have a query, you can choose what tools to run. The options are JournalScraper, CaptionDistributor, and FigureSeparator. For most cases you will want to run all three, which is the default behavior.
The result of running the exsclaim pipeline is a dataset of images from published journal articles labeled with their captions and other extracted metadata. For more information, see Viewing Results.
You can import EXSCLAIM to run in Python scripts (or modules):
from exsclaim.pipeline import Pipeline
test_pipeline = Pipeline(query)
results = test_pipeline.run()
query can be either a Python dictionary or the path to a JSON file. In either case, it must define the parameters (keys) specified in the Query JSON schema; examples can be found in the query directory.
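Because query may take either form, a script that needs to inspect the query before handing it to the pipeline might first normalize it to a dictionary. The following is a minimal sketch of that idea, not EXSCLAIM's internal logic: load_query is a hypothetical helper, and the "name" key in the usage note is only a placeholder, not a confirmed schema field.

```python
import json
from pathlib import Path


def load_query(query):
    """Return the query as a dictionary.

    Hypothetical helper (not part of EXSCLAIM): accepts either a
    dictionary or a path to a JSON file.
    """
    if isinstance(query, dict):
        return query
    # Anything that is not a dict is treated as a path to a JSON file.
    with Path(query).open(encoding="utf-8") as handle:
        loaded = json.load(handle)
    if not isinstance(loaded, dict):
        raise ValueError("query JSON must contain an object at the top level")
    return loaded
```

For example, load_query({"name": "demo"}) returns the dictionary unchanged, while load_query("/path/to/query.json") reads the file and returns its contents as a dictionary.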
If you wish to run only a subset of tools, you can use the keyword arguments like this:
results = test_pipeline.run(figure_separator=True, caption_distributor=True, journal_scraper=True)
setting those you do not wish to use to False.
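If the subset of tools is decided at runtime, the three keyword arguments can be built programmatically. The sketch below uses only the keyword names shown above; run_kwargs and TOOL_KEYWORDS are hypothetical names introduced for illustration, not part of EXSCLAIM.

```python
# Keyword arguments accepted by Pipeline.run(), as shown in the example above.
TOOL_KEYWORDS = ("journal_scraper", "caption_distributor", "figure_separator")


def run_kwargs(*enabled):
    """Build keyword arguments for Pipeline.run() with only `enabled` tools on.

    Hypothetical helper: every tool not listed is set to False.
    """
    unknown = set(enabled) - set(TOOL_KEYWORDS)
    if unknown:
        raise ValueError(f"unknown tool keyword(s): {sorted(unknown)}")
    return {name: name in enabled for name in TOOL_KEYWORDS}


kwargs = run_kwargs("journal_scraper", "caption_distributor")
print(kwargs)
# {'journal_scraper': True, 'caption_distributor': True, 'figure_separator': False}
```

With EXSCLAIM installed, the result could then be passed along as test_pipeline.run(**kwargs).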
You can utilize EXSCLAIM from the command line:
$ exsclaim /path/to/query.json
To specify which tools to run, use the --tools flag. The default is to run all tools. For example:
$ exsclaim run /path/to/query.json --tools jc
After the --tools flag, provide the first letter of each tool you wish to run. The above command will run the JournalScraper and CaptionDistributor.
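To make the letter convention concrete, the mapping can be sketched in a few lines of Python. This illustrates the rule described above and is not EXSCLAIM's actual argument parser: TOOLS_BY_LETTER and tools_from_letters are hypothetical names, and the handling of uppercase or repeated letters is an assumption, since the text does not say how the CLI treats them.

```python
# Assumed mapping from first letter to tool name, inferred from the text above.
TOOLS_BY_LETTER = {
    "j": "JournalScraper",
    "c": "CaptionDistributor",
    "f": "FigureSeparator",
}


def tools_from_letters(letters):
    """Translate a --tools value such as "jc" into a list of tool names."""
    selected = []
    for letter in letters.lower():  # assumption: letters are case-insensitive
        if letter not in TOOLS_BY_LETTER:
            raise ValueError(f"unrecognized tool letter: {letter!r}")
        name = TOOLS_BY_LETTER[letter]
        if name not in selected:  # assumption: repeated letters are ignored
            selected.append(name)
    return selected


print(tools_from_letters("jc"))  # ['JournalScraper', 'CaptionDistributor']
```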
To use the UI, you must have PostgreSQL installed. To download, check the official instructions.
Then, on the command line, type:
$ exsclaim view
Do not close your command line window while using the UI. Navigate to http://127.0.0.1:8000/ to use the UI. From here you can navigate to the query page to submit a query using a simple web form, or to the results page to explore and filter results.