SpiceJack Documentation

Installation

pip install spicejack

Usage

Creating a Processor object

Currently, SpiceJack only supports PDF files. Support for other file types is planned; create an issue to request one.

To use SpiceJack, first import the processor:

from spicejack.pdf import PDFprocessor

Then create a processor:

processor = PDFprocessor(
    filepath,
    filters,
    use_legitimate,
    model
)

filepath

Path of the PDF file.

filters

List of extra custom filters. See Custom filters.

use_legitimate

Whether to use the official OpenAI API.

model

Model to use for generation.

Running the processor

processor.run(
    thread,
    process,
    logging,
    autosave
)

thread

Whether to run the processor in a child thread.

process

Whether to run the processor in a child process.

logging

Whether to print the JSON responses from the LLM.

autosave

Whether to save the result to result.json every time a sentence is parsed.

processor.run also returns the result.
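To illustrate what the autosave option describes, here is a minimal sketch of the pattern: rewrite the output file after each parsed sentence while also returning the full result. This is not SpiceJack's implementation. `fake_convert`, `run_with_autosave`, and the question/answer dictionary shape are hypothetical stand-ins, and a temporary directory is used in place of the real result.json.

```python
import json
import tempfile
from pathlib import Path


def fake_convert(sentence):
    """Hypothetical stand-in for the LLM call; the dict shape is an assumption."""
    return {"question": f"What does this sentence state? ({sentence})", "answer": sentence}


def run_with_autosave(sentences, out_path):
    """Process sentences one by one, saving the accumulated results each time."""
    results = []
    for sentence in sentences:
        results.append(fake_convert(sentence))
        # Rewrite the whole file after every item so partial results
        # survive an interruption.
        out_path.write_text(json.dumps(results, indent=2), encoding="utf-8")
    return results


with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "result.json"
    returned = run_with_autosave(
        ["Water boils at 100 C.", "Ice melts at 0 C."], out_path
    )
    # The file on disk matches what the function returned.
    on_disk = json.loads(out_path.read_text(encoding="utf-8"))
```

Rewriting the whole file on every step is simple, but the cost grows with the size of the result; it suits small documents best.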

Finishing touches

Now you can save the result to a file.

processor.save(
    jsonpath
)

jsonpath

Path of the JSON file to save the result to.

Custom filters

SpiceJack reads the PDF file, cleans it up using a few filters, and splits it into sentences. It then converts the sentences into JSON questions and answers using an LLM.
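The cleaning and splitting stages described above can be sketched in plain Python. This is an illustration under stated assumptions, not SpiceJack's internals: `apply_filters`, `split_sentences`, and `collapse_whitespace` are hypothetical names, the sentence splitter is a naive regular expression, and the LLM step is left out.

```python
import re
from typing import Callable, List

# Assumed filter shape: a function from a list of strings to a list of strings.
Filter = Callable[[List[str]], List[str]]


def apply_filters(lines: List[str], filters: List[Filter]) -> List[str]:
    """Run each filter in order, feeding the output of one into the next."""
    for f in filters:
        lines = f(lines)
    return lines


def split_sentences(lines: List[str]) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    text = " ".join(lines)
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def collapse_whitespace(lines: List[str]) -> List[str]:
    """Example filter: squeeze runs of whitespace and drop empty lines."""
    squeezed = [" ".join(line.split()) for line in lines]
    return [line for line in squeezed if line]


raw_lines = ["Water  boils at 100 C.", "", "Ice melts   at 0 C. Steam is a gas."]
cleaned = apply_filters(raw_lines, [collapse_whitespace])
sentences = split_sentences(cleaned)
print(sentences)
```

A real splitter has to handle abbreviations and decimal numbers, which this regular expression does not.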

You can create custom filters. A filter is a function that takes a list of strings and returns the filtered list:

from spicejack.pdf import PDFprocessor

# A filter takes a list of strings and returns a new list of strings.
def filter1(texts):
    return [
        text.replace(" percent", "%")
        for text in texts
    ]

processor = PDFprocessor(
    filepath,
    filters=[filter1],
)
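Because a filter is an ordinary function over a list of strings, it can be checked on sample data before it is handed to the processor. The sketch below assumes only what the example above shows, namely that a filter receives a list of strings and returns a list of strings. `percent_filter` and `drop_page_numbers` are hypothetical names, and the sample strings are made up.

```python
import re


def percent_filter(chunks):
    """Same idea as filter1 above: replace " percent" with "%"."""
    return [chunk.replace(" percent", "%") for chunk in chunks]


def drop_page_numbers(chunks):
    """Hypothetical filter: drop entries that are only a page number,
    such as "7" or "Page 12"."""
    pattern = re.compile(r"^\s*(page\s+)?\d+\s*$", re.IGNORECASE)
    return [chunk for chunk in chunks if not pattern.match(chunk)]


# Try the filters on made-up input before using them with a real PDF.
sample = ["Sales grew 5 percent.", "Page 12", "  7 ", "Costs fell 12 percent."]
result = drop_page_numbers(percent_filter(sample))
print(result)
```

Both functions could then be passed together as `filters=[percent_filter, drop_page_numbers]`. Whether SpiceJack applies them in list order is an assumption here, so filters that do not depend on each other's output are the safer choice.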