polish
fscelliott committed Dec 2, 2024
1 parent 181c3f1 commit 618abe8
Showing 3 changed files with 9 additions and 8 deletions.
@@ -449,13 +449,16 @@ The following image shows the example document used with this example config:
Notes
===

For an overview of how the List method works, see the following steps:
For an overview of how this method works when `"searchBySummarization": false`, see the following steps:

1. Sensible finds the chunks of the document that most likely contain your target data:

- Sensible concatenates all your property descriptions with your overall list description.
- Sensible splits the document into equal-sized chunks.
- Sensible scores your concatenated list descriptions against each chunk.
- Sensible concatenates all your property descriptions with your overall list description.

- Sensible splits the document into equal-sized chunks.

- Sensible scores your concatenated list descriptions against each chunk.


2. Sensible selects a number of the top-scoring chunks:
1. If you specify Thorough for the LLM Engine parameter, the Chunk Count parameter determines the number of top-scoring chunks Sensible selects to submit to the LLM.
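
The chunk-selection steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Sensible's implementation: the helper names (`split_into_chunks`, `score_chunk`, `top_chunks`) are hypothetical, chunks are split by character count rather than by page layout, and a simple word-overlap score stands in for whatever relevancy scoring Sensible actually uses.

```python
import re


def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Split the document text into equal-sized chunks (the last one may be shorter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score_chunk(query: str, chunk: str) -> float:
    """Stand-in relevancy score (an assumption): fraction of query words found in the chunk."""
    query_words = tokenize(query)
    if not query_words:
        return 0.0
    return len(query_words & tokenize(chunk)) / len(query_words)


def top_chunks(
    document: str,
    list_description: str,
    property_descriptions: list[str],
    chunk_size: int,
    chunk_count: int,
) -> list[str]:
    # Concatenate all property descriptions with the overall list description.
    query = " ".join([list_description, *property_descriptions])
    # Split the document into equal-sized chunks.
    chunks = split_into_chunks(document, chunk_size)
    # Score the concatenated descriptions against each chunk, best first.
    ranked = sorted(chunks, key=lambda chunk: score_chunk(query, chunk), reverse=True)
    # Keep the number of top-scoring chunks given by the chunk count.
    return ranked[:chunk_count]
```

Because `sorted` is stable, chunks with equal scores keep their document order, so ties resolve toward earlier chunks in this sketch.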
@@ -45,8 +45,6 @@ Parameters

**Note** You can configure some of the following parameters in both the [NLP](doc:nlp) preprocessor and in a field's method. If you configure both, the field's parameter overrides the NLP preprocessor's parameter.



## Parameters

| key | value | description |
@@ -62,7 +60,7 @@ Parameters
| :--------------------- | :--------------- | :----------------------------------------------------------- | ------------------------------------------------------------ |
| id (**required**) | `queryGroup` | | |
| queries | array of objects | An array of query objects, where each extracts a single fact and outputs a single field. Each query contains the following parameters:<br/>`id` (**required**) - The ID for the extracted field. <br/>`description` (**required**) - A free-text question about information in the document. For example, `"what's the policy period?"` or `"what's the client's first and last name?"`. For more information about how to write questions (or "prompts"), see [Query Group](https://docs.sensible.so/docs/query-group-tips) extraction tips. | |
| chunkScoringText | string | Use this parameter to narrow down the page location of the answer to your prompt. For details about context and chunks, see the Notes section.<br/>A representative snippet of text from the part of the document where you expect to find the answer to your prompt. For example, if your prompt has multiple candidate answers, and the correct answer is located near unique or distinctive text that's difficult to incorporate into your question, then specify the distinctive text in this parameter.<br/>If specified, Sensible uses this text to score chunks' relevancy. If unspecified, Sensible uses the prompt to score chunks.<br/>Sensible recommends that the snippet is specific to the target chunk, semantically similar to the chunk, and structurally similar to the chunk. <br/>For example, if the chunk contains a street address formatted with newlines, then provide a snippet with an example street address that contains newlines, like `123 Main Street\nLondon, England`. If the chunk contains a street address in a free-text paragraph, then provide an unformatted street address in the snippet. | If you set the Search By Summarization paramter to true, Sensible ignores any configured value for this parameter. |
| chunkScoringText | string | Use this parameter to narrow down the page location of the answer to your prompt. For details about context and chunks, see the Notes section.<br/>A representative snippet of text from the part of the document where you expect to find the answer to your prompt. For example, if your prompt has multiple candidate answers, and the correct answer is located near unique or distinctive text that's difficult to incorporate into your question, then specify the distinctive text in this parameter.<br/>If specified, Sensible uses this text to score chunks' relevancy. If unspecified, Sensible uses the prompt to score chunks.<br/>Sensible recommends that the snippet is specific to the target chunk, semantically similar to the chunk, and structurally similar to the chunk. <br/>For example, if the chunk contains a street address formatted with newlines, then provide a snippet with an example street address that contains newlines, like `123 Main Street\nLondon, England`. If the chunk contains a street address in a free-text paragraph, then provide an unformatted street address in the snippet. | If you set the Search By Summarization parameter to true, Sensible ignores any configured value for this parameter. |
| multimodalEngine | object | Configure this parameter to:<br/>- Extract data from images embedded in a document, for example, photos, charts, or illustrations.<br/>- Troubleshoot extracting from complex text layouts, such as overlapping lines, lines between lines, and handwriting. For example, use this as an alternative to the [Signature](doc:signature) method, the [Nearest Checkbox](doc:nearest-checkbox) method, the [OCR engine](doc:ocr-engine), and line [preprocessors](doc:preprocessors).<br/><br/>This parameter sends an image of the document region containing the target data to a multimodal LLM (GPT-4o mini), so that you can ask questions about text and non-text images. This bypasses Sensible's [OCR](doc:ocr) and direct-text extraction processes for the region. <br/>This parameter has the following parameters:<br/><br/>`region`: The document region to send as an image to the multimodal LLM. Configurable with the following options :<br/><br/>- To automatically select the [context](doc:query-group#notes) as the region, specify `"region": "automatic"`. If you configure this option for a non-text image, then help Sensible locate the context by including queries in the group that target text near the image, or by specifying the nearby text in the Chunk Scoring Text parameter. <br/><br/>- To manually specify a region, specify an [anchor](doc:anchor) close to the region you want to capture. Specify the region's dimensions in inches relative to the anchor using the [Region](doc:region) method's parameters, for example:<br/>`"region": { `<br/> `"start": "below",`<br/> `"width": 8,`<br/> `"height": 1.2,`<br/> `"offsetX": -2.5,`<br/> `"offsetY": -0.25`<br/> `}` | If you configure this parameter, Sensible doesn't support confidence signals for the multimodal output. |
| llmEngine | object | Where applicable, configures the LLM engine Sensible uses to answer your prompts. <br/>Configure this parameter to troubleshoot situations in which Sensible correctly identifies the part of the document that contains the answers to your prompts, but the LLM's answer contains problems. For example, Sensible returns an LLM error because the answer isn't properly formatted, or the LLM doesn't follow instructions in your prompt.<br/><br/>Contains the following parameters:<br/>`provider`: <br/>- If set to `open-ai` (default), Sensible uses GPT-4o mini where not hard coded. See the Notes section for more information. <br/> - If set to `anthropic`, Sensible uses Claude 3 Haiku where not hard coded. See the Notes section for more information. | |
| searchBySummarization  |                  | For information about this parameter, see [Advanced LLM prompt configuration](doc:prompt#parameters). |                                                              |
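
To make the parameters in the preceding rows concrete, here is a hedged sketch that builds a Query Group field as a Python dictionary and serializes it to JSON. The parameter keys (`queries`, `chunkScoringText`, `llmEngine`, `provider`) come from the table; the nesting under `fields` and `method`, the field ID `insured_address`, and the snippet text are assumptions made for illustration.

```python
import json

# Hypothetical Query Group field; the nesting under "fields"/"method" is an assumption.
config = {
    "fields": [
        {
            "method": {
                "id": "queryGroup",
                "queries": [
                    {
                        # ID for the extracted field (made up for this example).
                        "id": "insured_address",
                        # Free-text question about information in the document.
                        "description": "what's the insured's street address?",
                    }
                ],
                # Snippet that is structurally similar to the target chunk:
                # a street address formatted with a newline.
                "chunkScoringText": "123 Main Street\nLondon, England",
                # Use the Anthropic provider instead of the default open-ai.
                "llmEngine": {"provider": "anthropic"},
            }
        }
    ]
}

config_json = json.dumps(config, indent=2)
```

Per the table, Sensible ignores `chunkScoringText` if the Search By Summarization parameter is set to true, so this sketch leaves that parameter unset.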
@@ -53,7 +53,7 @@ For parameters specific to an LLM-based method, see its reference topic, for exa
| chunkOverlapPercentage | `0`, `0.25`, `0.5` defaults:<br/>Query Group: 0.5<br/>List: 0<br/>NLP Table: 0.5 | The extent to which chunks overlap, as a percentage of the chunks' height. For example, `0.5` specifies each chunk overlaps by half its height. <br/>Sensible recommends setting a non-zero overlap to avoid splitting data across chunks. Set overlap to 0 solely if you're confident that your document layout doesn't flow across page boundaries and you're using a one-page chunk size. | If you set the Search By Summarization parameter to true, then Sensible sets this parameter to 0 and ignores any configured value. |
| | | | |
| | | ***FIND CONTEXT*** | |
| searchBySummarization | boolean. default: false | Set this to true to troubleshoot situations in which Sensible misidentifies the part of the document that contains the answers to your prompts. <br/>This parameter is compatible with documents up to 1,280 pages long.<br/>When true, Sensible uses a [completion-only retrieval-augmented generation (RAG) strategy](https://www.sensible.so/blog/embeddings-vs-completions-only-rag): Sensible prompts an LLM to summarize each page in the document, prompts a second LLM to return the pages most relevant to your prompt based on the summaries, and extracts the answers to your prompts from those pages.<br/>For more information about this parameter, see the Notes section. | If you set this parameter to true, then Sensible sets the following for chunk-related parameters and ignores any configured values:<br/><br/>- Chunk Size parameter: 1<br/>- Chunk Overlap Percentage parameter: 0<br/>- Chunk Count parameter: 5 <br/>- (for the Query Group method) Chunk Scoring Text parameter<br/> |
| searchBySummarization | boolean. default: false | Set this to true to troubleshoot situations in which Sensible misidentifies the part of the document that contains the answers to your prompts. <br/>This parameter is compatible with documents up to 1,280 pages long.<br/>When true, Sensible uses a [completion-only retrieval-augmented generation (RAG) strategy](https://www.sensible.so/blog/embeddings-vs-completions-only-rag): Sensible prompts an LLM to summarize each page in the document, prompts a second LLM to return the pages most relevant to your prompt based on the summaries, and extracts the answers to your prompts from those pages. | If you set this parameter to true, then Sensible sets the following for chunk-related parameters and ignores any configured values:<br/><br/>- Chunk Size parameter: 1<br/>- Chunk Overlap Percentage parameter: 0<br/>- Chunk Count parameter: 5 <br/>- (for the Query Group method) Chunk Scoring Text parameter<br/> |
| pageHinting            | boolean. default: true                                       | Includes or removes page metadata for each chunk from the full prompt Sensible inputs to an LLM.<br/>If set to true, then you can add location information to a prompt to narrow down the context's location. For example:<br/>**Location relative to page number and position on page**<br/>- "address in the top left of the first page of the document"<br/> - "What is the medical paid value on the last claim of the second page?"<br/>**Location relative to content in document**<br/>- "total amount in the expense table" <br/>- "phone number after section 2"<br/><br/>Set this to false if page numbers don't add useful information. For example, if your PDF converter automatically applied page numbers to scanned ID cards, set this parameter to false to ignore the page numbers, since their relationship to the cards' text is arbitrary.<br/> |                                                              |
| pageRange | object | Configures the possible page range for finding the context in the document.<br/>If specified, Sensible creates chunks in the page range and ignores other pages. For example, use this parameter to improve performance, or to avoid extracting unwanted data if your prompt has multiple candidate answers.<br/><br/>Contains the following parameters: <br/>`startPage`: Zero-based index of the page at which Sensible starts creating chunks (inclusive). <br/>`endPage`: Zero-based index of the page at which Sensible stops creating chunks (exclusive). | Sensible ignores this parameter when searching for a field's [anchor](doc:anchor). If you want to exclude the field's anchor using a page range, use the [Page Range](doc:page-range) preprocessor instead. |
| | | | |