From b2968983ce2e357cc801f1b96844b07b202306b0 Mon Sep 17 00:00:00 2001
From: Frances Elliott
Date: Fri, 28 Jun 2024 14:18:25 -0600
Subject: [PATCH] docs links

---
 changelog/january-2024.mdx                    |  2 +-
 drafts/draft-test.mdx                         |  8 ----
 openapi.yml                                   |  6 +--
 openapi_classification.yml                    |  4 +-
 openapi_extraction.yml                        | 40 +++++++++----------
 .../deprecated-invoice.mdx                    | 14 +++----
 .../deprecated-key-value.mdx                  |  6 +--
 .../deprecated-page-filter.mdx                |  4 +-
 .../deprecated-features/deprecated-query.mdx  | 26 ++++++------
 .../deprecated-features/deprecated-table.mdx  | 18 ++++-----
 .../deprecated-features/deprecated-tfidf.mdx  |  4 +-
 .../deprecated-features/deprecated-topic.mdx  | 16 ++++----
 12 files changed, 70 insertions(+), 78 deletions(-)
 delete mode 100644 drafts/draft-test.mdx

diff --git a/changelog/january-2024.mdx b/changelog/january-2024.mdx
index 74d5266..ad5edc9 100644
--- a/changelog/january-2024.mdx
+++ b/changelog/january-2024.mdx
@@ -48,7 +48,7 @@ Sensible uses LLMs to create the fields with the [Query method](/senseml-referen
 ## Improvement: Filter pages for LLM-based methods with the new Page Range parameter

 With the new Page Range parameter for [LLM-based](/llm-based-extractions/prompt-tips/index-prompt-tips)
- methods, you can narrow down the document pages in which Sensible searches for the answer to your LLM prompt. Sensible ignores other pages. For example, use this parameter to improve performance, or to avoid extracting unwanted data if your prompt has multiple candidate answers. For more information, see the Page Range parameter documentation in the [Advanced prompt configuration](doc:prompt#https://docs.sensible.so/docs/prompt#global-sensible-instruct-parameters)
+ methods, you can narrow down the document pages in which Sensible searches for the answer to your LLM prompt. Sensible ignores other pages. For example, use this parameter to improve performance, or to avoid extracting unwanted data if your prompt has multiple candidate answers.
For more information, see the Page Range parameter documentation in the [Advanced prompt configuration](/llm-based-extractions/prompt#https://docs.sensible.so/docs/prompt#global-sensible-instruct-parameters) topic. ## Improvement: Extract checkboxes inside tables diff --git a/drafts/draft-test.mdx b/drafts/draft-test.mdx deleted file mode 100644 index f05f62e..0000000 --- a/drafts/draft-test.mdx +++ /dev/null @@ -1,8 +0,0 @@ ---- ---- -title: "draft test" ---- - -don't show up in TOC or search? - -bad link \ No newline at end of file diff --git a/openapi.yml b/openapi.yml index ae8de51..4753597 100644 --- a/openapi.yml +++ b/openapi.yml @@ -848,13 +848,13 @@ components: - lazarus type: string description: | - For information about each OCR engine, see [OCR engine](doc:ocr-engine). + For information about each OCR engine, see [OCR engine](/senseml-reference/document-type-settings/ocr-engine). prevent_default_merge_lines: type: boolean description: | Prevents the built-in line merging that occurs before the - [Merge Lines](doc:merge-lines) preprocessor. + [Merge Lines](/senseml-reference/preprocessors/merge-lines) preprocessor. ocr_level: enum: @@ -864,7 +864,7 @@ components: - 5 type: number description: | - See [OCR level](doc:ocr-level). + See [OCR level](/senseml-reference/document-type-settings/ocr-level). validations: description: Array of validations. See https://docs.sensible.so/docs/validate-extractions items: diff --git a/openapi_classification.yml b/openapi_classification.yml index a48c790..7433543 100644 --- a/openapi_classification.yml +++ b/openapi_classification.yml @@ -38,7 +38,7 @@ paths: To post the document bytes, specify the non-encoded document bytes as the entire request body,and specify the `Content-Type` header, for example,"application/pdf" or "image/jpeg". - For supported file size and types, see [Supported file types](doc:file-types). + For supported file size and types, see [Supported file types](/senseml-reference/concepts/file-types). 
@@ -107,7 +107,7 @@ paths: To post the document bytes, specify the non-encoded document bytes as the entire request body,and specify the `Content-Type` header, for example,"application/pdf" or "image/jpeg". - For supported file size and types, see [Supported file types](doc:file-types). + For supported file size and types, see [Supported file types](/senseml-reference/concepts/file-types). requestBody: required: true diff --git a/openapi_extraction.yml b/openapi_extraction.yml index 5cb052d..910bf2a 100644 --- a/openapi_extraction.yml +++ b/openapi_extraction.yml @@ -42,8 +42,8 @@ paths: To explore this endpoint, use this interactive API reference, or use one of the following options: - - For a quick "hello world" response to this endpoint, see the [API quickstart](doc:quickstart) - - For a step-by-step tutorial about calling this endpoint, see [Try synchronous extraction](doc:api-tutorial-sync). + - For a quick "hello world" response to this endpoint, see the [API quickstart](/integrations/quickstart) + - For a step-by-step tutorial about calling this endpoint, see [Try synchronous extraction](/api-guides/api-tutorial/api-tutorial-sync). - Run this endpoint in the Sensible Postman collection. [Run in Postman](https://god.gw.postman.com/run-collection/16839934-45339059-3fec-4c31-a891-9a12a3e1c22b?action=collection%2Ffork&collection-url=entityId%3D16839934-45339059-3fec-4c31-a891-9a12a3e1c22b%26entityType%3Dcollection%26workspaceId%3Ddbde09dc-b7dd-487d-a68f-20d32b008f90) There are two options for posting the document bytes. @@ -51,7 +51,7 @@ paths: See the following for supported file formats. 2. Base64 encode the document bytes, specify them in a body "document" field, and specify application/json for the `Content-Type` header. - For a list of supported document file types, see [Supported file types](doc:file-types). + For a list of supported document file types, see [Supported file types](/senseml-reference/concepts/file-types). 
parameters: - $ref: '#/components/parameters/document_type' @@ -128,7 +128,7 @@ paths: 2. PUT your document at the `upload_url` returned from the previous step. Sensible extracts data from the document. 3. To retrieve the extraction, use a webhook, or use the extraction `id` returned in the response to poll the GET documents/{id} endpoint. - For supported file size and types, see [Supported file types](doc:file-types). + For supported file size and types, see [Supported file types](/senseml-reference/concepts/file-types). For example, if your call to `/generate_upload_url` specifies the document type with a `content_type` body parameter (recommended), your first two steps are as follows: @@ -193,12 +193,12 @@ paths: summary: Extract doc at your URL description: | Extract data asynchronously from a document at the specified `document_url`.
- For supported file size and types, see [Supported file types](doc:file-types). + For supported file size and types, see [Supported file types](/senseml-reference/concepts/file-types). Take the following steps. 1. Run this endpoint. 3. To retrieve the extraction, use a webhook, or use the extraction `id` returned in the response to poll the GET documents/{id} endpoint. For a step-by-step tutorial on calling this endpoint, - see [Try asynchronous extraction from your URL](doc:api-tutorial-async-1). + see [Try asynchronous extraction from your URL](/api-guides/api-tutorial/api-tutorial-async-1). parameters: - $ref: '#/components/parameters/document_type' - $ref: '#/components/parameters/environment' @@ -382,14 +382,14 @@ paths: summary: Extract portfolio at a Sensible URL description: | - Use this endpoint with multiple documents that are packaged into one file (a "portfolio"). For a list of supported file types, see [Supported file types](doc:file-types). + Use this endpoint with multiple documents that are packaged into one file (a "portfolio"). For a list of supported file types, see [Supported file types](/senseml-reference/concepts/file-types). Segments a portfolio file into the specified document types (for example, 1099, w2, and bank_statement) and then runs extractions asynchronously for each document Sensible finds in the portfolio. Take the following steps - 1. Use this endpoint to generate a Sensible URL. 2. PUT the document you want to extract data from at the URL, where `SENSIBLE_UPLOAD_URL` is the URL you received from this endpoint's response. For more information about how to PUT the document, see the [generate_upload_url/{document_type}](ref:generate-upload-url) endpoint. 3. To retrieve the extraction, use a webhook, or use the extraction `id` returned in the response to poll the GET documents/{id} endpoint. - For more about extracting from portfolios, see [Multi-document extractions](doc:portfolio). 
+ For more about extracting from portfolios, see [Multi-document extractions](/layout-based-extractions/portfolio). parameters: - $ref: '#/components/parameters/environment' @@ -433,12 +433,12 @@ paths: summary: Extract portfolio at your URL description: | - Use this endpoint with multiple documents that are packaged into one file (a "portfolio"). For a list of supported file types, see [Supported file types](doc:file-types). + Use this endpoint with multiple documents that are packaged into one file (a "portfolio"). For a list of supported file types, see [Supported file types](/senseml-reference/concepts/file-types). Segments a portfolio file at the specified `document_url` into the specified document types (for example, 1099, w2, and bank_statement) and then runs extractions asynchronously for each document Sensible finds in the portfolio. Take the following steps. 1. Run this endpoint. 3. To retrieve the extraction, use a webhook, or use the extraction `id` returned in the response to poll the GET documents/{id} endpoint. - For more about extracting from portfolios, see [Multi-document extractions](doc:portfolio). + For more about extracting from portfolios, see [Multi-document extractions](/layout-based-extractions/portfolio). parameters: - $ref: '#/components/parameters/environment' - $ref: '#/components/parameters/document_name' @@ -602,11 +602,11 @@ paths: To compile multiple documents into one Excel file, specify the IDs of their recent extractions in the request separated by commas, for example, `/generate_excel/867514cc-fce7-40eb-8e9d-e6ec48cdac34,5093c65f-05bd-46a3-8df7-da3ed00f6d35`. For the best compiled spreadsheet results, configure your SenseML so that the documents output identically named fields. - For more information about the conversion process, see [SenseML to spreadsheet reference](doc:excel-reference). + For more information about the conversion process, see [SenseML to spreadsheet reference](/integrations/quick-extraction/excel-reference). 
- For portfolio extractions, Sensible returns an Excel file containing fields for all the documents it finds in the PDF. For more information, see [Multi-document spreadsheet](doc:excel-reference#multi-document-spreadsheet). + For portfolio extractions, Sensible returns an Excel file containing fields for all the documents it finds in the PDF. For more information, see [Multi-document spreadsheet](/integrations/quick-extraction/excel-reference#multi-document-spreadsheet). - For a list of document file types that Sensible can extract data from, see [Supported file types](doc:file-types). + For a list of document file types that Sensible can extract data from, see [Supported file types](/senseml-reference/concepts/file-types). Call this endpoint after an extraction completes. For more information about checking extraction status, see the `GET /documents/{id}` endpoint. parameters: @@ -647,8 +647,8 @@ paths: To compile multiple documents into one CSV file, specify the IDs of their recent extractions in the request separated by commas, for example, `/generate_csv/867514cc-fce7-40eb-8e9d-e6ec48cdac34,5093c65f-05bd-46a3-8df7-da3ed00f6d35`. For the best compiled spreadsheet results, configure your SenseML so that the documents output identically named fields. - For more information about the conversion process, see [SenseML to spreadsheet reference](doc:excel-reference). - For a list of document file types that Sensible can extract data from, see [Supported file types](doc:file-types). + For more information about the conversion process, see [SenseML to spreadsheet reference](/integrations/quick-extraction/excel-reference). + For a list of document file types that Sensible can extract data from, see [Supported file types](/senseml-reference/concepts/file-types). Call this endpoint after an extraction completes. For more information about checking extraction status, see the `GET /documents/{id}` endpoint. 
parameters: @@ -932,7 +932,7 @@ components: name: min_coverage in: query description: >- - Minimum extraction coverage score by which to filter the retrieved extractions. For more information about scoring, see [Monitoring extractions](doc:metrics). + Minimum extraction coverage score by which to filter the retrieved extractions. For more information about scoring, see [Monitoring extractions](/best-practices/metrics). schema: type: number example: 0.8 @@ -941,7 +941,7 @@ components: name: max_coverage in: query description: >- - Maximum extraction coverage score by which to filter the retrieved extractions. For more information about scoring, see [Monitoring extractions](doc:metrics). + Maximum extraction coverage score by which to filter the retrieved extractions. For more information about scoring, see [Monitoring extractions](/best-practices/metrics). schema: type: number example: 1.0 @@ -971,7 +971,7 @@ components: Coverage: type: number - description: The coverage score measures how fully an extraction captured all your target data in the document. It's a percentage comparing non-null, [validated](doc:validate-extractions) fields to total fields returned by a config for a document. For example, a coverage score of 70% for an extraction with no validation errors means that 30% of fields were null. For more information about scoring, see [Monitoring extractions](doc:metrics). + description: The coverage score measures how fully an extraction captured all your target data in the document. It's a percentage comparing non-null, [validated](/best-practices/validate-extractions) fields to total fields returned by a config for a document. For example, a coverage score of 70% for an extraction with no validation errors means that 30% of fields were null. For more information about scoring, see [Monitoring extractions](/best-practices/metrics). example: 0.75 Environment: type: string @@ -1052,7 +1052,7 @@ components: `[` denotes inclusive and `)` denotes exclusive. 
For example, when this endpoint returns `"coverage_histogram":[7,5,3,3,2,1,1,4,7,9,13,15]` , the first and last items in the array show that on specified date for the specified config, 7 extractions scored in the lowest bucket of 0-10%, and 15 scored in the highest bucket of 100%. - For more information about extraction coverage scores, see [Monitoring extractions](doc:metrics). + For more information about extraction coverage scores, see [Monitoring extractions](/best-practices/metrics). From the payload returned by this endpoint, you can calculate other metrics, for example: - total number of extractions in a time period - doc type and config usage @@ -1515,7 +1515,7 @@ components: - the bounding polygons that define line coordinates - for text that Sensible OCR'd, confidence scores. - For more information, see [Verbosity](doc:verbosity). + For more information, see [Verbosity](/senseml-reference/config-settings/verbosity). type: object example: policy_number: diff --git a/senseml-reference/deprecated-features/deprecated-invoice.mdx b/senseml-reference/deprecated-features/deprecated-invoice.mdx index cc54a06..05acf04 100644 --- a/senseml-reference/deprecated-features/deprecated-invoice.mdx +++ b/senseml-reference/deprecated-features/deprecated-invoice.mdx @@ -4,26 +4,26 @@ hidden: true --- ## Deprecated -This method is deprecated. [LLM-based methods](doc:instruct) replace this method. +This method is deprecated. [LLM-based methods](/llm-based-extractions/prompt-tips/index-instruct) replace this method. ## Description -This method is identical to the [(Deprecated) Table method](doc:deprecated-table), and also returns detected invoice metadata. This method accepts one invoice per document file. If the document contains multiple tables, the Invoice method returns the data for the table that is the best invoice candidate. 
+This method is identical to the [(Deprecated) Table method](/senseml-reference/deprecated-features/deprecated-table), and also returns detected invoice metadata. This method accepts one invoice per document file. If the document contains multiple tables, the Invoice method returns the data for the table that is the best invoice candidate. -It's a best practice to create a single, flexible config that works for a variety of invoice formats. This is because invoices typically come from such a wide variety of vendors that it would be unmanageable to create a config for each vendor. Create a flexible config by using synonymous terms to identify invoice elements. For more information, see the [Examples section](doc:deprecated-invoice#examples). +It's a best practice to create a single, flexible config that works for a variety of invoice formats. This is because invoices typically come from such a wide variety of vendors that it would be unmanageable to create a config for each vendor. Create a flexible config by using synonymous terms to identify invoice elements. For more information, see the [Examples section](/senseml-reference/deprecated-features/deprecated-invoice#examples). -[**Parameters**](doc:deprecated-invoice#parameters) -[**Examples**](doc:deprecated-invoice#examples) +[**Parameters**](/senseml-reference/deprecated-features/deprecated-invoice#parameters) +[**Examples**](/senseml-reference/deprecated-features/deprecated-invoice#examples) Parameters ==== -**Note:** For the full list of parameters available for this method, see [Global parameters for methods](doc:method#global-parameters-for-methods). The following table shows parameters most relevant to or specific to this method. +**Note:** For the full list of parameters available for this method, see [Global parameters for methods](/senseml-reference/field-query-object/method#global-parameters-for-methods). The following table shows parameters most relevant to or specific to this method. 
| key | value | description | | :------------------- | :-------- | :----------------------------------------------------------- | | id (**required**) | `invoice` | When you specify this method, you must also specify `"type": "table"` in the field's parameters. | -| columns **required** | array | An array of objects with the following parameters:
-`id` (**required**): The id for the column in the extraction output.
-`terms` (**required**): An array of terms to score positively during column recognition. For more information about scoring, see [bag of words](doc:bag-of-words) scoring. Usually, you include column heading terms in this array.
-`stopTerms`: An array of terms to score negatively during column recognition. For more information about scoring, see [bag of words](doc:bag-of-words).
-`type`: The table cell's type. For more information about types, see [Field query object](doc:field-query-object).
-`isRequired` (default false): If true, Sensible omits a row if its cell is empty in this column. If false, Sensible returns nulls for empty cells in the row. Note that if you set this parameter to true for one column, Sensible omits the row for *all* columns, even if the row had content under other columns. | +| columns **required** | array | An array of objects with the following parameters:
-`id` (**required**): The id for the column in the extraction output.
-`terms` (**required**): An array of terms to score positively during column recognition. For more information about scoring, see [bag of words](/senseml-reference/deprecated-features/deprecated-bag-of-words) scoring. Usually, you include column heading terms in this array.
-`stopTerms`: An array of terms to score negatively during column recognition. For more information about scoring, see [bag of words](/senseml-reference/deprecated-features/deprecated-bag-of-words).
-`type`: The table cell's type. For more information about types, see [Field query object](/senseml-reference/field-query-object/index-field-query-object).
-`isRequired` (default false): If true, Sensible omits a row if its cell is empty in this column. If false, Sensible returns nulls for empty cells in the row. Note that if you set this parameter to true for one column, Sensible omits the row for *all* columns, even if the row had content under other columns. | Examples diff --git a/senseml-reference/deprecated-features/deprecated-key-value.mdx b/senseml-reference/deprecated-features/deprecated-key-value.mdx index eb18f7a..71fa3e6 100644 --- a/senseml-reference/deprecated-features/deprecated-key-value.mdx +++ b/senseml-reference/deprecated-features/deprecated-key-value.mdx @@ -4,7 +4,7 @@ hidden: true --- ## Deprecated -This method is deprecated. [LLM-based methods](doc:instruct) replace this method. +This method is deprecated. [LLM-based methods](/llm-based-extractions/prompt-tips/index-instruct) replace this method. ## Decription @@ -19,10 +19,10 @@ Finds the most promising two-column tabular key/value pair in a single page of t "0-0": "id", "0-1": "`keyValue`", "1-0": "terms", - "1-2": "An array of terms to score positively. For more information about scoring, see [bag of words](doc:bag-of-words).", + "1-2": "An array of terms to score positively. For more information about scoring, see [bag of words](/senseml-reference/deprecated-features/deprecated-bag-of-words).", "1-1": "Array of strings", "2-0": "stopTerms", - "2-2": "optional. An array of terms to score negatively. For more information about scoring, see [bag of words](doc:bag-of-words).", + "2-2": "optional. An array of terms to score negatively. 
For more information about scoring, see [bag of words](/senseml-reference/deprecated-features/deprecated-bag-of-words).", "2-1": "Array of strings" }, "cols": 3, diff --git a/senseml-reference/deprecated-features/deprecated-page-filter.mdx b/senseml-reference/deprecated-features/deprecated-page-filter.mdx index f0dea1b..f5a171b 100644 --- a/senseml-reference/deprecated-features/deprecated-page-filter.mdx +++ b/senseml-reference/deprecated-features/deprecated-page-filter.mdx @@ -10,6 +10,6 @@ Filters out low-scoring pages given a bag of target terms and stop terms. | key | value | description | | -------------------- | ------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | type (**required**) | string | "pageFilter" | -| terms (**required**) | array | An array of terms to score positively (for example, `["number of buildings", "no. of buildings"]`). For more information about scoring, see [bag of words](doc:bag-of-words). | -| stopTerms | array | An array of terms to score negatively (for example, `["comparables"]`). For more information about scoring, see [bag of words](doc:bag-of-words). | +| terms (**required**) | array | An array of terms to score positively (for example, `["number of buildings", "no. of buildings"]`). For more information about scoring, see [bag of words](/senseml-reference/deprecated-features/deprecated-bag-of-words). | +| stopTerms | array | An array of terms to score negatively (for example, `["comparables"]`). For more information about scoring, see [bag of words](/senseml-reference/deprecated-features/deprecated-bag-of-words). 
| | maxPages | number | The maximum number of highest-scoring pages to pass through the filter | \ No newline at end of file diff --git a/senseml-reference/deprecated-features/deprecated-query.mdx b/senseml-reference/deprecated-features/deprecated-query.mdx index 3ce08ac..cbb99e7 100644 --- a/senseml-reference/deprecated-features/deprecated-query.mdx +++ b/senseml-reference/deprecated-features/deprecated-query.mdx @@ -6,17 +6,17 @@ hidden: true # Deprecated -The Query method is deprecated. The Query Group method replaces this method. See [Query Group](doc:query-group) for more information. +The Query method is deprecated. The Query Group method replaces this method. See [Query Group](/senseml-reference/llm-based-methods/query-group) for more information. ## Description This method extracts individual facts in a document, such as the date of an invoice, the liability limit of an insurance policy, or the destination address of a shipping container delivery. Sensible uses a large language model (LLM) to find these facts in paragraphs of free text, or in more structured layouts, for example key/value pairs or tables. -Sensible recommends framing each query, or prompt, so that it has a single, short answer. For prompts with multi-part answers, use the [List method](doc:list). +Sensible recommends framing each query, or prompt, so that it has a single, short answer. For prompts with multi-part answers, use the [List method](/senseml-reference/llm-based-methods/list). -**Note:** For the full list of parameters available for this method, see [Global parameters for methods](doc:method#section-global-parameters-for-methods). The following table only shows parameters most relevant to or specific to this method. +**Note:** For the full list of parameters available for this method, see [Global parameters for methods](/senseml-reference/field-query-object/method#section-global-parameters-for-methods). 
The following table only shows parameters most relevant to or specific to this method. -**Note** You can configure some the following parameters in both the [NLP](doc:nlp) preprocessor and in a field's method. If you configure both, the field's parameter overrrides the NLP preprocessor's parameter. For more information, see [Advanced prompt configuration](doc:prompt). +**Note** You can configure some of the following parameters in both the [NLP](/senseml-reference/preprocessors/nlp) preprocessor and in a field's method. If you configure both, the field's parameter overrides the NLP preprocessor's parameter. For more information, see [Advanced prompt configuration](/llm-based-extractions/prompt). Parameters ===== @@ -29,13 +29,13 @@ Parameters | description(**required**) | string | A free-text question about information in the document. For example, `"what's the policy period?"` or `"what's the client's first and last name?"`. | | chunkScoringText | string | Configures context's content. For details about context and chunks, see the Notes section.
A representative snippet of text from the part of the document where you expect to find the answer to your prompt. Use this parameter to narrow down the page location of the answer to your prompt. For example, if your prompt has multiple candidate answers, and the correct answer is located near unique or distinctive text that's difficult to incorporate into your question, then specify the distinctive text in this parameter.
If specified, Sensible uses this text to find top-scoring chunks. If unspecified, Sensible uses the prompt to score chunks.
Sensible recommends that the snippet is specific to the target chunk, semantically similar to the chunk, and structurally similar to the chunk.
For example, if the chunk contains a street address formatted with newlines, then provide a snippet with an example street address that contains newlines, like `123 Main Street\nLondon, England`. If the chunk contains a street address in a free-text paragraph, then provide an unformatted street address in the snippet.
For an example, see Example 3.
| | (**Deprecated**) promptIntroduction | string. | **(Deprecated)** overwrites the introductory text at the beginning of the [full prompt](https://docs.sensible.so/docs/prompt) that Sensible submits to the LLM for this field. | -| confidenceSignals | | For information about this parameter, see [Advanced prompt configuration](doc:prompt). | -| contextDescription | | For information about this parameter, see [Advanced prompt configuration](doc:prompt#parameters). | -| pageHinting | | For information about this parameter, see [Advanced prompt configuration](doc:prompt#parameters). | -| chunkCount | default: 5 | For information about this parameter, see [Advanced prompt configuration](doc:prompt#parameters). | -| chunkSize | default: 0.5 | For information about this parameter, see [Advanced prompt configuration](doc:prompt#parameters). | -| chunkOverlapPercentage | default: 0.5 | For information about this parameter, see [Advanced prompt configuration](doc:prompt#parameters). | -| pageRange | | For information about this parameter, see [Advanced prompt configuration](doc:prompt#parameters). | +| confidenceSignals | | For information about this parameter, see [Advanced prompt configuration](/llm-based-extractions/prompt). | +| contextDescription | | For information about this parameter, see [Advanced prompt configuration](/llm-based-extractions/prompt#parameters). | +| pageHinting | | For information about this parameter, see [Advanced prompt configuration](/llm-based-extractions/prompt#parameters). | +| chunkCount | default: 5 | For information about this parameter, see [Advanced prompt configuration](/llm-based-extractions/prompt#parameters). | +| chunkSize | default: 0.5 | For information about this parameter, see [Advanced prompt configuration](/llm-based-extractions/prompt#parameters). | +| chunkOverlapPercentage | default: 0.5 | For information about this parameter, see [Advanced prompt configuration](/llm-based-extractions/prompt#parameters). 
| +| pageRange | | For information about this parameter, see [Advanced prompt configuration](/llm-based-extractions/prompt#parameters). | Examples ==== @@ -262,7 +262,7 @@ For an overview of how this method works, see the following steps: - To meet the LLM's token limit for input, Sensible splits the document into equal-sized, overlapping chunks. - Sensible scores each chunk by its similarity to either the `description` or the `chunkScoringText` parameters. Sensible scores each chunk using the OpenAPI Embeddings API. - Sensible selects a number of the top-scoring chunks and combines them into "context". The chunks can be non-consecutive in the document. Sensible deduplicates overlapping text in consecutive chunks. If you set chunk-related parameters that cause the context to exceed the LLM's token limit, Sensible automatically reduces the chunk count until the context meets the token limit. -- Sensible creates a full prompt for the LLM (GPT-3.5 Turbo) that includes the chunks, page hinting data, and your prompt. For more information about the full prompt, see [Advanced prompt configuration](doc:prompt). +- Sensible creates a full prompt for the LLM (GPT-3.5 Turbo) that includes the chunks, page hinting data, and your prompt. For more information about the full prompt, see [Advanced prompt configuration](/llm-based-extractions/prompt). How location highlighting works --- @@ -285,7 +285,7 @@ For an overview of how Sensible finds the source text in the document for the LL Sensible can highlight the incorrect location under the following circumstances: -- If you prompt the LLM to reformat the source text in the document or reformat the text using a [type](doc:types) , then Sensible can fail to find a match or can find an inaccurate match. +- If you prompt the LLM to reformat the source text in the document or reformat the text using a [type](/senseml-reference/field-query-object/types) , then Sensible can fail to find a match or can find an inaccurate match. 
- If there are multiple candidates fuzzy matches in the document (for example, two instances of `April 7`), Sensible chooses the top-scoring match. If candidates have similar scores, Sensible uses page location as a tie breaker and chooses the earliest match in the document. diff --git a/senseml-reference/deprecated-features/deprecated-table.mdx b/senseml-reference/deprecated-features/deprecated-table.mdx index 4a8db32..37db7aa 100644 --- a/senseml-reference/deprecated-features/deprecated-table.mdx +++ b/senseml-reference/deprecated-features/deprecated-table.mdx @@ -4,7 +4,7 @@ hidden: true --- # Deprecated -This method is deprecated. To duplicate this method's function, use the [NLP Table ](doc:nlp-table)method and set the Rewrite Table parameter to false. +This method is deprecated. To duplicate this method's function, use the [NLP Table ](/senseml-reference/preprocessors/nlp-table)method and set the Rewrite Table parameter to false. ## Description @@ -12,24 +12,24 @@ Extracts tables based on bag-of-words scoring and returns their collated column Use the Table method for tables that have variable column formatting. -For alternatives to this method, see [Choosing a table method](doc:table-methods). +For alternatives to this method, see [Choosing a table method](/senseml-reference/concepts/table-methods). -[**Parameters**](doc:deprecated-table#parameters) -[**Examples**](doc:deprecated-table#examples) +[**Parameters**](/senseml-reference/deprecated-features/deprecated-table#parameters) +[**Examples**](/senseml-reference/deprecated-features/deprecated-table#examples) Parameters ===== -**Note:** For the full list of parameters available for this method, see [Global parameters for methods](doc:method#global-parameters-for-methods). The following table shows parameters most relevant to or specific to this method. 
+**Note:** For the full list of parameters available for this method, see [Global parameters for methods](/senseml-reference/field-query-object/method#global-parameters-for-methods). The following table shows parameters most relevant to or specific to this method. | key | value | description | | :----------------------- | :-------------------------------------------------- | :----------------------------------------------------------- | | id (**required**) | `table` | When you specify this method, you must also specify `"type": "table"` in the field's parameters. See the Stop parameter for details about how Sensible recognizes a table. | -| columns (**required**) | array | An array of objects with the following parameters:
-`id` (**required**): The id for the column in the extraction output.
-`terms` (**required**): An array of terms to score positively during column recognition. Usually, you include column heading terms in this array. For more information about scoring, see [bag of words](doc:bag-of-words).
-`stopTerms`: An array of terms to score negatively during column recognition. For more information about scoring, see [bag of words](doc:bag-of-words).
-`type`: The table cell's type. For more information, see [types](doc:types).
-`isRequired` (default false): If true, Sensible omits a row if its cell is empty in this column, or if the contents don't match the value you specify in this column's Type parameter. If false, Sensible returns nulls for empty cells in the row. Note that if you set this parameter to true for one column, Sensible omits the row for *all* columns, even if the row had content under other columns. | -| stop | [Match object](doc:match) or array of Match objects | (**Recommended**) Stops table recognition at the matched line. Otherwise, Sensible searches all pages for tables, which can impact performance.
When you specify a stop, Sensible uses an Amazon Web Service OCR provider to perform table recognition. When you omit a stop, Sensible uses a Microsoft OCR provider.
When you specify a stop, Sensible supports:
- merged cells in tables. Sensible populates "empty" spanned cells with the spanned value. For an example, see [Merged cell example](doc:fixed-table#example-merged-cells).
- checkboxes in cells. Returns checkbox selection status as `[true]` or `[false]`. | +| columns (**required**) | array | An array of objects with the following parameters:
-`id` (**required**): The id for the column in the extraction output.
-`terms` (**required**): An array of terms to score positively during column recognition. Usually, you include column heading terms in this array. For more information about scoring, see [bag of words](/senseml-reference/deprecated-features/deprecated-bag-of-words).
-`stopTerms`: An array of terms to score negatively during column recognition. For more information about scoring, see [bag of words](/senseml-reference/deprecated-features/deprecated-bag-of-words).
-`type`: The table cell's type. For more information, see [types](/senseml-reference/field-query-object/types).
-`isRequired` (default false): If true, Sensible omits a row if its cell is empty in this column, or if the contents don't match the value you specify in this column's Type parameter. If false, Sensible returns nulls for empty cells in the row. Note that if you set this parameter to true for one column, Sensible omits the row for *all* columns, even if the row had content under other columns. | +| stop | [Match object](/senseml-reference/field-query-object/match) or array of Match objects | (**Recommended**) Stops table recognition at the matched line. Otherwise, Sensible searches all pages for tables, which can impact performance.
When you specify a stop, Sensible uses an Amazon Web Service OCR provider to perform table recognition. When you omit a stop, Sensible uses a Microsoft OCR provider.
When you specify a stop, Sensible supports:
- merged cells in tables. Sensible populates "empty" spanned cells with the spanned value. For an example, see [Merged cell example](/senseml-reference/methods/fixed-table#example-merged-cells).
- checkboxes in cells. Returns checkbox selection status as `[true]` or `[false]`. | | startOnRow | integer. default: 0 | Zero-indexed row number at which to start table extraction. For example, use this to exclude column headings from the output. As a stricter alternative, set the Is Required parameter on a column and set a type on the column (see example in Examples section). | -| detectTableStructureOnly | boolean. default: false | Set this parameter to true to troubleshoot optional character recognition (OCR) in a table. If true, Sensible bypasses the text output by the table recognition OCR provider. Sensible instead recognizes the table's text using the [OCR engine](doc:ocr-engine) specified by your document type, or by using text embedded in the document file if present. | +| detectTableStructureOnly | boolean. default: false | Set this parameter to true to troubleshoot optional character recognition (OCR) in a table. If true, Sensible bypasses the text output by the table recognition OCR provider. Sensible instead recognizes the table's text using the [OCR engine](/senseml-reference/document-type-settings/ocr-engine) specified by your document type, or by using text embedded in the document file if present. | Examples ==== @@ -139,4 +139,4 @@ The following image shows the example document used with this example config: ![ Notes ==== -For alternatives to this method, see [Choosing a table method](doc:table-methods). +For alternatives to this method, see [Choosing a table method](/senseml-reference/concepts/table-methods). diff --git a/senseml-reference/deprecated-features/deprecated-tfidf.mdx b/senseml-reference/deprecated-features/deprecated-tfidf.mdx index 0a098ec..450d154 100644 --- a/senseml-reference/deprecated-features/deprecated-tfidf.mdx +++ b/senseml-reference/deprecated-features/deprecated-tfidf.mdx @@ -4,7 +4,7 @@ hidden: true --- ## Deprecated -This method is deprecated. [LLM-based methods](doc:instruct) replace this method. 
+This method is deprecated. [LLM-based methods](/llm-based-extractions/prompt-tips/index-instruct) replace this method. ## Description @@ -59,7 +59,7 @@ To produce this output, you specify classifications and corresponding example te Parameters ==== -The following parameters are in the computed field's [global Method](doc:computed-field-methods#parameters) parameter: +The following parameters are in the computed field's [global Method](/senseml-reference/computed-field-methods/index-computed-field-methods#parameters) parameter: | key | value | description | diff --git a/senseml-reference/deprecated-features/deprecated-topic.mdx b/senseml-reference/deprecated-features/deprecated-topic.mdx index d492929..0b9e62d 100644 --- a/senseml-reference/deprecated-features/deprecated-topic.mdx +++ b/senseml-reference/deprecated-features/deprecated-topic.mdx @@ -6,26 +6,26 @@ hidden: true ## Deprecated -This method is deprecated. [LLM-based methods](doc:instruct) replace this method. +This method is deprecated. [LLM-based methods](/llm-based-extractions/prompt-tips/index-instruct) replace this method. ## Description -Finds a range of lines in a document that best match a topic as determined by a [bag of words](doc:bag-of-words) scoring approach. Most useful in long, unstructured documents. For example, this method in conjunction with the [Summarizer method](doc:summarizer) can extract key-value pairs from free text using ML (machine learning). +Finds a range of lines in a document that best match a topic as determined by a [bag of words](/senseml-reference/deprecated-features/deprecated-bag-of-words) scoring approach. Most useful in long, unstructured documents. For example, this method in conjunction with the [Summarizer method](/senseml-reference/llm-based-methods/summarizer) can extract key-value pairs from free text using ML (machine learning). 
-[**Parameters**](doc:deprecated-topic#parameters) -[**Examples**](doc:deprecated-topic#examples) +[**Parameters**](/senseml-reference/deprecated-features/deprecated-topic#parameters) +[**Examples**](/senseml-reference/deprecated-features/deprecated-topic#examples) Parameters ===== -**Note:** For the full list of parameters available for this method, see [Global parameters for methods](doc:method#global-parameters-for-methods). The following table shows parameters most relevant to or specific to this method. +**Note:** For the full list of parameters available for this method, see [Global parameters for methods](/senseml-reference/field-query-object/method#global-parameters-for-methods). The following table shows parameters most relevant to or specific to this method. | key | value | description | | :---------------------------------------- | :----------- | :----------------------------------------------------------- | | id (**required**) | `topic` | The Anchor parameter is optional for fields that use this method. If you specify an anchor, Sensible searches from the anchor match to the end of the document for the topic. | -| numParagraphs or numLines (**required**) | number | The number of paragraphs or consecutive lines to extract, respectively.


If you set the Num Paragraphs parameter, Sensible scores every paragraph in the document and returns the highest-scoring paragraph. For more information about paragraph recognition, see the [Paragraph method](doc:paragraph) .

If you set the Num Lines parameter, Sensible scores every group of consecutive lines in the document and returns the highest-scoring group. For information about the definition of "consecutive", see [line sorting](doc:lines#line-sorting).

If line groups or paragraphs have equal scores, then Sensible returns the last one.
| -| terms (**required**) | string array | An array of terms to score positively during topic recognition. For more information about scoring, see [bag of words](doc:bag-of-words). | -| stopTerms | string array | An array of terms to score negatively during topic recognition. For more information about scoring, see [bag of words](doc:bag-of-words). | +| numParagraphs or numLines (**required**) | number | The number of paragraphs or consecutive lines to extract, respectively.


If you set the Num Paragraphs parameter, Sensible scores every paragraph in the document and returns the highest-scoring paragraph. For more information about paragraph recognition, see the [Paragraph method](/senseml-reference/methods/paragraph) .

If you set the Num Lines parameter, Sensible scores every group of consecutive lines in the document and returns the highest-scoring group. For information about the definition of "consecutive", see [line sorting](/senseml-reference/concepts/lines#line-sorting).

If line groups or paragraphs have equal scores, then Sensible returns the last one.
| +| terms (**required**) | string array | An array of terms to score positively during topic recognition. For more information about scoring, see [bag of words](/senseml-reference/deprecated-features/deprecated-bag-of-words). | +| stopTerms | string array | An array of terms to score negatively during topic recognition. For more information about scoring, see [bag of words](/senseml-reference/deprecated-features/deprecated-bag-of-words). | Examples ====