docs links
fscelliott committed Jun 28, 2024
1 parent 3782854 commit b296898
Showing 12 changed files with 70 additions and 78 deletions.
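
Most of the changes in this commit swap `doc:<slug>` link targets for site-relative paths. The following is a minimal Python sketch of that kind of rewrite, for illustration only. The slug-to-path pairs are copied from replacements visible in this diff; `SLUG_TO_PATH` and `rewrite_doc_links` are hypothetical names, and nothing here is claimed to be how the commit was actually produced.

```python
import re

# Slug -> path pairs copied from replacements visible in this commit's diff.
# The table is deliberately partial; extend it for other slugs.
SLUG_TO_PATH = {
    "ocr-engine": "/senseml-reference/document-type-settings/ocr-engine",
    "merge-lines": "/senseml-reference/preprocessors/merge-lines",
    "file-types": "/senseml-reference/concepts/file-types",
    "metrics": "/best-practices/metrics",
    "excel-reference": "/integrations/quick-extraction/excel-reference",
}

# Matches a Markdown link target of the form ](doc:slug) or ](doc:slug#anchor).
DOC_LINK = re.compile(r"\]\(doc:([a-z0-9-]+)(#[^)]*)?\)")


def rewrite_doc_links(text: str) -> str:
    """Replace known `doc:slug` link targets with their site-relative paths."""

    def replace(match: re.Match) -> str:
        slug = match.group(1)
        anchor = match.group(2) or ""
        path = SLUG_TO_PATH.get(slug)
        if path is None:
            # Unknown slug: leave the link untouched rather than guess a path.
            return match.group(0)
        return f"]({path}{anchor})"

    return DOC_LINK.sub(replace, text)
```

Leaving unknown slugs unchanged keeps the rewrite safe to run repeatedly while the mapping table is still being filled in.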
2 changes: 1 addition & 1 deletion changelog/january-2024.mdx
@@ -48,7 +48,7 @@ Sensible uses LLMs to create the fields with the [Query method](/senseml-referen
## Improvement: Filter pages for LLM-based methods with the new Page Range parameter

With the new Page Range parameter for [LLM-based](/llm-based-extractions/prompt-tips/index-prompt-tips)
- methods, you can narrow down the document pages in which Sensible searches for the answer to your LLM prompt. Sensible ignores other pages. For example, use this parameter to improve performance, or to avoid extracting unwanted data if your prompt has multiple candidate answers. For more information, see the Page Range parameter documentation in the [Advanced prompt configuration](doc:prompt#https://docs.sensible.so/docs/prompt#global-sensible-instruct-parameters)
+ methods, you can narrow down the document pages in which Sensible searches for the answer to your LLM prompt. Sensible ignores other pages. For example, use this parameter to improve performance, or to avoid extracting unwanted data if your prompt has multiple candidate answers. For more information, see the Page Range parameter documentation in the [Advanced prompt configuration](/llm-based-extractions/prompt#https://docs.sensible.so/docs/prompt#global-sensible-instruct-parameters)
topic.

## Improvement: Extract checkboxes inside tables
8 changes: 0 additions & 8 deletions drafts/draft-test.mdx

This file was deleted.

6 changes: 3 additions & 3 deletions openapi.yml
@@ -848,13 +848,13 @@ components:
- lazarus
type: string
description: |
- For information about each OCR engine, see [OCR engine](doc:ocr-engine).
+ For information about each OCR engine, see [OCR engine](/senseml-reference/document-type-settings/ocr-engine).
prevent_default_merge_lines:
type: boolean
description: |
Prevents the built-in line merging that occurs before the
- [Merge Lines](doc:merge-lines) preprocessor.
+ [Merge Lines](/senseml-reference/preprocessors/merge-lines) preprocessor.
ocr_level:
enum:
@@ -864,7 +864,7 @@ components:
- 5
type: number
description: |
- See [OCR level](doc:ocr-level).
+ See [OCR level](/senseml-reference/document-type-settings/ocr-level).
validations:
description: Array of validations. See https://docs.sensible.so/docs/validate-extractions
items:
4 changes: 2 additions & 2 deletions openapi_classification.yml
@@ -38,7 +38,7 @@ paths:
To post the document bytes, specify the non-encoded document bytes as the entire request body,and specify the `Content-Type` header, for example,"application/pdf" or "image/jpeg".
- For supported file size and types, see [Supported file types](doc:file-types).
+ For supported file size and types, see [Supported file types](/senseml-reference/concepts/file-types).
@@ -107,7 +107,7 @@ paths:
To post the document bytes, specify the non-encoded document bytes as the entire request body,and specify the `Content-Type` header, for example,"application/pdf" or "image/jpeg".
- For supported file size and types, see [Supported file types](doc:file-types).
+ For supported file size and types, see [Supported file types](/senseml-reference/concepts/file-types).
requestBody:
required: true

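
The descriptions in this file say to send the non-encoded document bytes as the entire request body and to set the `Content-Type` header, for example `application/pdf` or `image/jpeg`. A hedged Python sketch of building such a request follows. The helper name `build_document_request`, the example URL, and the `Bearer` authorization scheme are assumptions for illustration and are not taken from this diff; the sketch only constructs the request and does not send it.

```python
import mimetypes
import urllib.request


def build_document_request(url: str, filename: str, document_bytes: bytes,
                           api_key: str) -> urllib.request.Request:
    """Build a POST request whose body is the raw (non-encoded) document bytes."""
    # Guess the Content-Type from the file extension, e.g. .pdf -> application/pdf.
    content_type, _ = mimetypes.guess_type(filename)
    if content_type is None:
        raise ValueError(f"cannot determine Content-Type for {filename!r}")
    return urllib.request.Request(
        url,
        data=document_bytes,  # the entire request body is the document itself
        method="POST",
        headers={
            "Content-Type": content_type,
            # Assumption: bearer-token authorization; check the API reference.
            "Authorization": f"Bearer {api_key}",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; the URL must be the real endpoint from the API reference rather than the placeholder used when trying this out.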
40 changes: 20 additions & 20 deletions openapi_extraction.yml
@@ -42,16 +42,16 @@ paths:
To explore this endpoint, use this interactive API reference, or use one of the following options:
- - For a quick "hello world" response to this endpoint, see the [API quickstart](doc:quickstart)
- - For a step-by-step tutorial about calling this endpoint, see [Try synchronous extraction](doc:api-tutorial-sync).
+ - For a quick "hello world" response to this endpoint, see the [API quickstart](/integrations/quickstart)
+ - For a step-by-step tutorial about calling this endpoint, see [Try synchronous extraction](/api-guides/api-tutorial/api-tutorial-sync).
- Run this endpoint in the Sensible Postman collection. [Run in Postman](https://god.gw.postman.com/run-collection/16839934-45339059-3fec-4c31-a891-9a12a3e1c22b?action=collection%2Ffork&collection-url=entityId%3D16839934-45339059-3fec-4c31-a891-9a12a3e1c22b%26entityType%3Dcollection%26workspaceId%3Ddbde09dc-b7dd-487d-a68f-20d32b008f90)
There are two options for posting the document bytes.
1. (often preferred) specify the non-encoded document bytes as the entire request body,and specify the `Content-Type` header, for example,"application/pdf" or "image/jpeg".
See the following for supported file formats.
2. Base64 encode the document bytes, specify them in a body "document" field, and specify application/json for the `Content-Type` header.
- For a list of supported document file types, see [Supported file types](doc:file-types).
+ For a list of supported document file types, see [Supported file types](/senseml-reference/concepts/file-types).
parameters:
- $ref: '#/components/parameters/document_type'
@@ -128,7 +128,7 @@ paths:
2. PUT your document at the `upload_url` returned from the previous step. Sensible extracts data from the document.
3. To retrieve the extraction, use a webhook, or use the extraction `id` returned in the response to poll the GET documents/{id} endpoint.
- For supported file size and types, see [Supported file types](doc:file-types).
+ For supported file size and types, see [Supported file types](/senseml-reference/concepts/file-types).
For example, if your call to `/generate_upload_url` specifies the document type with a `content_type` body parameter (recommended), your first two steps are as follows:
@@ -193,12 +193,12 @@ paths:
summary: Extract doc at your URL
description: |
Extract data asynchronously from a document at the specified `document_url`.<br/>
- For supported file size and types, see [Supported file types](doc:file-types).
+ For supported file size and types, see [Supported file types](/senseml-reference/concepts/file-types).
Take the following steps.
1. Run this endpoint.
3. To retrieve the extraction, use a webhook, or use the extraction `id` returned in the response to poll the GET documents/{id} endpoint.
For a step-by-step tutorial on calling this endpoint,
- see [Try asynchronous extraction from your URL](doc:api-tutorial-async-1).
+ see [Try asynchronous extraction from your URL](/api-guides/api-tutorial/api-tutorial-async-1).
parameters:
- $ref: '#/components/parameters/document_type'
- $ref: '#/components/parameters/environment'
@@ -382,14 +382,14 @@ paths:
summary: Extract portfolio at a Sensible URL

description: |
- Use this endpoint with multiple documents that are packaged into one file (a "portfolio"). For a list of supported file types, see [Supported file types](doc:file-types).
+ Use this endpoint with multiple documents that are packaged into one file (a "portfolio"). For a list of supported file types, see [Supported file types](/senseml-reference/concepts/file-types).
Segments a portfolio file into the specified document types (for example, 1099, w2, and bank_statement) and then runs extractions
asynchronously for each document Sensible finds in the portfolio. Take the following steps -
1. Use this endpoint to generate a Sensible URL.
2. PUT the document you want to extract data from at the URL, where `SENSIBLE_UPLOAD_URL` is the URL you received
from this endpoint's response. For more information about how to PUT the document, see the [generate_upload_url/{document_type}](ref:generate-upload-url) endpoint.
3. To retrieve the extraction, use a webhook, or use the extraction `id` returned in the response to poll the GET documents/{id} endpoint.
- For more about extracting from portfolios, see [Multi-document extractions](doc:portfolio).
+ For more about extracting from portfolios, see [Multi-document extractions](/layout-based-extractions/portfolio).
parameters:
- $ref: '#/components/parameters/environment'
@@ -433,12 +433,12 @@ paths:
summary: Extract portfolio at your URL
description: |
- Use this endpoint with multiple documents that are packaged into one file (a "portfolio"). For a list of supported file types, see [Supported file types](doc:file-types).
+ Use this endpoint with multiple documents that are packaged into one file (a "portfolio"). For a list of supported file types, see [Supported file types](/senseml-reference/concepts/file-types).
Segments a portfolio file at the specified `document_url` into the specified document types (for example, 1099, w2, and bank_statement)
and then runs extractions asynchronously for each document Sensible finds in the portfolio. Take the following steps.
1. Run this endpoint.
3. To retrieve the extraction, use a webhook, or use the extraction `id` returned in the response to poll the GET documents/{id} endpoint.
- For more about extracting from portfolios, see [Multi-document extractions](doc:portfolio).
+ For more about extracting from portfolios, see [Multi-document extractions](/layout-based-extractions/portfolio).
parameters:
- $ref: '#/components/parameters/environment'
- $ref: '#/components/parameters/document_name'
@@ -602,11 +602,11 @@ paths:
To compile multiple documents into one Excel file, specify the IDs of their recent extractions in the request separated by commas, for example,
`/generate_excel/867514cc-fce7-40eb-8e9d-e6ec48cdac34,5093c65f-05bd-46a3-8df7-da3ed00f6d35`.
For the best compiled spreadsheet results, configure your SenseML so that the documents output identically named fields.
- For more information about the conversion process, see [SenseML to spreadsheet reference](doc:excel-reference).
+ For more information about the conversion process, see [SenseML to spreadsheet reference](/integrations/quick-extraction/excel-reference).
- For portfolio extractions, Sensible returns an Excel file containing fields for all the documents it finds in the PDF. For more information, see [Multi-document spreadsheet](doc:excel-reference#multi-document-spreadsheet).
+ For portfolio extractions, Sensible returns an Excel file containing fields for all the documents it finds in the PDF. For more information, see [Multi-document spreadsheet](/integrations/quick-extraction/excel-reference#multi-document-spreadsheet).
- For a list of document file types that Sensible can extract data from, see [Supported file types](doc:file-types).
+ For a list of document file types that Sensible can extract data from, see [Supported file types](/senseml-reference/concepts/file-types).
Call this endpoint after an extraction completes. For more information about checking extraction status,
see the `GET /documents/{id}` endpoint.
parameters:
@@ -647,8 +647,8 @@ paths:
To compile multiple documents into one CSV file, specify the IDs of their recent extractions in the request separated by commas, for example,
`/generate_csv/867514cc-fce7-40eb-8e9d-e6ec48cdac34,5093c65f-05bd-46a3-8df7-da3ed00f6d35`.
For the best compiled spreadsheet results, configure your SenseML so that the documents output identically named fields.
- For more information about the conversion process, see [SenseML to spreadsheet reference](doc:excel-reference).
- For a list of document file types that Sensible can extract data from, see [Supported file types](doc:file-types).
+ For more information about the conversion process, see [SenseML to spreadsheet reference](/integrations/quick-extraction/excel-reference).
+ For a list of document file types that Sensible can extract data from, see [Supported file types](/senseml-reference/concepts/file-types).
Call this endpoint after an extraction completes. For more information about checking extraction status,
see the `GET /documents/{id}` endpoint.
parameters:
@@ -932,7 +932,7 @@ components:
name: min_coverage
in: query
description: >-
- Minimum extraction coverage score by which to filter the retrieved extractions. For more information about scoring, see [Monitoring extractions](doc:metrics).
+ Minimum extraction coverage score by which to filter the retrieved extractions. For more information about scoring, see [Monitoring extractions](/best-practices/metrics).
schema:
type: number
example: 0.8
@@ -941,7 +941,7 @@ components:
name: max_coverage
in: query
description: >-
- Maximum extraction coverage score by which to filter the retrieved extractions. For more information about scoring, see [Monitoring extractions](doc:metrics).
+ Maximum extraction coverage score by which to filter the retrieved extractions. For more information about scoring, see [Monitoring extractions](/best-practices/metrics).
schema:
type: number
example: 1.0
@@ -971,7 +971,7 @@ components:

Coverage:
type: number
- description: The coverage score measures how fully an extraction captured all your target data in the document. It's a percentage comparing non-null, [validated](doc:validate-extractions) fields to total fields returned by a config for a document. For example, a coverage score of 70% for an extraction with no validation errors means that 30% of fields were null. For more information about scoring, see [Monitoring extractions](doc:metrics).
+ description: The coverage score measures how fully an extraction captured all your target data in the document. It's a percentage comparing non-null, [validated](/best-practices/validate-extractions) fields to total fields returned by a config for a document. For example, a coverage score of 70% for an extraction with no validation errors means that 30% of fields were null. For more information about scoring, see [Monitoring extractions](/best-practices/metrics).
example: 0.75
Environment:
type: string
@@ -1052,7 +1052,7 @@ components:
`[` denotes inclusive and `)` denotes exclusive.
For example, when this endpoint returns `"coverage_histogram":[7,5,3,3,2,1,1,4,7,9,13,15]` , the first and last items in the array show that on specified date for the specified config, 7 extractions scored in the lowest bucket of 0-10%, and 15 scored in the highest bucket of 100%.
- For more information about extraction coverage scores, see [Monitoring extractions](doc:metrics).
+ For more information about extraction coverage scores, see [Monitoring extractions](/best-practices/metrics).
From the payload returned by this endpoint, you can calculate other metrics, for example:
- total number of extractions in a time period
- doc type and config usage
@@ -1515,7 +1515,7 @@ components:
- the bounding polygons that
define line coordinates
- for text that Sensible OCR'd, confidence scores.
- For more information, see [Verbosity](doc:verbosity).
+ For more information, see [Verbosity](/senseml-reference/config-settings/verbosity).
type: object
example:
policy_number:
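
The `Coverage` schema description in this file defines the score as the share of non-null, validated fields among all fields a config returns. A minimal Python sketch of that ratio follows, assuming extracted fields arrive as a name-to-value mapping with `None` for nulls and that fields failing validation are simply excluded from the numerator. `coverage_score` and its parameters are hypothetical names, and the service's exact treatment of validation errors may differ.

```python
def coverage_score(fields: dict, failed_validation: frozenset = frozenset()) -> float:
    """Return the fraction of fields that are non-null and passed validation.

    fields: mapping of field name -> extracted value (None means null).
    failed_validation: names of fields with validation errors (assumed input).
    """
    if not fields:
        raise ValueError("coverage is undefined for an extraction with no fields")
    counted = sum(
        1
        for name, value in fields.items()
        if value is not None and name not in failed_validation
    )
    return counted / len(fields)


# Ten fields, three of them null, no validation errors.
example = {f"field_{i}": ("value" if i < 7 else None) for i in range(10)}
score = coverage_score(example)
```

With these inputs the ratio is 7 of 10 fields, which corresponds to the 70% example given in the schema description.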