Commit

broken links
fscelliott committed Nov 22, 2024
1 parent 51aff40 commit fe6567b
Showing 6 changed files with 9 additions and 9 deletions.
4 changes: 2 additions & 2 deletions readme-sync/v0/data-extraction/1000 - getting-started.md
@@ -199,7 +199,7 @@ This guide focuses on layout-based document extraction, which works as follows:

- Sensible searches first for a text "anchor" because it's a computationally quick way to narrow down the location of the target data to extract. An anchor is text that always occurs close to your target text. Without it, Sensible wouldn't know which page to search in for your target text. For more information about defining complex anchors, see [Anchor](doc:anchor).

-- Then, Sensible uses a "method" to expand its search out from the anchor and extract the data you want. For more information about methods, see [Layout-based methods](doc:methods).
+- Then, Sensible uses a "method" to expand its search out from the anchor and extract the data you want. For more information about methods, see [Layout-based methods](doc:layout-based-methods).

This config uses three types of layout-based methods:

@@ -443,7 +443,7 @@ You can get more advanced with this auto insurance config. For example:
- What if the document listed emails, and you wanted to capture all those emails? You could use a regular expression (regex) in a `"match":"all"` anchor coupled with a [Passthrough method](doc:passthrough), or the [Regex method](doc:regex).
- You can split the policy period into two dates, either by using the [Split computed field method](doc:split), or by setting the [Date](doc:types#date) type on the field and using a tiebreaker.

-To check out other methods, see [Layout-based methods](doc:methods).
+To check out other methods, see [Layout-based methods](doc:layout-based-methods).

Test the config
====
@@ -10,7 +10,7 @@ See the following topics for reference documentation for the SenseML query langu

- [Field query object](doc:field-query-object)
- [Preprocessors](doc:preprocessors)
-- [LLM-based methods](doc:llm-based-methods) and [layout-based methods](doc:methods). For more information about choosing whether to author layout- or LLM-based methods, see [Choosing an extraction approach](doc:author).
+- [LLM-based methods](doc:llm-based-methods) and [layout-based methods](doc:layout-based-methods). For more information about choosing whether to author layout- or LLM-based methods, see [Choosing an extraction approach](doc:author).
- [Configuration settings](doc:config-settings)
- [Computed Field methods](doc:computed-field-methods)
- [Sections](doc:sections)
@@ -5,7 +5,7 @@ hidden: false

A Method object defines how to extract target data. There are two broad categories of methods:

-| | [LLM-based methods](doc:llm-based-methods) | [Layout-based methods](doc:methods) |
+| | [LLM-based methods](doc:llm-based-methods) | [Layout-based methods](doc:layout-based-methods) |
| ----------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Notes | Ask questions about info in the document, as you'd ask a human. For example, "what's the policy period"? Uses large language models (LLMs). | Find the information in the document using anchoring text and layout data. For example, specify to extract the second cell in a column headed by "premium". |
| Deterministic | no | yes |
@@ -21,7 +21,7 @@ The following global parameters are available to all methods:

| Key | Value | Description |
| --------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
-| `id` | string | see [Layout-based methods](doc:methods) and [LLM-based methods](doc:llm-based-methods). |
+| `id` | string | see [Layout-based methods](doc:layout-based-methods) and [LLM-based methods](doc:llm-based-methods). |
| tiebreaker | integer (zero-based index)<br/> or<br/>ordinal (`first`, `second`, `third`, `last`)<br/>or <br/> comparison (`>`, `<`)<br/>or<br/>`join`<br/> Default: `join` | If the method returns multiple elements (for example, a Row method), specifies which element to extract in the returned array. <br/><br/>**integer**: Returns the zero-indexed nth element in the returned lines array, using Sensible's [default line sorting](doc:lines#line-sorting). For example, 0 returns the first line, -1 returns the last line, and -2 returns the second-to-last line in the array.<br/><br/>**ordinal:** Returns the `first`, `second`,`third` or `last` element, using Sensible's [default line sorting](doc:lines#line-sorting).<br/><br/>**comparison:** Returns the first or last element, sorted [alphanumerically](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators#relational_operators) using Unicode values.<br/> If you want to compare numeric amounts and ignore non-numbers, then add a numeric [type](doc:types) such as `type: currency` as a top-level parameter to the field.<br/><br/>**join**: Returns all elements in the returned array as a single string, delimited by whitespaces. |
| lineFilters | Match object | Filters out the specified lines from the method's output. For example, if the Box method extracts unwanted footer lines from a box, you can filter out the lines with this parameter. |
| typeFilters | array of [Types](doc:types) | Filters out the specified types from the method results. For example, for a target box containing a delivery date, a street address, and delivery notes, you can filter out the lines containing Date and Address types in order to extract the delivery notes. Note that less strict types, such as Name and Currency types, are less useful in this filter than stricter types such as the Phone Number type.<br/>For an example, see the Examples section. |
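
Editor's note: the `tiebreaker` options in the table above can be illustrated with a minimal Python sketch. This is not Sensible's implementation. The function name `apply_tiebreaker` is hypothetical, the mapping of `>` to the greatest element and `<` to the least is an assumption (the table only says the comparison returns the first or last element of the sorted results), and Python compares strings by Unicode code point, which approximates the JavaScript ordering the table links to.

```python
from typing import List, Union

# Ordinal keywords from the table, mapped to list indexes.
ORDINALS = {"first": 0, "second": 1, "third": 2, "last": -1}


def apply_tiebreaker(elements: List[str], tiebreaker: Union[int, str] = "join") -> str:
    """Pick one result from a method's returned elements (hypothetical helper)."""
    if isinstance(tiebreaker, int):
        # Zero-based index; negative values count back from the end.
        return elements[tiebreaker]
    if tiebreaker in ORDINALS:
        return elements[ORDINALS[tiebreaker]]
    if tiebreaker == ">":
        # Assumption: ">" selects the greatest element by Unicode order.
        return max(elements)
    if tiebreaker == "<":
        # Assumption: "<" selects the least element by Unicode order.
        return min(elements)
    if tiebreaker == "join":
        # Default: all elements as one whitespace-delimited string.
        return " ".join(elements)
    raise ValueError(f"unsupported tiebreaker: {tiebreaker!r}")


lines = ["Total due", "120.00", "Paid in full"]
print(apply_tiebreaker(lines, -2))
print(apply_tiebreaker(lines, "last"))
print(apply_tiebreaker(lines))
```

The elements are assumed to already be in Sensible's default line sorting order, which this sketch does not model.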
@@ -77,7 +77,7 @@ The Field query object has the following top-level parameters:
| --------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| id (**required**) | string | Sensible uses the ID as the key in the structured key/value output. In the API response, this output is in the `parsed_document` object.<br/>If a field fails and returns null, you can specify a backup, or fallback field to target the same data with a different method. To specify fallbacks between fields, specify consecutive fields that use the same ID.<br/>For example, to capture differences in wording between document revisions, define two fields with the same ID, which anchor on synonymous text that 's present or absent in different document revisions. For more examples, see [Fallback strategies](doc:fallbacks). <br/>Fallback fields can be of any kind. For example, you can fallback from a field, to a computed field, to a section group.<br/>**Limitations:**<br/>- Fallbacks don't work across nested structures. For example, you can't fall back from a parent section group's field to a child section group's field.<br/>- Fallbacks don't work within a Query Group method. To specify fallbacks, define them in separate query groups. | Same |
| anchor | string, [Match object](doc:match), or array of Match objects | **Required**<br/> Matched text that narrows down the location of the target data to extract. <br/> For more information, see [Anchor object](doc:anchor). | **Optional** <br/>If the matched text is present anywhere in the document, Sensible runs the method on the whole document, otherwise it returns `null`. For more information, see [Anchor object](doc:anchor). |
-| method (**required**) | object | Defines how to spatially expand out from the anchor and extract the target data. Use for documents that have a relatively consistent spatial layout. For example, 1040 forms have relatively consistent layout. For more information, see [Layout-based methods](doc:methods). | Describes the contents of the target data to extract in natural-language prompts for an LLM. Use for documents that have a relatively inconsistent spatial layout, for example, legal contracts. For more information, see [LLM-based methods](doc:llm-based-methods). |
+| method (**required**) | object | Defines how to spatially expand out from the anchor and extract the target data. Use for documents that have a relatively consistent spatial layout. For example, 1040 forms have relatively consistent layout. For more information, see [Layout-based methods](doc:layout-based-methods). | Describes the contents of the target data to extract in natural-language prompts for an LLM. Use for documents that have a relatively inconsistent spatial layout, for example, legal contracts. For more information, see [LLM-based methods](doc:llm-based-methods). |
| type | see [Types](doc:types) | The data type to extract, for example, a currency, an address, or a custom type you define. This structured output includes the type information. If the field captures other data in addition to the data matching the type, Sensible suppresses the additional data from the output. For more information, see [Types](doc:types). | same |
| match | `first`,`last`,`all`, `allWithNull`,`mostFrequent` | If there are multiple anchors, specifies which one to use to extract output for layout-based methods. <br/> <br/>- `first` specifies the first anchor in the document that returns non-null output.<br/><br/>- `last` specifies the last anchor in the document that returns non-null output.<br/><br/>- `all` matches all anchors and returns non-null extracted output under a single key. For example, something like: <br/>{<br/> "name_of_output_key": [<br/> {<br/> "type": "string",<br/> "value": "extracted data for first anchor match"<br/> },<br/> {<br/> "type": "string",<br/> "value": "extracted data for second anchor match"<br/> } ]<br/>}<br/><br/><br/>- `allWithNull` matches all anchors and returns extracted output, including null output, under a single key. For example, use this option if you're using the [Zip computed field method](doc:zip) to zip together parallel arrays, where array elements can be nulls. For an example, see [Zip](doc:zip#match-with-null-zip).<br/><br/>- `mostFrequent` matches all anchors, extracts the corresponding output, then returns the most frequently occurring non-null output. This is useful for OCR text, like poor-quality scans or photographs. For example, a scanned document repeats a box titled `1 Wages` four times with the same dollar value, `21850.20`. Due to OCR errors, the extracted outputs are `21050.20`, `21850.20`, `21850.20` and `21850.58`. This option returns the most frequent, and therefore the mostly likely correct output, `21850.20`. | not applicable |
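
Editor's note: the `mostFrequent` behavior described in the `match` row can be sketched in a few lines of Python. This is an illustration of the voting idea only, not Sensible's code. The helper name `most_frequent` is hypothetical, and the tie-handling (the earliest-seen value wins) is an assumption, since the table doesn't say how ties resolve.

```python
from collections import Counter
from typing import List, Optional


def most_frequent(outputs: List[Optional[str]]) -> Optional[str]:
    """Return the most common non-null output across all anchor matches."""
    non_null = [value for value in outputs if value is not None]
    if not non_null:
        # Every anchor match produced null output.
        return None
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(non_null).most_common(1)[0][0]


# The OCR example from the table: four reads of the same "1 Wages" box.
ocr_reads = ["21050.20", "21850.20", "21850.20", "21850.58"]
print(most_frequent(ocr_reads))
```

Voting across repeated boxes works here because OCR errors tend to differ from one copy to the next, while the correct reading recurs.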

@@ -5,7 +5,7 @@ hidden: false

Extract free text from unstructured documents using large language model (LLM)-based SenseML methods. For example, extract information from legal paragraphs in contracts and leases, or results from research papers.

-The following LLM-based methods are alternatives to [layout-based methods](doc:methods) for structured documents, for example, tax documents or insurance forms.
+The following LLM-based methods are alternatives to [layout-based methods](doc:layout-based-methods) for structured documents, for example, tax documents or insurance forms.

| Method | Example use case | Notes |
| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
@@ -17,4 +17,4 @@ The following LLM-based methods are alternatives to [layout-based methods](doc:m
Notes
====

-For layout-based extraction, see [Layout-based methods](doc:methods).
+For layout-based extraction, see [Layout-based methods](doc:layout-based-methods).
2 changes: 1 addition & 1 deletion readme-sync/v0/welcome/1000 - overview.md
@@ -51,7 +51,7 @@ Configure your extractions using _SenseML_, Sensible's document-specific query l
With SenseML, you can:

- Preprocess documents by correcting layout metadata problems, removing unwanted pages, and more, so that Sensible has a clean, standardized text representation of the document from which to extract structured data in a later step. For more information, see [Preprocessors](doc:preprocessors).
-- Use "methods" to extract document primitives, like rows, columns, tables, boxes, checkbox status, and more. You can also parse extracted data types like currencies, dates, addresses, or your custom types. For more information, see [Layout-based methods](doc:methods).
+- Use "methods" to extract document primitives, like rows, columns, tables, boxes, checkbox status, and more. You can also parse extracted data types like currencies, dates, addresses, or your custom types. For more information, see [Layout-based methods](doc:layout-based-methods).
- Post-process extracted document data. For example:
- Write logical [validations](doc:validate-extractions) like `customer ID is 9 digits` to throw custom errors and warnings about your extracted data.
- Manipulate the extracted data schema with [computed methods](doc:computed-field-methods) like concat, split, and [custom logic](doc:custom-computation). Or, completely transform Sensible's standardized `parsed_document` output into any schema to fit your data consumption needs using a [JsonLogic postprocessor](doc:postprocessor).
