Wording edits
mesellings committed Dec 17, 2024
1 parent 1da6fb9 commit 89d2aa8
Showing 5 changed files with 40 additions and 29 deletions.
14 changes: 8 additions & 6 deletions docs/components/modeler/web-modeler/idp/idp-applications.md
Original file line number Diff line number Diff line change
@@ -10,11 +10,6 @@ Create and manage your IDP projects in an **IDP application** folder.

<img src={IdpApplicationImg} alt="IDP application screen" />

- IDP applications and projects are only fully operational when linked to a healthy, active cluster. You can select an unstable or unhealthy cluster when first creating an IDP application, and change to a stable cluster when required.
- You can only select a cluster that supports Camunda document handling (8.7+).
- You cannot change the selected cluster of an IDP application after it has been created.
- Camunda recommends using a development cluster as best practice.

## Create an IDP application

To create an IDP application:
@@ -28,4 +23,11 @@ To create an IDP application:
:::

1. Click **Create** to create the IDP application.
1. You can now add and create [document extraction](idp-document-extraction.md) and [document automation](idp-document-automation.md) projects in your IDP application.
1. You can now create [document extraction](idp-document-extraction.md) and [document automation](idp-document-automation.md) projects in your IDP application.

## IDP application clusters

- IDP applications and projects are only fully operational when linked to a healthy, active cluster. You can select an unstable or unhealthy cluster when first creating an IDP application, and change to a stable cluster when required.
- You can only select a cluster that supports Camunda document handling (8.7+).
- You cannot change the cluster of an IDP application after it has been created.
- Camunda recommends using a development cluster for your IDP applications.
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@ title: Document automation
description: "You can import a BPMN or DMN diagram at any time with Web Modeler."
---

Create a document automation project to automatically classify and extract data based on linked document extraction models.
Extract data from complex documents based on one or more linked [document extraction](idp-document-extraction.md) projects.

## About document automation

@@ -13,7 +13,7 @@ Document automation allows you to automatically extract data from complex PDF do
For example, if you want to process large multi-page PDFs containing multiple document types (invoices, reports, forms), you can create a document automation project to extract the specific data you want.

- You must link at least one [document extraction](idp-document-extraction.md) project so the LLM can accurately analyze, classify, and extract document data.
- You can choose the LLM you want to use, allowing you to test different models until you find the one that best suits your budget and accuracy requirements.
- Choose and test different LLM models to find the one that best suits your budget and accuracy requirements.
- Document classification involves automatically categorizing documents into predefined classes/types, based on their content.

## Create document automation project
Original file line number Diff line number Diff line change
@@ -6,14 +6,15 @@ description: "You can import a BPMN or DMN diagram at any time with Web Modeler.

import IdpExtractionProjectModalImg from './img/idp-create-extraction-project-modal.png';

Create a document extraction project to identify and extract specific data from your structured and unstructured documents.
Extract data from a single type of structured or unstructured document.

## About document extraction

Document extraction projects form the basis for using IDP in your end-to-end processes.

- Create a separate document extraction project for each type of document you want to categorize and extract data from, such as an invoice, a report, identity document, and so on.
- Once published, extraction projects can be added to your processes, or linked to a [document automation](idp-document-automation.md) project.

- Once published, extraction projects can be [integrated into your processes](idp-integrate.md) or linked to a [document automation](idp-document-automation.md) project.

## Create document extraction project

18 changes: 12 additions & 6 deletions docs/components/modeler/web-modeler/idp/idp-reference.md
Original file line number Diff line number Diff line change
@@ -6,16 +6,22 @@ description: "You can import a BPMN or DMN diagram at any time with Web Modeler.

The following technical reference information is provided for IDP.

## Supported document types
## Document file formats

IDP currently supports processing and data extraction for the following types of document.
IDP currently only supports data extraction from the following document file formats.

| Document Type | Description |
| :------------ | :----------------------------------------------------------------------------------------------------------------------- |
| PDF | <p><ul><li><p>PDF documents must not be password protected.</p></li><li>Maximum document file size is 4MB.</li></ul></p> |
| File format | Description |
| :---------- | :---------------------------------------------------------------------------------------------------------------- |
| PDF | <p><ul><li>PDF documents must not be password protected.</li><li>Maximum document file size is 4MB.</li></ul></p> |
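
The file format limits above can be checked locally before uploading. The following is a minimal sketch only, not part of Camunda or Web Modeler: the function name `check_pdf_for_idp` is hypothetical, "4MB" is assumed to mean 4 × 1024 × 1024 bytes, and the format check only looks for the standard `%PDF-` file header. Password protection is not checked here.

```python
from pathlib import Path

# Assumption: the documented "4MB" limit is interpreted as 4 MiB.
MAX_SIZE_BYTES = 4 * 1024 * 1024
# PDF files begin with this header.
PDF_MAGIC = b"%PDF-"


def check_pdf_for_idp(path):
    """Hypothetical pre-upload check. Returns a list of problems (empty if none)."""
    problems = []
    p = Path(path)

    # Size limit from the table above.
    size = p.stat().st_size
    if size > MAX_SIZE_BYTES:
        problems.append(f"file is {size} bytes, above the {MAX_SIZE_BYTES}-byte limit")

    # Only PDF is listed as a supported file format.
    with p.open("rb") as fh:
        header = fh.read(len(PDF_MAGIC))
    if header != PDF_MAGIC:
        problems.append("file does not start with the %PDF- header")

    return problems
```

An empty list means the file passed these two basic checks. It does not guarantee that the upload will be accepted.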

## Document storage

Uploaded documents are stored in Web Modeler, not your cluster.
For SaaS, uploaded documents are stored in Web Modeler itself, not your cluster.

## Extraction field types

You can use any of the following field types when creating an extraction field.

| Field type | Description |
| :--------- | :---------- |
| Number | ... |
Original file line number Diff line number Diff line change
@@ -22,8 +22,8 @@ Complete the following steps to configure and publish an unstructured data extra

1. [Upload documents](#step-1-upload-documents): Upload a set of sample documents to use for training the extraction model.
1. [Extract fields](#step-2-extract-fields): Add and configure the [extraction fields](idp-key-concepts.md#extraction-fields) you want to use to extract data.
1. [Validate extraction](#step-3-validate-extraction): Test the data extraction using your uploaded document(s) and evaluate the extraction results.
1. [Publish](#step-4-publish): Publish the project to make it available for use in your processes and [document automation](idp-document-automation.md) projects.
1. [Validate extraction](#step-3-validate-extraction): Test and evaluate the data extraction results using your uploaded documents.
1. [Publish](#step-4-publish): Publish the project to make it available for use [in your processes](idp-integrate.md) and [document automation](idp-document-automation.md) projects.

<!-- Configure and publish your project on the **Unstructured data extraction** screen.
@@ -35,7 +35,7 @@ Use the tabs to navigate between configuration steps at any time.

## Step 1: Upload documents

Start by uploading a set of sample PDF documents that represent the specific document type you want to extract data from. You will use these documents to test your data extraction in the next steps.
Start by uploading a set of sample PDF documents that represent the specific document type you want to extract data from. You will use these documents throughout the data extraction process.

<img src={IdpUploadDocumentsUnstructuredImg} alt="Unstructured data extraction screen" />

@@ -52,7 +52,9 @@ Start by uploading a single model sample document that contains all the data fie

- If a single document doesn’t include all the data fields you require, upload multiple documents to cover all the variations for the document type. The number and range of sample documents you need to upload depends on the complexity of your unstructured data and your requirements.

- For example, you should upload as a minimum a single sample document for each variation of a document type. This may provide enough extraction accuracy if it is an exact representation of the specific type of document. However, it is more likely that you will need to upload multiple sample documents to ensure extraction accuracy.
- For example, you must upload at least one sample document for each variation of a document type. This may provide enough extraction accuracy if it is an exact representation of the specific type of document, with no variations in layout or content. However, it is more likely that you will need to upload multiple sample documents to ensure extraction accuracy.

- When choosing your sample documents, variation is important to ensure the system captures the full range of document types it will encounter. As a general guideline, Camunda recommends starting with three to five documents, and uploading more as needed to represent the full range of possible data types.

## Step 2: Extract fields

@@ -83,7 +85,7 @@ Add an extraction field for each piece of data you want to extract from your doc

### Extract data and save as a test case

Once you have set up your extraction fields, you can select an LLM foundation model and test it to see what data is extracted.
Once you have set up your extraction fields, you can select an LLM foundation model and test the data extraction.

1. **Extraction model**: Select the LLM foundation model you want to use.
1. Select the document you want to test the data extraction against.
@@ -106,7 +108,7 @@ Once you have set up your extraction fields, you can select an LLM foundation mo

## Step 3: Validate extraction

On the **Validate extraction** tab, you can validate and test your configured data extraction against all your uploaded documents. This step evaluates the accuracy of the data extraction, using your chosen LLM foundation model and extraction fields/prompts.
On the **Validate extraction** tab, validate and test your configured data extraction against your uploaded documents. This step evaluates the data extraction results produced by an LLM foundation model using your extraction fields.

<img src={IdpValidationResultsImg} alt="Validate extraction screen" />

@@ -128,11 +130,11 @@ Search and filter the results if you want to work with specific documents or ext

### Validation status

| Icon | Status | Description |
| :-------------------------------------------------------------------------- | :------ | :------------------------------------------------------------------------------------------------------------------------------- |
| <img src={IdpIconPassImg} alt="Pass icon" className="inline-image" /> | Pass | The document validation passed with accurate and expected results. |
| <img src={IdpIconCautionImg} alt="Caution icon" className="inline-image" /> | Caution | A test case is missing for comparison. Click **Save test case** to... |
| <img src={IdpIconFailImg} alt="Fail icon" className="inline-image" /> | Fail | The validation results do not match the expected output for this document. Click **Review document** to investigate and resolve. |
| Icon | Status | Description |
| :-------------------------------------------------------------------------- | :------ | :------------------------------------------------------------------------------------------------------------------------------ |
| <img src={IdpIconPassImg} alt="Pass icon" className="inline-image" /> | Pass | The document validation passed with accurate and expected results. |
| <img src={IdpIconCautionImg} alt="Caution icon" className="inline-image" /> | Caution | A test case is missing for comparison. Click **Save test case** to... |
| <img src={IdpIconFailImg} alt="Fail icon" className="inline-image" /> | Fail | The validation results do not match the expected output for the document. Click **Review document** to investigate and resolve. |
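
The three statuses in the table can be read as a simple decision rule. The following is an illustrative sketch only, not how Web Modeler implements validation: the function name `validation_status` is hypothetical, and it assumes the extracted fields and the saved test case are plain dictionaries compared for exact equality.

```python
from typing import Optional


def validation_status(extracted: dict, test_case: Optional[dict]) -> str:
    """Hypothetical mapping of one document's result to a validation status."""
    # Caution: no saved test case exists to compare against.
    if test_case is None:
        return "Caution"
    # Pass: the extraction matches the expected output.
    # Fail: the extraction differs from the expected output.
    # Assumption: "matches" means exact equality of all fields.
    return "Pass" if extracted == test_case else "Fail"
```

For example, `validation_status({"total": "42.00"}, None)` returns `"Caution"`, because no test case has been saved for comparison.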

#### Example

@@ -144,11 +146,11 @@ The following example shows the results of a partially successful extraction.

## Step 4: Publish

On the **Publish** tab, publish the document extraction project to make it available for [integration into your processes](idp-integrate.md) or [document automation](idp-document-automation.md) projects.
On the **Publish** tab, publish the project to make it available for [integration into your processes](idp-integrate.md) and [document automation](idp-document-automation.md) projects.

<img src={IdpPublishProjectImg} alt="Publish project screen" />

1. The unpublished project is shown with a “Draft” Status. Click **Publish** to open the **Publish Extraction Project** modal.
1. Unpublished projects are shown with a “Draft” **Status**. Click **Publish** to open the **Publish Extraction Project** modal.
1. Enter a version name and description for the project and click **Publish**.
1. The project is published and now available to use [in your processes](idp-integrate.md) or [document automation](idp-document-automation.md) projects.
