TW edits
mesellings committed Sep 17, 2024
1 parent 8ceeec8 commit 74b389f
Showing 2 changed files with 54 additions and 90 deletions.
@@ -5,99 +5,98 @@ sidebar_label: AWS Textract Connector
description: Extract printed text, handwriting, layout elements, and data from any document.
---

import ConnectorTask from '../../../components/react-components/connector-task.md'

:::info
The **Amazon Textract Connector** is available for `8.6.0` or later.
:::

The **Amazon Textract Connector** allows you to integrate your BPMN service with [Amazon Textract Service](https://aws.amazon.com/textract/) to extract text from various types of documents.
The **Amazon Textract Connector** allows you to integrate your BPMN service with [Amazon Textract](https://aws.amazon.com/textract/) to extract text from documents.

## Prerequisites

To use the **Amazon Textract Connector**, you need to have an **AWS IAM Access Key** and **Secret Key** with the appropriate Textract permissions. Refer to the [AWS Textract Developer Guide](https://docs.aws.amazon.com/textract/latest/dg/getting-started.html) for setup instructions.
To use this Connector, you'll need an **AWS IAM Access Key** and **Secret Key** with the appropriate Textract permissions. Refer to the [AWS Textract Developer Guide](https://docs.aws.amazon.com/textract/latest/dg/getting-started.html) for setup instructions.

:::note
Use **Camunda secrets** to avoid exposing your AWS IAM credentials as plain text.
Refer to [managing secrets](components/console/manage-clusters/manage-secrets.md) for more details.
Use **Camunda secrets** to avoid exposing your AWS IAM credentials as plain text. See [manage secrets](components/console/manage-clusters/manage-secrets.md).
:::

## Create an Amazon Textract Connector task

import ConnectorTask from '../../../components/react-components/connector-task.md'

<ConnectorTask/>

## Make your Amazon Textract Connector executable

To execute the **Amazon Textract Connector**, ensure all mandatory fields are correctly filled.
To execute the Connector, you must ensure all mandatory fields are correctly filled.

## 1. Authentication
### Authentication

Choose an authentication type from the **Authentication** dropdown. For details on the different authentication types, refer to the [appendix](#aws-authentication-types).
Select an authentication type from the **Authentication** dropdown:

If you select **Credentials**, the following fields must be provided:
1. **Credentials**: Select this option if you have an AWS **Access Key** and **Secret Key**. This method applies to both SaaS and Self-Managed users. If you select this option, you must provide the following required fields to use the Connector:

- **Access Key**: The AWS access key for a user with Textract permissions.
- **Secret Key**: The corresponding AWS secret key.
- **Access Key**: AWS access key for the user with Textract permissions.
- **Secret Key**: The corresponding AWS secret key.

Both **Access Key** and **Secret Key** are required to use the Connector.
2. **Default Credentials Chain** (hybrid/Self-Managed only): Select this option if your system uses implicit authentication methods such as role-based access, environment variables, or files on the target host. This method is only applicable for Self-Managed or hybrid environments. It uses the [Default Credential Provider Chain](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html) to resolve credentials.

## 2. **Configuration (AWS region)**
### Configure AWS region

After authentication, set the AWS **Region** where the Textract service is hosted:
Set the AWS **Region** where the Textract service is hosted:

- **Region**: Specify the region (for example, `us-east-1`, `eu-west-1`).

:::note
Ensure the region matches the location of your Textract service and S3 buckets to reduce latency and meet compliance requirements.
Ensure the region matches the location of your Textract service and S3 buckets to reduce latency and meet compliance requirements. For a full list of AWS regions, refer to [AWS Regional Data](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/).
:::

For a full list of AWS regions, refer to [AWS Regional Data](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/).

## 3. Configure input
### Configure input

### Execution types
#### Execution types

Select the desired execution type from the **Execution Type** dropdown. The following options are available:

- **Real-time**

Use **Real-time** execution for single-page PDF documents or smaller files where immediate text extraction is needed. This method processes the document instantly, allowing you to quickly retrieve the data.

:::note
**Real-time** execution supports only **single-page PDFs**. For multi-page PDFs, consider using **Polling** or **Asynchronous** execution.
:::
Select the desired execution type from the **Execution Type** dropdown:

For more details, see [real-time PDF processing](https://aws.amazon.com/about-aws/whats-new/2022/01/amazon-textract-pdf-processing-jpeg-encoded-images/).
- **Real-time**: Use for single-page PDF documents or smaller files that require immediate text extraction. This method processes the document instantly, allowing you to quickly retrieve the data.

- **Polling**
:::note
**Real-time** execution only supports single-page PDFs. For multi-page PDFs, consider using **Polling** or **Asynchronous** execution. For more information, refer to [real-time PDF processing](https://aws.amazon.com/about-aws/whats-new/2022/01/amazon-textract-pdf-processing-jpeg-encoded-images/).
:::

The **Polling** execution type collects data in chunks. After processing the document, it returns a token that allows you to retrieve the next result. This method is ideal for multi-page documents or large files that take longer to process.
- **Polling**: The **Polling** execution type collects data in chunks. After processing the document, it returns a token that allows you to retrieve the next result. This method is ideal for multi-page documents or large files that take longer to process.

Polling continues retrieving results until the entire document is processed or until there are no more tokens left.
Polling continues retrieving results until the entire document is processed or until there are no more tokens remaining.

:::note
Use **Polling** for documents that exceed the limitations of **Real-time** execution.
:::
:::note
Use **Polling** for documents that exceed the limitations of **Real-time** execution.
:::

- **Asynchronous**

Use **Asynchronous** execution when processing large or complex documents where immediate results are not required. This method allows you to submit a document for analysis and receive results at a later time, making it ideal for background processing or batch operations.
Use **Asynchronous** execution when processing large or complex documents that do not require immediate results. This method allows you to submit a document for analysis and receive results at a later time, making it ideal for background processing or batch operations.

**Asynchronous** execution offers more flexibility than real-time or polling, as it enables you to process documents without waiting for immediate responses. This is particularly useful for larger files or when handling multiple documents simultaneously.
**Asynchronous** execution offers more flexibility than real-time or polling execution, as it allows you to process documents without waiting for immediate responses. This is particularly useful for larger files or when handling multiple documents simultaneously.

In this mode, you can configure several optional fields, such as setting up notifications when the processing is complete or defining specific output locations for results.
In this mode, you can configure the following optional fields:

For more details on the optional fields that can be configured during asynchronous execution, refer to [asynchronous execution optional fields](#asynchronous-execution-optional-fields).
- **Client Request Token**: An idempotent token used to identify the start request.
- **Job Tag**: A tag included in the completion notification, published to the Amazon SNS topic.
- **KMS Key ID**: The KMS key used to encrypt inference results.
- **Notification Channel Role ARN**: The Amazon SNS role ARN for publishing the operation's completion status. Also requires the **Notification Channel SNS Topic ARN** field.
- **Notification Channel SNS Topic ARN**: The SNS topic ARN for publishing the operation's completion status. Also requires the **Notification Channel Role ARN** field.
- **Output S3 Bucket**: The bucket where the processed document's output will be stored.
- **Output S3 Prefix**: The prefix under which the output will be saved. Also requires the **Output S3 Bucket** field.

### Document Bucket
For example, you can use optional fields to set up notifications for when the processing is complete, or to define specific output locations for results.
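
The field dependencies listed above (the two **Notification Channel** ARNs must be set together, and **Output S3 Prefix** needs **Output S3 Bucket**) can be sketched as a small pre-flight check. This is only an illustration: the function name `validate_async_options` and its parameter names are hypothetical and are not part of the Connector or the AWS SDK.

```python
from typing import List, Optional


def validate_async_options(
    notification_role_arn: Optional[str] = None,
    notification_sns_topic_arn: Optional[str] = None,
    output_s3_bucket: Optional[str] = None,
    output_s3_prefix: Optional[str] = None,
) -> List[str]:
    """Return a list of problems with the optional asynchronous fields.

    Hypothetical helper: it only mirrors the pairing rules described in
    the field list, not the Connector's actual validation.
    """
    errors: List[str] = []

    # The role ARN and the SNS topic ARN are only valid as a pair.
    if bool(notification_role_arn) != bool(notification_sns_topic_arn):
        errors.append(
            "Notification Channel Role ARN and SNS Topic ARN must be set together"
        )

    # A prefix is meaningless without a bucket to apply it to.
    if output_s3_prefix and not output_s3_bucket:
        errors.append("Output S3 Prefix requires Output S3 Bucket")

    return errors
```

An empty list means the combination of optional fields is consistent. A bucket without a prefix is accepted, because only the prefix depends on the bucket.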

Enter the **S3 Bucket** that contains the document to be processed. Ensure that the bucket has the correct permissions to allow Textract to access the document.
#### Document Bucket

### Document path
Enter the **S3 Bucket** that contains the document to be processed. Ensure the bucket has the correct permissions to allow Textract to access the document.

#### Document path

Enter the **S3 Document Path** to the file you want to process. This should include the full path from the bucket root to the document. Make sure the document path is properly structured and accessible by the Textract service.

### Feature types
#### Feature types

Select one or more **Feature Types** from the following options:

@@ -108,46 +107,27 @@ Select one or more **Feature Types** from the following options:

At least one feature type must be selected, and choosing multiple options can provide richer data extraction results depending on your document’s format.

### Document version (optional)

Specify the **Document Version** if you need to process a specific version. If left blank, the latest version of the document will be processed. Document versioning can be useful for tracking changes over time or processing a specific iteration of a document.
#### Document version (optional)

## Asynchronous execution optional fields
Specify the **Document Version** if you need to process a specific version of the document. If unspecified, the latest version of the document is processed. Document versioning is useful for tracking changes over time or processing a specific document iteration.

When using asynchronous execution, the following optional fields can be configured:
## Response

- **Client Request Token**: An idempotent token used to identify the start request.
- **Job Tag**: A tag included in the completion notification, published to the Amazon SNS topic.
- **KMS Key ID**: The KMS key used to encrypt inference results.
- **Notification Channel Role ARN**: The Amazon SNS role ARN for publishing the operation's completion status.
- **Notification Channel SNS Topic ARN**: The SNS topic ARN for publishing the operation's completion status.

:::note
If **Notification Channel Role ARN** or **SNS Topic ARN** is specified, both must be filled.
:::

- **Output S3 Bucket**: The bucket where the processed document's output will be stored.
- **Output S3 Prefix**: The prefix under which the output will be saved.

:::note
If **Output S3 Prefix** is specified, the **Output S3 Bucket** must also be filled.
:::

## Amazon Textract Connector response

The response from the **Amazon Textract Connector** will mirror the AWS Textract service’s response. The type of response you receive depends on the execution mode selected:
The response from the **Amazon Textract Connector** mirrors the AWS Textract service’s response. The type of response you receive depends on the execution mode selected:

- **[Real-time Execution Response](https://docs.aws.amazon.com/textract/latest/dg/API_AnalyzeDocument.html#API_AnalyzeDocument_ResponseSyntax)**: Provides immediate analysis for single-page documents.
- **[Polling Execution Response](https://docs.aws.amazon.com/textract/latest/dg/API_GetDocumentAnalysis.html#API_GetDocumentAnalysis_ResponseSyntax)**: Returns chunks of data in a paginated format for multi-page or complex documents.
- **[Asynchronous Execution Response](https://docs.aws.amazon.com/textract/latest/dg/API_StartDocumentAnalysis.html#API_StartDocumentAnalysis_ResponseSyntax)**: Used for batch processing where results are returned later through job completion.
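
To make the paginated shape of the polling response concrete, here is a minimal sketch of following continuation tokens until none remain. It assumes the `Blocks` and `NextToken` keys documented for the AWS `GetDocumentAnalysis` response; the Connector's own output may use different key casing. The `fetch_page` callable and the `fake_fetch` stub are hypothetical stand-ins for a real Textract call.

```python
from typing import Any, Callable, Dict, List, Optional

Page = Dict[str, Any]


def collect_blocks(fetch_page: Callable[[Optional[str]], Page]) -> List[dict]:
    """Gather blocks from every page by following NextToken values."""
    blocks: List[dict] = []
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        blocks.extend(page.get("Blocks", []))
        # A missing or empty token means the last page was reached.
        token = page.get("NextToken")
        if not token:
            return blocks


def fake_fetch(token: Optional[str]) -> Page:
    """Stub standing in for a real paginated Textract call (assumption)."""
    pages: Dict[Optional[str], Page] = {
        None: {"Blocks": [{"BlockType": "LINE", "Text": "Page 1"}], "NextToken": "t1"},
        "t1": {"Blocks": [{"BlockType": "LINE", "Text": "Page 2"}]},
    }
    return pages[token]


all_blocks = collect_blocks(fake_fetch)
```

Passing the fetch function in as a parameter keeps the loop independent of any particular client, so the same logic can be exercised with a stub before it is wired to a real service.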

### Using the Textract Connector response in your process
### Use the Textract Connector response in your process

The **Amazon Textract Connector** provides the same response structure as the AWS Textract API. You can map fields from the response to process variables, depending on your needs. Here's an example of how to extract specific fields using **Result Expression** and **Result Variable**:
The **Amazon Textract Connector** provides the same response structure as the AWS Textract API. You can map fields from the response to process variables, depending on your needs.

For example, to extract specific fields using **Result Expression** and **Result Variable**:

#### Example Textract Response (real-time execution)

Utilize output mapping to align this response with process variables:
Use output mapping to align this response with process variables:

1. Use **Result Variable** to store the response in a process variable. For example, `myResultVariable`. This approach stores the entire Textract message as a process variable named `myResultVariable`.
2. Use **Result Expression** to map fields from the response into process variables. This approach allows for more granularity. Instead of storing the entire response in one variable, you can extract specific fields from the **Textract Connector** message and assign them to different process variables. This is particularly useful when you are only interested in certain parts of the message, or when different parts of the message need to be used separately in your process.
@@ -195,19 +175,3 @@ Mapped values **result**:
"blockType": "LINE"
}
```

## Appendix & FAQ

### How do I securely store AWS IAM credentials for my Textract Connector?

Store your AWS IAM credentials as **Camunda secrets** to avoid exposing sensitive information. Follow our [managing secrets guide](components/console/manage-clusters/manage-secrets.md) to learn more.

### AWS authentication types

You can authenticate the **Amazon Textract Connector** in two ways:

1. **Credentials**:
Select this option if you have an AWS **Access Key** and **Secret Key**. This method is applicable for both SaaS and Self-Managed users.

2. **Default Credentials Chain (hybrid/Self-Managed only)**:
Select this option if your system uses implicit authentication methods like role-based access, environment variables, or files on the target host. This method is applicable only for Self-Managed or hybrid environments. It uses the [Default Credential Provider Chain](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html) to resolve credentials.
4 changes: 2 additions & 2 deletions docs/components/react-components/connector-task.md
@@ -1,12 +1,12 @@
---
---

A Connector can be applied to a task or an event using the append menu. Access the append menu using any of the three methods below:
You can apply a Connector to a task or event via the append menu. For example:

- **From the canvas**: Select an element and click the **Change element** icon to change an existing element, or use the append feature to add a new element to the diagram.
- **From the properties panel**: Navigate to the **Template** section and click **Select**.
- **From the side palette**: Click the **Create element** icon.

![change element](./img/change-element.png)

Once you have applied a Connector to your element, follow the configuration steps or read our [guide on using Connectors](/components/connectors/use-connectors/index.md) to learn more.
After you have applied a Connector to your element, follow the configuration steps or see [using Connectors](/components/connectors/use-connectors/index.md) to learn more.
