From 8644b656c36e999f4a9e02f43bdb6947188a07e7 Mon Sep 17 00:00:00 2001 From: Srikanth Govindarajan Date: Tue, 19 Nov 2024 09:21:36 -0800 Subject: [PATCH 1/4] Configuration changes to lambda processor and sink Signed-off-by: Srikanth Govindarajan --- .../configuration/processors/aws-lambda.md | 93 +++++++++++++--- .../pipelines/configuration/sinks/aws_lamda | 104 ++++++++++++++++++ 2 files changed, 180 insertions(+), 17 deletions(-) create mode 100644 _data-prepper/pipelines/configuration/sinks/aws_lamda diff --git a/_data-prepper/pipelines/configuration/processors/aws-lambda.md b/_data-prepper/pipelines/configuration/processors/aws-lambda.md index bd167996a1..0041d9a97a 100644 --- a/_data-prepper/pipelines/configuration/processors/aws-lambda.md +++ b/_data-prepper/pipelines/configuration/processors/aws-lambda.md @@ -18,19 +18,62 @@ The `aws_lambda` processor enables invocation of an AWS Lambda function within y You can configure the processor using the following configuration options. -Field | Type | Required | Description --------------------- | ------- | -------- | ---------------------------------------------------------------------------- -`function_name` | String | Required | The name of the AWS Lambda function to invoke. -`invocation_type` | String | Required | Specifies the invocation type, either `request-response` or `event`. Default is `request-response`. -`aws.region` | String | Required | The AWS Region in which the Lambda function is located. -`aws.sts_role_arn` | String | Optional | The Amazon Resource Name (ARN) of the role to assume before invoking the Lambda function. -`max_retries` | Integer | Optional | The maximum number of retries for failed invocations. Default is `3`. -`batch` | Object | Optional | The batch settings for the Lambda invocations. Default is `key_name = "events"`. Default threshold is `event_count=100`, `maximum_size="5mb"`, and `event_collect_timeout = 10s`. 
-`lambda_when` | String | Optional | A conditional expression that determines when to invoke the Lambda processor. -`response_codec` | Object | Optional | A codec configuration for parsing Lambda responses. Default is `json`. -`tags_on_match_failure` | List | Optional | A list of tags to add to events when Lambda matching fails or encounters an unexpected error. -`sdk_timeout` | Duration| Optional | Configures the SDK's client connection timeout period. Default is `60s`. -`response_events_match` | Boolean | Optional | Specifies how Data Prepper interprets and processes Lambda function responses. Default is `false`. +# AWS Lambda Processor and Sink for Data Prepper + +This document provides the configuration details and usage instructions for integrating AWS Lambda with Data Prepper, both as a processor and as a sink. + +## AWS Lambda Processor + +The AWS Lambda processor allows you to invoke an AWS Lambda function in your Data Prepper pipeline to process events. This can be used for synchronous or asynchronous invocations based on your requirements. + +### Configuration Fields + +| Field | Type | Required | Default | Description | +|-----------------------|----------|----------|------------------|-------------------------------------------------------------------------------------------------------------------| +| function_name | String | Yes | - | The name of the AWS Lambda function to invoke. Must be between 3 and 500 characters. | +| invocation_type | String | No | request-response | Specifies the invocation type: either request-response or EVENT. | +| aws | Object | Yes | - | AWS authentication options. | +| client | Object | No | - | Client options for AWS SDK. | +| batch | Object | No | - | Batch options for Lambda invocations. | +| response_codec | Object | No | - | Codec configuration for parsing Lambda responses. | +| response_events_match | Boolean | No | false | Defines how Data Prepper treats the response from Lambda. 
| +| lambda_when | String | No | - | Defines a condition for when to use this processor. | +| tags_on_failure | List | No | [] | List of tags to be set on the event when Lambda fails or an exception occurs. | + +### AWS Authentication Options + +| Field | Type | Required | Description | +|----------------------|--------|----------|--------------------------------------------------------------------------------| +| region | String | Yes | The AWS region where the Lambda function is located. | +| sts_role_arn | String | No | ARN of the role to assume before invoking the Lambda function. | +| sts_external_id | String | No | External ID to use when assuming the role. | +| sts_header_overrides | Map | No | Map of headers to override in the STS request. Maximum of 5 headers allowed. | + +### Client Options + +| Field | Type | Default | Description | +|--------------------|----------|------------------------|-----------------------------------------------------------------------| +| max_retries | Integer | 3 | Maximum number of retries for failed requests. | +| api_call_timeout | Duration | 60s | Timeout for API calls. | +| connection_timeout | Duration | 60s | Timeout for establishing a connection. | +| max_concurrency | Integer | 200 | Maximum number of concurrent connections. | +| base_delay | Duration | 100ms | Base delay for exponential backoff. | +| max_backoff | Duration | 20s | Maximum backoff time for exponential backoff. | + +### Batch Options + +| Field | Type | Default | Description | +|-----------|--------|---------|-------------------------------------------| +| key_name | String | events | Key name for the batch of events. | +| threshold | Object | - | Threshold options for batching. | + +#### Threshold Options + +| Field | Type | Default | Description | +|----------------------|----------|---------|------------------------------------------------| +| event_count | Integer | 100 | Maximum number of events in a batch. 
| +| maximum_size | String | 5mb | Maximum size of a batch. | +| event_collect_timeout| Duration | 10s | Timeout for collecting events for a batch. | #### Example configuration @@ -43,22 +86,23 @@ processors: aws: region: "us-east-1" sts_role_arn: "arn:aws:iam::123456789012:role/my-lambda-role" - max_retries: 3 + client: + max_retries: 3 batch: key_name: "events" threshold: event_count: 100 maximum_size: "5mb" event_collect_timeout: PT10S - lambda_when: "event['status'] == 'process'" + lambda_when: "/loglevel == 'INFO'" + tags_on_failure: ["lambda_error", "processing_failed"] ``` {% include copy-curl.html %} ## Usage -The processor supports the following invocation types: - +The processor supports the following: - `request-response`: The processor waits for Lambda function completion before proceeding. - `event`: The function is triggered asynchronously without waiting for a response. - `batch`: When enabled, events are aggregated and sent in bulk to optimize Lambda invocations. Batch thresholds control the event count, size limit, and timeout. @@ -76,6 +120,8 @@ The `response_events_match` setting defines how Data Prepper handles the relatio - `true`: Lambda returns a JSON array with results for each batched event. Data Prepper maps this array back to its corresponding original event, ensuring that each event in the batch gets the corresponding part of the response from the array. - `false`: Lambda returns one or more events for the entire batch. Response events are not correlated with the original events. Original event metadata is not preserved in the response events. For example, when `response_events_match` is set to `true`, the Lambda function is expected to return the same number of response events as the number of original requests, maintaining the original order. 
+Note: Return from lambda should always be an array + ## Limitations Note the following limitations: @@ -83,6 +129,19 @@ Note the following limitations: - Payload limitation: 6 MB payload limit - Response codec: JSON-only codec support +- ## Example Lambda +``` +import json + +def lambda_handler(event, context): + output = [] + for input in event['events']: + input["transformed"] = "true" + output.append(input) + + return output +``` + ## Integration testing Integration tests for this plugin are executed separately from the main Data Prepper build process. Use the following Gradle command to run these tests: diff --git a/_data-prepper/pipelines/configuration/sinks/aws_lamda b/_data-prepper/pipelines/configuration/sinks/aws_lamda new file mode 100644 index 0000000000..168c0538c2 --- /dev/null +++ b/_data-prepper/pipelines/configuration/sinks/aws_lamda @@ -0,0 +1,104 @@ + +## AWS Lambda sink + +You can configure the sink using the following configuration options. + +| Field | Type | Required | Default | Description | +|-----------------|---------|----------|---------|-----------------------------------------------------------------------------| +| function_name | String | Yes | - | The name of the AWS Lambda function to invoke. | +| invocation_type | String | No | event | Specifies the invocation type: either EVENT or REQUEST_RESPONSE. | +| aws | Object | Yes | - | AWS authentication options. | +| client | Object | No | - | Client options for AWS SDK. | +| batch | Object | No | - | Batch options for Lambda invocations. | +| lambda_when | String | No | - | Conditional expression to determine when to invoke the Lambda function. | +| dlq | Object | No | - | Dead-letter queue (DLQ) configuration for failed invocations. 
| + + +### AWS Authentication Options + +| Field | Type | Required | Description | +|----------------------|--------|----------|--------------------------------------------------------------------------------| +| region | String | Yes | The AWS region where the Lambda function is located. | +| sts_role_arn | String | No | ARN of the role to assume before invoking the Lambda function. | +| sts_external_id | String | No | External ID to use when assuming the role. | +| sts_header_overrides | Map | No | Map of headers to override in the STS request. Maximum of 5 headers allowed. | + +### Client Options + +| Field | Type | Default | Description | +|--------------------|----------|------------------------|-----------------------------------------------------------------------| +| max_retries | Integer | 3 | Maximum number of retries for failed requests. | +| api_call_timeout | Duration | 60s | Timeout for API calls. | +| connection_timeout | Duration | 60s | Timeout for establishing a connection. | +| max_concurrency | Integer | 200 | Maximum number of concurrent connections. | +| base_delay | Duration | 100ms | Base delay for exponential backoff. | +| max_backoff | Duration | 20s | Maximum backoff time for exponential backoff. | + +### Batch Options + +| Field | Type | Default | Description | +|-----------|--------|---------|-------------------------------------------| +| key_name | String | events | Key name for the batch of events. | +| threshold | Object | - | Threshold options for batching. | + +### Threshold Options + +| Field | Type | Default | Description | +|----------------------|----------|---------|------------------------------------------------| +| event_count | Integer | 100 | Maximum number of events in a batch. | +| maximum_size | String | 5mb | Maximum size of a batch. | +| event_collect_timeout| Duration | 10s | Timeout for collecting events for a batch. 
| + + +#### Example configuration + +``` +sink: + - aws_lambda: + function_name: "my-lambda-sink" + invocation_type: "event" + aws: + region: "us-west-2" + sts_role_arn: "arn:aws:iam::123456789012:role/my-lambda-sink-role" + client: + max_retries: 3 + batch: + key_name: "events" + threshold: + event_count: 50 + maximum_size: "3mb" + event_collect_timeout: PT5S + lambda_when: "/loglevel == 'INFO'" + dlq: + region: "us-east-1" + sts_role_arn: "arn:aws:iam::123456789012:role/my-sqs-role" + bucket: "<>" +``` +{% include copy-curl.html %} + +## Usage + +The sink supports the following invocation types and features: + +- `event`: The function is triggered asynchronously without waiting for a response. +- `request-response`: Not supported for sink operations. +- `Batching`: When enabled, events are aggregated and sent in bulk to optimize Lambda invocations. Default is `enabled`. +- `DLQ`: A dead-letter queue for routing and processing events that persistently fail Lambda invocations after multiple retry attempts. + +## Advanced configurations + +The AWS Lambda processor and sink provide the following advanced options for security and performance optimization: + +- AWS Identity and Access Management (IAM) role assumption: The processor and sink support assuming the specified IAM role `aws.sts_role_arn` before Lambda invocation. This enhances secure handling by providing access control to AWS resources. +- Concurrency management: When using the `event` invocation type, consider Lambda concurrency limits to avoid throttling. + +For more information about AWS Lambda integration with Data Prepper, see the [AWS Lambda documentation](https://docs.aws.amazon.com/lambda). + +## Integration testing + +Integration tests for this plugin are executed separately from the main Data Prepper build process. 
Use the following Gradle command to run these tests: + +``` +./gradlew :data-prepper-plugins:aws-lambda:integrationTest -Dtests.sink.lambda.region="us-east-1" -Dtests.sink.lambda.functionName="lambda_test_function" -Dtests.sink.lambda.sts_role_arn="arn:aws:iam::123456789012:role/dataprepper-role" +``` +{% include copy-curl.html %} \ No newline at end of file From 4a3a94f83cf2189f6916d479eae06c30f0d71f13 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 21 Nov 2024 12:26:36 -0700 Subject: [PATCH 2/4] Update aws-lambda.md doc review underway Signed-off-by: Melissa Vagi --- .../configuration/processors/aws-lambda.md | 80 ++++++++----------- 1 file changed, 35 insertions(+), 45 deletions(-) diff --git a/_data-prepper/pipelines/configuration/processors/aws-lambda.md b/_data-prepper/pipelines/configuration/processors/aws-lambda.md index 0041d9a97a..824a612185 100644 --- a/_data-prepper/pipelines/configuration/processors/aws-lambda.md +++ b/_data-prepper/pipelines/configuration/processors/aws-lambda.md @@ -10,44 +10,34 @@ nav_order: 10 The [AWS Lambda](https://aws.amazon.com/lambda/) integration allows developers to use serverless computing capabilities within their Data Prepper pipelines for flexible event processing and data routing. -## AWS Lambda processor configuration +## `aws_lambda` processor configuration -The `aws_lambda` processor enables invocation of an AWS Lambda function within your Data Prepper pipeline in order to process events. It supports both synchronous and asynchronous invocations based on your use case. +You can use the `aws_lambda` processor to invoke AWS Lambda functions synchronously or asynchronously to process events in your Data Prepper pipeline. -## Configuration fields +### Configuration fields You can configure the processor using the following configuration options. 
-# AWS Lambda Processor and Sink for Data Prepper - -This document provides the configuration details and usage instructions for integrating AWS Lambda with Data Prepper, both as a processor and as a sink. - -## AWS Lambda Processor - -The AWS Lambda processor allows you to invoke an AWS Lambda function in your Data Prepper pipeline to process events. This can be used for synchronous or asynchronous invocations based on your requirements. - -### Configuration Fields - -| Field | Type | Required | Default | Description | -|-----------------------|----------|----------|------------------|-------------------------------------------------------------------------------------------------------------------| -| function_name | String | Yes | - | The name of the AWS Lambda function to invoke. Must be between 3 and 500 characters. | -| invocation_type | String | No | request-response | Specifies the invocation type: either request-response or EVENT. | -| aws | Object | Yes | - | AWS authentication options. | -| client | Object | No | - | Client options for AWS SDK. | -| batch | Object | No | - | Batch options for Lambda invocations. | -| response_codec | Object | No | - | Codec configuration for parsing Lambda responses. | -| response_events_match | Boolean | No | false | Defines how Data Prepper treats the response from Lambda. | -| lambda_when | String | No | - | Defines a condition for when to use this processor. | -| tags_on_failure | List | No | [] | List of tags to be set on the event when Lambda fails or an exception occurs. | - -### AWS Authentication Options - -| Field | Type | Required | Description | -|----------------------|--------|----------|--------------------------------------------------------------------------------| -| region | String | Yes | The AWS region where the Lambda function is located. | -| sts_role_arn | String | No | ARN of the role to assume before invoking the Lambda function. 
| -| sts_external_id | String | No | External ID to use when assuming the role. | -| sts_header_overrides | Map | No | Map of headers to override in the STS request. Maximum of 5 headers allowed. | +Field | Type | Required | Default | Description +--------|------|----------|---------|------------- +`function_name` | String | Required | - | The name of the AWS Lambda function to invoke. Must be between 3 and 500 characters. +`aws` | Object | Required | - | AWS authentication options. +`invocation_type` | String | Optional | `request-response` | Specifies the invocation type: either `request-response` or `EVENT`. +`client`| Object | No | - | Client options for AWS SDK. +`batch` | Object | No | - | Batch options for Lambda invocations. +`response_codec` | Object | No | - | Codec configuration for parsing Lambda responses. +`response_events_match` | Boolean | No | `false` | Defines how Data Prepper treats the response from Lambda. +`lambda_when` | String | No | - | Defines a condition for when to use this processor. +`tags_on_failure` | List | No | `[]` | List of tags to be set on the event when Lambda fails or an exception occurs. + +### AWS authentication options + +Field | Type | Required | Description +------|--------|----------|------------ +`region` | String | Yes | The AWS region where the Lambda function is located. +`sts_role_arn`| String | No | ARN of the role to assume before invoking the Lambda function. +`sts_external_id` | String | No | The external ID to use when assuming the role. +`sts_header_overrides` | Map | No | The map of headers to override in the STS request. Maximum of five headers allowed. ### Client Options @@ -60,11 +50,11 @@ The AWS Lambda processor allows you to invoke an AWS Lambda function in your Dat | base_delay | Duration | 100ms | Base delay for exponential backoff. | | max_backoff | Duration | 20s | Maximum backoff time for exponential backoff. 
| -### Batch Options +### Batch options | Field | Type | Default | Description | |-----------|--------|---------|-------------------------------------------| -| key_name | String | events | Key name for the batch of events. | +| `key_name` | String | events | Key name for the batch of events. | | threshold | Object | - | Threshold options for batching. | #### Threshold Options @@ -115,21 +105,13 @@ When configured for batching, the AWS Lambda processor groups multiple events in ## Lambda response handling -The `response_events_match` setting defines how Data Prepper handles the relationship between batch events sent to Lambda and the response received: +The `response_events_match` setting defines how Data Prepper handles the relationship between batch events sent to Lambda and the response received. The Return from lambda should always be an array - `true`: Lambda returns a JSON array with results for each batched event. Data Prepper maps this array back to its corresponding original event, ensuring that each event in the batch gets the corresponding part of the response from the array. - `false`: Lambda returns one or more events for the entire batch. Response events are not correlated with the original events. Original event metadata is not preserved in the response events. For example, when `response_events_match` is set to `true`, the Lambda function is expected to return the same number of response events as the number of original requests, maintaining the original order. 
-Note: Return from lambda should always be an array - -## Limitations - -Note the following limitations: - -- Payload limitation: 6 MB payload limit -- Response codec: JSON-only codec support +#### Example Lambda function -- ## Example Lambda ``` import json @@ -141,6 +123,14 @@ def lambda_handler(event, context): return output ``` +{% include copy-curl.html %} + +### Limitations + +Note the following limitations: + +- Payload limitation: 6 MB payload limit +- Response codec: JSON-only codec support ## Integration testing From 67985cd1f13ed2735b1e4973812758f67956267a Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 21 Nov 2024 16:52:42 -0700 Subject: [PATCH 3/4] Update aws-lambda.md Doc review complete Signed-off-by: Melissa Vagi --- .../configuration/processors/aws-lambda.md | 136 +++++++++--------- 1 file changed, 66 insertions(+), 70 deletions(-) diff --git a/_data-prepper/pipelines/configuration/processors/aws-lambda.md b/_data-prepper/pipelines/configuration/processors/aws-lambda.md index 824a612185..44602278ba 100644 --- a/_data-prepper/pipelines/configuration/processors/aws-lambda.md +++ b/_data-prepper/pipelines/configuration/processors/aws-lambda.md @@ -6,66 +6,13 @@ grand_parent: Pipelines nav_order: 10 --- -# aws_lambda integration for Data Prepper +# aws_lambda -The [AWS Lambda](https://aws.amazon.com/lambda/) integration allows developers to use serverless computing capabilities within their Data Prepper pipelines for flexible event processing and data routing. +You can use the `aws_lambda` processor to invoke AWS Lambda functions synchronously or asynchronously to process events in your Data Prepper pipeline. The [AWS Lambda](https://aws.amazon.com/lambda/) integration allows developers to use serverless computing capabilities within their Data Prepper pipelines for flexible event processing and data routing. 
-## `aws_lambda` processor configuration +## Processor configuration -You can use the `aws_lambda` processor to invoke AWS Lambda functions synchronously or asynchronously to process events in your Data Prepper pipeline. - -### Configuration fields - -You can configure the processor using the following configuration options. - -Field | Type | Required | Default | Description ---------|------|----------|---------|------------- -`function_name` | String | Required | - | The name of the AWS Lambda function to invoke. Must be between 3 and 500 characters. -`aws` | Object | Required | - | AWS authentication options. -`invocation_type` | String | Optional | `request-response` | Specifies the invocation type: either `request-response` or `EVENT`. -`client`| Object | No | - | Client options for AWS SDK. -`batch` | Object | No | - | Batch options for Lambda invocations. -`response_codec` | Object | No | - | Codec configuration for parsing Lambda responses. -`response_events_match` | Boolean | No | `false` | Defines how Data Prepper treats the response from Lambda. -`lambda_when` | String | No | - | Defines a condition for when to use this processor. -`tags_on_failure` | List | No | `[]` | List of tags to be set on the event when Lambda fails or an exception occurs. - -### AWS authentication options - -Field | Type | Required | Description -------|--------|----------|------------ -`region` | String | Yes | The AWS region where the Lambda function is located. -`sts_role_arn`| String | No | ARN of the role to assume before invoking the Lambda function. -`sts_external_id` | String | No | The external ID to use when assuming the role. -`sts_header_overrides` | Map | No | The map of headers to override in the STS request. Maximum of five headers allowed. 
- -### Client Options - -| Field | Type | Default | Description | -|--------------------|----------|------------------------|-----------------------------------------------------------------------| -| max_retries | Integer | 3 | Maximum number of retries for failed requests. | -| api_call_timeout | Duration | 60s | Timeout for API calls. | -| connection_timeout | Duration | 60s | Timeout for establishing a connection. | -| max_concurrency | Integer | 200 | Maximum number of concurrent connections. | -| base_delay | Duration | 100ms | Base delay for exponential backoff. | -| max_backoff | Duration | 20s | Maximum backoff time for exponential backoff. | - -### Batch options - -| Field | Type | Default | Description | -|-----------|--------|---------|-------------------------------------------| -| `key_name` | String | events | Key name for the batch of events. | -| threshold | Object | - | Threshold options for batching. | - -#### Threshold Options - -| Field | Type | Default | Description | -|----------------------|----------|---------|------------------------------------------------| -| event_count | Integer | 100 | Maximum number of events in a batch. | -| maximum_size | String | 5mb | Maximum size of a batch. | -| event_collect_timeout| Duration | 10s | Timeout for collecting events for a batch. | - -#### Example configuration +The following example configuration shows a typical AWS Lambda processor configuration in Data Prepper, including the key configuration fields and their usage: ``` processors: @@ -90,25 +37,75 @@ processors: ``` {% include copy-curl.html %} -## Usage +## Configuring the processor + +Using the batch configuration options, the `aws_lambda` processor can group multiple events into a single request. Events are collected until a defined threshold for event count, size limit, or timeout is reached and are then sent as one payload to the Lambda function. 
-The processor supports the following: -- `request-response`: The processor waits for Lambda function completion before proceeding. -- `event`: The function is triggered asynchronously without waiting for a response. -- `batch`: When enabled, events are aggregated and sent in bulk to optimize Lambda invocations. Batch thresholds control the event count, size limit, and timeout. -- `codec`: JSON is used for both request and response codecs. Lambda must return JSON array outputs. -- `tags_on_match_failure`: Custom tags can be applied to events when Lambda processing fails or encounters unexpected issues. +### Configuration fields + +You can configure the processor using the following configuration options. + +| Field | Type | Required | Default | Description | +|-------------------|----------|----------|---------|-------------| +| `function_name` | String | Required | - | The name of the AWS Lambda function to invoke. Must be between 3 and 500 characters. | +| `aws` | Object | Required | - | AWS authentication settings. | +| `invocation_type` | String | Optional | `request-response` | Specifies the invocation type. Choose either `request-response` or `EVENT`. | +| `client` | Object | Optional | - | The AWS SDK client configuration. | +| `batch` | Object | Optional | - | Optional batch settings for Lambda invocations. | +| `response_codec` | Object | Optional | - | The Lambda response parsing configuration. | +| `response_events_match` | Boolean | Optional | `false` | The Lambda response handling behavio | +| `lambda_when` | String | Optional | - | A conditional expression that determines when to invoke the processor. | +| `tags_on_failure` | List | Optional | `[]` | The tags applied on Lambda execution failures. | + +### AWS authentication options + +You can configure the processor using the following AWS authentication options. 
+ +| Field | Type | Required | Description | +|----------|--------|----------|-------------| +| `region` | String | Required | The AWS Region where the Lambda function is deployed. | +| `sts_role_arn`| String | Optional | The Amazon Resource Name (ARN) of the role to assume before invoking the Lambda function. | +| `sts_external_id` | String | Optional | The external ID to use when assuming the role. | +| `sts_header_overrides` | Map | Optional | The map of headers to override in the STS request. Maximum of five headers allowed. | + +### Client options + +You can configure the processor using the following client options. + +| Field | Type | Default | Description | +|--------------------|----------|---------|---------------------------------------------------| +| `max_retries` | Integer | 3 | Maximum number of retries for failed requests. | +| `api_call_timeout` | Duration | 60s | Timeout for API calls. | +| `connection_timeout`| Duration | 60s | Timeout for establishing a connection. | +| `max_concurrency` | Integer | 200 | Maximum number of concurrent connections. | +| `base_delay` | Duration | 100ms | Base delay for exponential backoff. | +| `max_backoff` | Duration | 20s | Maximum backoff time for exponential backoff. | + +### Batch options -## Behavior +You can configure the processor using the following batch options. -When configured for batching, the AWS Lambda processor groups multiple events into a single request. This grouping is governed by batch thresholds, which can be based on the event count, size limit, or timeout. The processor then sends the entire batch to the Lambda function as a single payload. +| Field | Type | Default | Description | +|-----------|--------|---------|-------------------------------------------| +| `key_name` | String | `events` | Key name for the batch of events. | +| threshold | Object | - | Threshold options for batching. | + +### Threshold options + +You can configure the processor using the following threshold options. 
+ +| Field | Type | Default | Description | +|----------------------|----------|---------|------------------------------------------------| +| event_count | Integer | 100 | Maximum number of events in a batch. | +| maximum_size | String | 5mb | Maximum size of a batch. | +| event_collect_timeout| Duration | 10s | Timeout for collecting events for a batch. | ## Lambda response handling -The `response_events_match` setting defines how Data Prepper handles the relationship between batch events sent to Lambda and the response received. The Return from lambda should always be an array +The `response_events_match` parameter controls how Data Prepper processes Lambda function responses: -- `true`: Lambda returns a JSON array with results for each batched event. Data Prepper maps this array back to its corresponding original event, ensuring that each event in the batch gets the corresponding part of the response from the array. -- `false`: Lambda returns one or more events for the entire batch. Response events are not correlated with the original events. Original event metadata is not preserved in the response events. For example, when `response_events_match` is set to `true`, the Lambda function is expected to return the same number of response events as the number of original requests, maintaining the original order. +- `true`: The Lambda function returns a JSON array with results for each batched event. Data Prepper maintains event correlation by mapping each element of the response array to its matching source event in the original batch sequence. +- `false`: The Lambda function returns one or more events for the entire batch. Response events are processed independently when `response_events_match` is `false`, discarding the original event context and metadata. Conversely, setting it to `true` requires the Lambda function to return a matching array of responses that preserves the order and count of input events. 
#### Example Lambda function @@ -139,5 +136,4 @@ Integration tests for this plugin are executed separately from the main Data Pre ``` ./gradlew :data-prepper-plugins:aws-lambda:integrationTest -Dtests.processor.lambda.region="us-east-1" -Dtests.processor.lambda.functionName="lambda_test_function" -Dtests.processor.lambda.sts_role_arn="arn:aws:iam::123456789012:role/dataprepper-role ``` - {% include copy-curl.html %} From 0802e38f4d4ed0cac85d0453685ef7938fb6c150 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 22 Nov 2024 14:05:33 -0700 Subject: [PATCH 4/4] Update _data-prepper/pipelines/configuration/processors/aws-lambda.md Signed-off-by: Melissa Vagi --- _data-prepper/pipelines/configuration/processors/aws-lambda.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_data-prepper/pipelines/configuration/processors/aws-lambda.md b/_data-prepper/pipelines/configuration/processors/aws-lambda.md index 44602278ba..cf2c9e8c8d 100644 --- a/_data-prepper/pipelines/configuration/processors/aws-lambda.md +++ b/_data-prepper/pipelines/configuration/processors/aws-lambda.md @@ -53,7 +53,7 @@ You can configure the processor using the following configuration options. | `client` | Object | Optional | - | The AWS SDK client configuration. | | `batch` | Object | Optional | - | Optional batch settings for Lambda invocations. | | `response_codec` | Object | Optional | - | The Lambda response parsing configuration. | -| `response_events_match` | Boolean | Optional | `false` | The Lambda response handling behavio | +| `response_events_match` | Boolean | Optional | `false` | The Lambda response handling behavior | | `lambda_when` | String | Optional | - | A conditional expression that determines when to invoke the processor. | | `tags_on_failure` | List | Optional | `[]` | The tags applied on Lambda execution failures. |