A Terraform module which deploys the Transformer Kinesis service on EC2. If you want to use a custom AMI for this deployment you will need to ensure it is based on Amazon Linux 2.

WARNING: Because scaling this application horizontally can introduce large numbers of duplicates, the module locks the application to a single instance. If you need more throughput you will need to scale it vertically by changing the instance_type to a larger node type and re-applying the module. The default is a t3a.small, which should handle over 100 RPS without needing any scale-up.
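For example, a vertical scale-up is just a change to the instance_type input followed by a re-apply. The sketch below shows only that override; the target size is illustrative and the remaining required inputs from the usage example further down are omitted:

```hcl
module "transformer_kinesis" {
  source = "snowplow-devops/transformer-kinesis-ec2/aws"

  # Larger node type to increase throughput on the single locked instance
  # ("t3a.large" is only an example size)
  instance_type = "t3a.large"

  # ... all other required inputs as in the usage example below ...
}
```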
This module by default collects and forwards telemetry information to Snowplow to understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us - only very simple information about which modules and applications are deployed and active.
If you wish to subscribe to our mailing list for updates to these modules or security advisories, please set the user_provided_id variable to a valid email address at which we can reach you.

To disable telemetry, simply set the variable telemetry_enabled = false.
For details on what information is collected please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry
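Both settings are plain module inputs. A minimal sketch, assuming the module block from the usage example below (the email address is a placeholder):

```hcl
module "transformer_kinesis" {
  source = "snowplow-devops/transformer-kinesis-ec2/aws"

  # ... all other required inputs as in the usage example below ...

  # Subscribe to update / security-advisory emails (placeholder address)
  user_provided_id = "ops@example.com"

  # Alternatively, opt out of telemetry entirely
  # telemetry_enabled = false
}
```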
The Transformer takes data from an enriched input stream, transforms it, and writes it into S3. There are two types of transformation: shredding and wide row. When shredding is activated, the Transformer shreds each event into the custom entities it contains; when wide row is activated, it only converts each event to JSON format. (A wide-row configuration sketch follows the usage example below.)
module "enriched_stream" {
source = "snowplow-devops/kinesis-stream/aws"
name = var.stream_name
}
module "transformed_bucket" {
source = "snowplow-devops/s3-bucket/aws"
bucket_name = var.transformed_bucket
}
resource "aws_sqs_queue" "message_queue" {
content_based_deduplication = true
kms_master_key_id = "alias/aws/sqs"
# queue name should end with '.fifo'
name = var.queue_name
fifo_queue = true
}
module "transformer_kinesis" {
source = "snowplow-devops/transformer-kinesis-ec2/aws"
accept_limited_use_license = true
name = var.name
vpc_id = var.vpc_id
subnet_ids = var.subnet_ids
stream_name = module.enriched_stream.name
s3_bucket_name = var.transformed_bucket
s3_bucket_object_prefix = "transformed/good"
window_period_min = 10
sqs_queue_name = aws_sqs_queue.message_queue.name
ssh_key_name = var.key_name
ssh_ip_allowlist = ["0.0.0.0/0"]
}
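The example above uses the default shred transformation. A minimal sketch of the wide-row alternative, assuming the same enriched stream, bucket and SQS queue as above (only the transformation inputs change):

```hcl
module "transformer_kinesis_widerow" {
  source = "snowplow-devops/transformer-kinesis-ec2/aws"

  accept_limited_use_license = true

  name       = var.name
  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  stream_name             = module.enriched_stream.name
  s3_bucket_name          = var.transformed_bucket
  s3_bucket_object_prefix = "transformed/good"
  window_period_min       = 10
  sqs_queue_name          = aws_sqs_queue.message_queue.name

  # Convert events to wide-row output instead of shredding them
  transformation_type = "widerow"
  widerow_file_format = "parquet"

  ssh_key_name     = var.key_name
  ssh_ip_allowlist = ["0.0.0.0/0"]
}
```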
## Requirements

| Name | Version |
|---|---|
| terraform | >= 1.0.0 |
| aws | >= 3.72.0 |

## Providers

| Name | Version |
|---|---|
| aws | >= 3.72.0 |
## Modules

| Name | Source | Version |
|---|---|---|
| instance_type_metrics | snowplow-devops/ec2-instance-type-metrics/aws | 0.1.2 |
| kcl_autoscaling | snowplow-devops/dynamodb-autoscaling/aws | 0.2.0 |
| service | snowplow-devops/service-ec2/aws | 0.2.1 |
| telemetry | snowplow-devops/telemetry/snowplow | 0.5.0 |
## Resources

| Name | Type |
|---|---|
| aws_cloudwatch_log_group.log_group | resource |
| aws_dynamodb_table.kcl | resource |
| aws_iam_instance_profile.instance_profile | resource |
| aws_iam_policy.iam_policy | resource |
| aws_iam_role.iam_role | resource |
| aws_iam_role_policy_attachment.policy_attachment | resource |
| aws_security_group.sg | resource |
| aws_security_group_rule.egress_tcp_443 | resource |
| aws_security_group_rule.egress_tcp_80 | resource |
| aws_security_group_rule.egress_udp_123 | resource |
| aws_security_group_rule.ingress_tcp_22 | resource |
| aws_caller_identity.current | data source |
| aws_region.current | data source |
## Inputs

| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| name | A name which will be pre-pended to the resources created | string | n/a | yes |
| s3_bucket_name | The name of the S3 bucket events will be loaded into | string | n/a | yes |
| s3_bucket_object_prefix | An optional prefix under which Snowplow data will be saved | string | n/a | yes |
| ssh_key_name | The name of the SSH key-pair to attach to all EC2 nodes deployed | string | n/a | yes |
| stream_name | The name of the input Kinesis stream that the Transformer will pull data from | string | n/a | yes |
| subnet_ids | The list of subnets to deploy Transformer across | list(string) | n/a | yes |
| vpc_id | The VPC to deploy Transformer within | string | n/a | yes |
| window_period_min | Frequency to emit loading finished message - 5, 10, 15, 20, 30, 60 etc. minutes | number | n/a | yes |
| accept_limited_use_license | Acceptance of the SLULA terms (https://docs.snowplow.io/limited-use-license-1.0/) | bool | false | no |
| amazon_linux_2_ami_id | The AMI ID to use, which must be based on Amazon Linux 2; by default the latest community version is used | string | "" | no |
| app_version | Version of transformer kinesis | string | "5.6.0" | no |
| associate_public_ip_address | Whether to assign a public IP address to this instance | bool | true | no |
| cloudwatch_logs_enabled | Whether application logs should be reported to CloudWatch | bool | true | no |
| cloudwatch_logs_retention_days | The length of time in days to retain logs for | number | 7 | no |
| config_override_b64 | App config uploaded as a base64-encoded blob. This variable facilitates the dev flow; if the config is incorrect it can break the deployment. | string | "" | no |
| custom_iglu_resolvers | The custom Iglu Resolvers that will be used by Transformer | list(object({ | [] | no |
| default_iglu_resolvers | The default Iglu Resolvers that will be used by Transformer | list(object({ | [ | no |
| default_shred_format | Format used by default when format type is 'shred' (TSV or JSON) | string | "TSV" | no |
| iam_permissions_boundary | The permissions boundary ARN to set on IAM roles created | string | "" | no |
| initial_position | Where to start processing the input Kinesis Stream from (TRIM_HORIZON or LATEST) | string | "TRIM_HORIZON" | no |
| instance_type | The instance type to use | string | "t3a.small" | no |
| java_opts | Custom JAVA Options | string | "-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75" | no |
| kcl_read_max_capacity | The maximum READ capacity for the KCL DynamoDB table | number | 10 | no |
| kcl_read_min_capacity | The minimum READ capacity for the KCL DynamoDB table | number | 1 | no |
| kcl_write_max_capacity | The maximum WRITE capacity for the KCL DynamoDB table | number | 10 | no |
| kcl_write_min_capacity | The minimum WRITE capacity for the KCL DynamoDB table | number | 1 | no |
| private_ecr_registry | The URL of an ECR registry that the sub-account has access to (e.g. '000000000000.dkr.ecr.cn-north-1.amazonaws.com.cn/') | string | "" | no |
| schemas_json | List of schemas to get shredded as JSON | list(string) | [] | no |
| schemas_skip | List of schemas to not get shredded (and thus not loaded) | list(string) | [] | no |
| schemas_tsv | List of schemas to get shredded as TSV | list(string) | [] | no |
| sns_topic_arn | The ARN of the SNS topic that Transformer will send the transforming complete message to. Either sqs_queue_name or sns_topic_arn needs to be set | string | "" | no |
| sqs_queue_name | The name of the SQS queue that Transformer will send the transforming complete message to. Either sqs_queue_name or sns_topic_arn needs to be set | string | "" | no |
| ssh_ip_allowlist | The list of CIDR ranges to allow SSH traffic from | list(any) | [ | no |
| tags | The tags to append to this resource | map(string) | {} | no |
| telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | bool | true | no |
| transformation_type | Type of the transformation (shred or widerow) | string | "shred" | no |
| transformer_compression | Transformer output compression, GZIP or NONE | string | "GZIP" | no |
| user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | string | "" | no |
| widerow_file_format | The output file_format from the widerow transformation_type selected (json or parquet) | string | "json" | no |
## Outputs

| Name | Description |
|---|---|
| asg_id | ID of the ASG |
| asg_name | Name of the ASG |
| sg_id | ID of the security group attached to the Transformer servers |
Copyright 2021-present Snowplow Analytics Ltd.
Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)