- If you read Kinesis Data Streams, think Kafka topics
- Producers can be apps (SDK), clients (written using the Kinesis Producer Library, KPL), or Kinesis Agents
- Consumers can be apps (SDK), clients (written using the Kinesis Client Library, KCL), Lambda functions, Kinesis Data Firehose, or Kinesis Data Analytics
- Records sent to data streams contain a partition key and a data blob
- The partition key is hashed to determine which shard the record is written to
- Retention can be between 1 and 365 days
- Kinesis Data Streams can be created in provisioned mode (define shards upfront and pay per shard per hour) or on-demand mode (automatic scaling based on the throughput of the last 30 days, paid per stream per hour plus data in/out per GB); see the sketch below
- If you know your capacity in advance, go for provisioned mode
- Kinesis Data Streams support VPC endpoints
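A minimal sketch of creating a stream in each mode with the AWS SDK for Java v2; the stream names are hypothetical, and credentials/region are assumed to come from the default provider chain:

```java
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.CreateStreamRequest;
import software.amazon.awssdk.services.kinesis.model.StreamMode;
import software.amazon.awssdk.services.kinesis.model.StreamModeDetails;

public class CreateStreams {
    public static void main(String[] args) {
        try (KinesisClient kinesis = KinesisClient.create()) {
            // Provisioned mode: shard count is fixed upfront, billed per shard per hour
            kinesis.createStream(CreateStreamRequest.builder()
                    .streamName("orders-provisioned")   // hypothetical stream name
                    .shardCount(2)
                    .streamModeDetails(StreamModeDetails.builder()
                            .streamMode(StreamMode.PROVISIONED).build())
                    .build());

            // On-demand mode: no shard count; capacity scales with observed throughput
            kinesis.createStream(CreateStreamRequest.builder()
                    .streamName("orders-on-demand")     // hypothetical stream name
                    .streamModeDetails(StreamModeDetails.builder()
                            .streamMode(StreamMode.ON_DEMAND).build())
                    .build());
        }
    }
}
```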
- Use the SDK for low-throughput use cases that can tolerate higher latency
- The SDK exposes the PutRecord (single record) and PutRecords (batched records) methods, as shown in the sketch below
- A ProvisionedThroughputExceededException is thrown when the data or records per second exceed a shard's threshold
- Choose your partition key wisely: a heavily used key creates a hot shard
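A minimal producer sketch with the AWS SDK for Java v2 showing both methods; the stream name, partition keys, and payloads are hypothetical:

```java
import java.util.List;
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.ProvisionedThroughputExceededException;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;
import software.amazon.awssdk.services.kinesis.model.PutRecordsRequest;
import software.amazon.awssdk.services.kinesis.model.PutRecordsRequestEntry;

public class SdkProducer {
    public static void main(String[] args) {
        try (KinesisClient kinesis = KinesisClient.create()) {
            // PutRecord: one record = partition key + data blob; the key is hashed
            // to pick the shard, so records with the same key land on the same shard
            try {
                kinesis.putRecord(PutRecordRequest.builder()
                        .streamName("orders")            // hypothetical stream
                        .partitionKey("customer-42")     // hypothetical key
                        .data(SdkBytes.fromUtf8String("{\"item\":\"book\"}"))
                        .build());
            } catch (ProvisionedThroughputExceededException e) {
                // Thrown when a shard's write limit is exceeded; back off and retry
            }

            // PutRecords: batch several records into a single API call
            List<PutRecordsRequestEntry> batch = List.of(
                    PutRecordsRequestEntry.builder()
                            .partitionKey("customer-42")
                            .data(SdkBytes.fromUtf8String("{\"item\":\"pen\"}"))
                            .build(),
                    PutRecordsRequestEntry.builder()
                            .partitionKey("customer-7")
                            .data(SdkBytes.fromUtf8String("{\"item\":\"mug\"}"))
                            .build());
            kinesis.putRecords(PutRecordsRequest.builder()
                    .streamName("orders")
                    .records(batch)
                    .build());
        }
    }
}
```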
- The KPL is a Java/C++ library
- Use the KPL to build high-throughput, long-running producers
- Supports sync and async APIs -> if you read about sending data to Kinesis asynchronously, think KPL (see the sketch after this list)
- Submits metrics to CloudWatch
- Supports batching via collection (write to multiple shards in one PutRecords API call) and aggregation (nest multiple records in a single stream record)
- Compression is not supported out of the box
- KPL-created (aggregated) records can only be read with the KCL, which de-aggregates them
- Don't use the KPL if low latency is important or if only the latest events are of interest (the KPL buffers and retries internally)
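A minimal KPL sketch (Java) illustrating the async API, aggregation, and the buffering/latency trade-off; the stream name, region, and payload are hypothetical:

```java
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;
import com.amazonaws.services.kinesis.producer.UserRecordResult;
import com.google.common.util.concurrent.ListenableFuture;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class KplProducer {
    public static void main(String[] args) {
        KinesisProducerConfiguration config = new KinesisProducerConfiguration()
                .setRegion("us-east-1")          // hypothetical region
                .setAggregationEnabled(true)     // nest user records into one stream record
                .setRecordMaxBufferedTime(100);  // buffer window in ms: more batching, more latency

        KinesisProducer producer = new KinesisProducer(config);

        // addUserRecord is asynchronous: it buffers the record and returns a future,
        // to which a callback could be attached for success/failure handling
        ListenableFuture<UserRecordResult> result = producer.addUserRecord(
                "orders",        // hypothetical stream
                "customer-42",   // partition key
                ByteBuffer.wrap("{\"item\":\"book\"}".getBytes(StandardCharsets.UTF_8)));

        producer.flushSync();   // block until all buffered records are sent
        producer.destroy();     // shut down the native KPL daemon
    }
}
```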
- The Kinesis Agent is built on top of the KPL
- Watches files/directories and can send to multiple streams; see the config sketch after this list
- Can also preprocess and convert data before sending it
- Supports file rotation, checkpointing and retries
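A sketch of what an agent configuration (/etc/aws-kinesis/agent.json) could look like; file patterns and stream names are hypothetical. Two flows feed two different streams, and the LOGTOJSON option shows preprocessing before sending:

```json
{
  "flows": [
    {
      "filePattern": "/var/log/app/access.log*",
      "kinesisStream": "access-log-stream",
      "dataProcessingOptions": [
        { "optionName": "LOGTOJSON", "logFormat": "COMMONAPACHELOG" }
      ]
    },
    {
      "filePattern": "/var/log/app/errors.log*",
      "kinesisStream": "error-log-stream"
    }
  ]
}
```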