This application is a backup tool for Kafka. It comes with connectors to read data from Kafka and write it to different storage systems, and provides a CLI for manual backup and restore operations.
- Backup Kafka topics to any S3 compatible storage system (e.g. AWS S3, Minio, etc.)
- Multiple formats for backup files (e.g. Binary, Avro, JSON, etc.) supported
- Backup and restore Kafka topics using the `kbackup` CLI
- Repair corrupted Kafka log files using the `kbackup` CLI
- Prometheus metrics for the Kafka S3 Backup Connector
- Docker
- Kafka
- S3 compatible storage system (e.g. AWS S3, Minio, etc.)
By default (no CMD or ENTRYPOINT defined) the container starts the Kafka S3 Backup Connector. This connector uses the Kafka Connect S3 Connector to back up Kafka topics to S3. It uses the Kafka Connect API to stream data from Kafka.
The connector is created automatically after the Kafka-Connect worker has started. See the Configuration Properties for a list of all available properties.
The connector can be configured using a configuration file. See `localdev` for an example configuration file.
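Once the worker is up, the connector registration can be checked through the standard Kafka Connect REST API. This is a minimal sketch; the worker's default REST port 8083 and the connector name placeholder are assumptions, not values taken from this project's configuration.

```bash
# List connectors registered on the Kafka Connect worker (default REST port assumed)
curl -s http://localhost:8083/connectors

# Show the status of a single connector; use the name from your configuration
curl -s http://localhost:8083/connectors/<connector-name>/status
```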
Metrics are exported using the Prometheus JMX Exporter, which is configured to expose metrics in Prometheus format at `/metrics`.
The following environment variables can be used to configure the exporter:
- `METRICS_ENABLED`: Enable or disable the metrics exporter. Default: `true`
- `METRICS_PORT`: The port the metrics are exposed on. Default: `9876`
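For example, the exporter can be moved to a different port when starting the connector container, and the endpoint checked with curl. This is a sketch; the port value is illustrative and the image tag is the one used elsewhere in this document.

```bash
# Start the connector container with the metrics exporter on port 9999
docker run -d --network host \
  -e METRICS_ENABLED=true \
  -e METRICS_PORT=9999 \
  lovelysystems/lovely-kafka-backup:dev

# Scrape the Prometheus metrics endpoint
curl -s http://localhost:9999/metrics
```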
The docker image also contains a CLI for manual restore operations. The CLI can be run using the `kbackup` command. `kbackup` generally assumes that topics it writes to either exist or are auto-created; it doesn't create any topics.
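If automatic topic creation is disabled on the cluster, the target topic therefore has to be created up front, e.g. with the standard Kafka tooling. This is a sketch; the topic name, partition count and replication factor are illustrative.

```bash
# Create the target topic before running a restore (values are examples only)
kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic user-devices \
  --partitions 3 --replication-factor 1
```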
The CLI can be run using Gradle:
./gradlew :cli:run --args="kbackup <subcommand> <args>..."
or on container startup with docker:
docker run --network host lovelysystems/lovely-kafka-backup:dev kbackup <subcommand> <args>...
NOTE: `--network host` is required if Kafka is also running locally in a container
or kubectl:
kubectl run my-cli-container --rm -i --image lovelysystems/lovely-kafka-backup:dev "kbackup <subcommand> <args>..."
or within a standalone container:
docker run --rm -it --entrypoint /bin/bash lovelysystems/lovely-kafka-backup:dev
kbackup <subcommand>
or from within a running container:
$ docker exec -it <container> /bin/bash
kbackup <subcommand>
To restore records from a backup, run the `restore` subcommand. The restore reads backed up records from S3 and appends them to the target topics. Offsets of the records are not restored.
kbackup restore --bucket user-devices --s3Endpoint http://localhost:9000 --bootstrapServers localhost:9092
This command restores all the backed up records in the bucket `user-devices` on S3 hosted at `http://localhost:9000` to their original topics.
Option name | Short option | Required | Format | Description |
---|---|---|---|---|
bucket | b | always | String | Bucket in which the backup is stored |
prefix | p | | String | Limit S3 objects to those that begin with the specified prefix |
key | k | | Regex | Allows filtering S3 objects by the specified key pattern |
partition | | | Number | Partition of the source topics to restore |
from | | | Number | Start Kafka offset of objects to restore. If not set, records from the earliest available offset are restored |
to | | | Number | End Kafka offset of records to restore (exclusive). If not set, records up to the latest available offset are restored |
s3Endpoint | | If not restoring from AWS | Url | Endpoint for S3 backup storage |
bootstrapServers | | If env KAFKA_BOOTSTRAP_SERVERS is not set | (list of) Urls | Kafka cluster to restore the backup to |
topic | t | | String | The target topic where records are restored to. Otherwise records are restored into their original topic |
profile | | | String | Profile to use for S3 access. If not set, uses the AWS_PROFILE environment variable or the default profile |
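For example, a slice of a single partition can be restored into a different target topic by combining the filter options above. This is a sketch; all values are illustrative.

```bash
# Restore records from offsets 1000 (inclusive) to 2000 (exclusive) of partition 0
# into the topic user-devices-restored (values are examples only)
kbackup restore --bucket user-devices \
  --s3Endpoint http://localhost:9000 \
  --bootstrapServers localhost:9092 \
  --partition 0 --from 1000 --to 2000 \
  --topic user-devices-restored
```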
S3 can be configured using the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` or by setting a profile via the `--profile` parameter. The profile takes priority.
NOTE: configuration via profile is mostly useful for development and for running the CLI via Gradle (`:cli:run`). To use it in Docker, the config files from `~/.aws` would need to be mounted into the container.
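A sketch of such a mount follows; the in-container home directory (`/root` here) and the profile name are assumptions.

```bash
# Mount the local AWS config/credentials into the container and select a profile
docker run --rm -it --network host \
  -v ~/.aws:/root/.aws:ro \
  lovelysystems/lovely-kafka-backup:dev \
  kbackup restore --bucket user-devices \
    --bootstrapServers localhost:9092 \
    --profile my-backup-profile
```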
Additional configs for Kafka can be set via environment variables prefixed with `KAFKA_`. If an argument is passed, the argument takes priority.
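For example, the bootstrap servers can be supplied via the environment instead of the `--bootstrapServers` argument. This is a sketch; the broker address is illustrative.

```bash
# KAFKA_-prefixed environment variables are picked up by the CLI
export KAFKA_BOOTSTRAP_SERVERS=localhost:9092
kbackup restore --bucket user-devices --s3Endpoint http://localhost:9000
```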
Repairs corrupted records with records from the backup:
kbackup repair --bucket s3-backup --data-directory kafka-data
This checks the Kafka data in `kafka-data` and, if any records are corrupted, repairs them with the backed up records in `s3-backup`.
Option name | Short | Required | Description |
---|---|---|---|
bucket | b | always | Backup bucket to repair from |
data-directory | | always | Data directory which contains the Kafka data |
filter | f | | Glob pattern to filter log directories to repair. Defaults to all |
skipBroken | s | | Flag. If set, records which aren't in the backup are skipped |
repair | r | | Flag. If set, the files are repaired; otherwise they are just listed |
s3Endpoint | | | Url to S3. If not given, defaults to AWS S3 |
profile | | | The profile to use for S3 access |
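For example, corrupted segments can first be listed without modifying anything by omitting the `--repair` flag and narrowing the scan with a glob. This is a sketch; the glob value and endpoint are illustrative.

```bash
# List (but do not repair) corrupted records in log directories matching the glob
kbackup repair --bucket s3-backup --data-directory kafka-data \
  --filter "user-devices-*" --s3Endpoint http://localhost:9000

# Add --repair to actually rewrite the affected log files
```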
S3 can be configured using the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` or by setting a profile via the `--profile` parameter. The profile takes priority.
NOTE: configuration via profile is mostly useful for development and for running the CLI via Gradle (`:cli:run`). To use it in Docker, the config files from `~/.aws` would need to be mounted into the container.
Tests can be run using Gradle:
./gradlew test
Use Gradle to build and publish a new version of the docker image. The version is read from the `CHANGES.md` file.
./gradlew createTag
./gradlew pushDockerImage
Publish the `lovely-kafka-format` library to GitHub Packages:
./gradlew publish