diff --git a/metaphor/kafka/README.md b/metaphor/kafka/README.md
index 6577a0e7..09c31b00 100644
--- a/metaphor/kafka/README.md
+++ b/metaphor/kafka/README.md
@@ -4,28 +4,7 @@ This connector extracts technical metadata from Kafka using [Confluent's Python
 
 ## Setup
 
-To run a Kafka cluster locally, follow the instructions below:
-
-1. Start a Kafka cluster (broker + schema registry + REST proxy) locally via docker-compose:
-```shell
-$ docker-compose --file metaphor/kafka/docker-compose.yml up -d
-```
-  - Broker is on port 9092.
-  - Schema registry is on port 8081.
-  - REST proxy is on port 8082.
-2. Find the cluster ID:
-```shell
-$ curl -X GET --silent http://localhost:8082/v3/clusters/ | jq '.data[].cluster_id'
-```
-3. Register a new topic via the REST proxy:
-```shell
-curl -X POST -H "Content-Type: application/json" http://localhost:8082/v3/clusters/<cluster_id>/topics -d '{"topic_name": "<topic_name>"}' | jq .
-```
-4. Register a schema to the registry:
-```shell
-curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": <schema>}' http://localhost:8081/subjects/<topic>-<key|value>/versions
-```
-  - It is possible to have schema with name different to the topic. See `Topic <-> Schema Subject Mapping` section below for more info.
+If [ACL](https://docs.confluent.io/platform/current/security/rbac/authorization-acl-with-mds.html) is enabled, the credentials used by the crawler must be allowed to perform the [Describe operation](https://docs.confluent.io/platform/current/kafka/authorization.html#topic-resource-type-operations) on the topics of interest.
 
 ## Config File
 
@@ -33,47 +12,47 @@ Create a YAML config file based on the following template.
 
 ### Required Configurations
 
-You must specify at least one bootstrap server, i.e. a pair of host and port pointing to the Kafka broker instance. You must also specify a URL for the schema registry.
+You must specify at least one bootstrap server, i.e. a pair of host and port pointing to a Kafka broker instance. You must also specify a URL for the schema registry.
 
 ```yaml
 bootstrap_servers:
   - host: <host>
     port: <port>
-schema_registry_url: <schema_registry_url> # Schema Registry URL. Schema registry client supports URL with basic HTTP authentication values, i.e. `http://username:password@host:port`.
+schema_registry_url: <schema_registry_url>
 output:
   file:
     directory: <output_directory>
 ```
 
+To use HTTP basic authentication for the schema registry, specify the credentials in `schema_registry_url` using the format `https://<username>:<password>@host:port`.
+
 See [Output Config](../common/docs/output.md) for more information on `output`.
 
 ### Optional Configurations
 
-#### Kafka Admin Client
+#### SASL Authentication
 
-##### Common SASL Authentication Configurations
-
-Some most commonly used SASL authentication configurations have their own section:
+You can optionally authenticate against the brokers by adding the following SASL configurations:
 
 ```yaml
 sasl_config:
-  username: # SASL username for use with the `PLAIN` and `SASL-SCRAM-..` mechanisms.
-  password: # SASL password for use with the `PLAIN` and `SASL-SCRAM-..` mechanisms.
-  mechanism: # SASL mechanism to use for authentication. Supported: `GSSAPI`, `PLAIN`, `SCRAM-SHA-256`, `SCRAM-SHA-512`, `OAUTHBEARER`. Default: `GSSAPI`.
+  # SASL mechanism, e.g. GSSAPI, PLAIN, SCRAM-SHA-256, etc.
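+  # The full set supported by the underlying librdkafka client is GSSAPI
+  # (the default), PLAIN, SCRAM-SHA-256, SCRAM-SHA-512, and OAUTHBEARER.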
+  mechanism: <mechanism>
+
+  # SASL username & password for PLAIN, SCRAM-* mechanisms
+  username: <username>
+  password: <password>
 ```
 
-##### Other Configurations
-
-For other configurable values, please use `extra_admin_client_config` field:
+Some mechanisms (e.g., `kerberos` & `oauthbearer`) require additional configs that can be specified using `extra_admin_client_config`:
 
 ```yaml
 extra_admin_client_config:
   sasl.kerberos.service.name: "kafka"
   sasl.kerberos.principal: "kafkaclient"
-  ...
 ```
 
-Visit [https://github.com/confluentinc/librdkafka/blob/master/CONFIGURATION.md](https://github.com/confluentinc/librdkafka/blob/master/CONFIGURATION.md) for full list of available Kafka client configurations.
+See [https://github.com/confluentinc/librdkafka/blob/master/CONFIGURATION.md](https://github.com/confluentinc/librdkafka/blob/master/CONFIGURATION.md) for a complete list of available Kafka client configurations.
 
 #### Filtering
 
@@ -90,81 +69,80 @@ By default the following topics are excluded:
 
 - `_schema`
 - `__consumer_offsets`
 
-#### Topic <-> Schema Subject Mapping
+#### Topic to Schema Subject Mapping
 
-Kafka messages are sent as key / value pairs, and both can have their schemas defined in the schema registry. There are three strategies to map topic to schema subjects:
+Kafka messages can have key and value schemas defined in the schema registry. There are three strategies for mapping topics to schema subjects in the registry:
 
-##### Strategies
+| Subject Name Strategy           | Key Schema Subject | Value Schema Subject |
+| :------------------------------ | :----------------- | :------------------- |
+| `TOPIC_NAME_STRATEGY` (Default) | `<topic>-key`      | `<topic>-value`      |
+| `RECORD_NAME_STRATEGY`          | `*-key`            | `*-value`            |
+| `TOPIC_RECORD_NAME_STRATEGY`    | `<topic>-*-key`    | `<topic>-*-value`    |
 
-###### Topic Name Strategy (Default)
+where `<topic>` is the topic name, and `*` matches either all strings or a set of values specified in the config.
 
-For a topic `foo`, the subjects for the schemas for the messages sent through this topic would be `foo-key` (the key schema subject) and `foo-value` (the value schema subject).
+##### Example: TOPIC_NAME_STRATEGY
 
-###### Record Name Strategy
+The following is the default config, which assumes all messages for a topic `topic` have a `topic-key` key schema and a `topic-value` value schema:
 
-It is possible for a topic to have more than one schema. In that case this strategy can be useful. To enable this as default, add the following in the configuration file:
+```yaml
+default_subject_name_strategy: TOPIC_NAME_STRATEGY
+```
+
+##### Example: RECORD_NAME_STRATEGY
+
+The following config specifies that topic `topic` has two types of key-value schemas, `(type1-key, type1-value)` and `(type2-key, type2-value)`:
 
 ```yaml
 default_subject_name_strategy: RECORD_NAME_STRATEGY
 topic_naming_strategies:
-  foo:
+  topic:
    records:
-      - bar
-      - baz
+      - type1
+      - type2
 ```
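+
+As a sanity check for `topic_naming_strategies`, you can list the subjects actually registered in the schema registry. A minimal sketch, assuming an unauthenticated registry at `localhost:8081`:
+
+```shell
+curl http://localhost:8081/subjects | jq .
+```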
 
-This means topic `foo` can transmit the following schemas:
-
-- `bar-key`
-- `bar-value`
-- `baz-key`
-- `baz-value`
-
-###### Topic Record Name Strategy
+##### Example: TOPIC_RECORD_NAME_STRATEGY
 
-This strategy is best demonstrated through an example:
+This is similar to `RECORD_NAME_STRATEGY`, except the schema subjects are prefixed with the topic name. For example, the following specifies that the topic `topic` has two types of key-value schemas, `(topic-type1-key, topic-type1-value)` and `(topic-type2-key, topic-type2-value)`:
 
 ```yaml
 default_subject_name_strategy: TOPIC_RECORD_NAME_STRATEGY
 topic_naming_strategies:
-  foo:
+  topic:
    records:
-      - bar
-      - baz
-  quax:
-    records: [] # If list of record names is empty, we take all subjects that starts with "<topic>-" and ends with "-<key|value>" as topic subjects.
+      - type1
+      - type2
 ```
 
-- For topic `foo`, the subjects it transmits are
-  - `foo-bar-key`
-  - `foo-bar-value`
-  - `foo-baz-key`
-  - `foo-baz-value`
-- For topic `quax`, all subject that starts with `quax-` and ends with either `-key` or `-value` is considered a subject on topic `quax`.
+Instead of explicitly enumerating the type values, you can specify an empty list to match all possible values, i.e. `(topic-*-key, topic-*-value)`:
 
-##### Overriding Subject Name Strategy for Specific Topics
+```yaml
+default_subject_name_strategy: TOPIC_RECORD_NAME_STRATEGY
+topic_naming_strategies:
+  topic:
+    records: []
+```
+
+##### Example: Overriding Strategy for Specific Topics
 
-It is possible to override subject name strategy for specific topics:
+It is possible to override the subject name strategy for specific topics, e.g.:
 
 ```yaml
 default_subject_name_strategy: RECORD_NAME_STRATEGY
 topic_naming_strategies:
-  foo:
+  topic1:
    records:
-      - bar
-      - baz
-  quax:
+      - type1
+      - type2
+  topic2:
    override_subject_name_strategy: TOPIC_NAME_STRATEGY
 ```
 
-- The following subjects are transmitted through topic `foo`:
-  - `bar-key`
-  - `bar-value`
-  - `baz-key`
-  - `baz-value`
-- For topic `quax`, since it uses `TOPIC_NAME_STRATEGY`, connector will look for the following 2 subjects:
-  - `quax-key`
-  - `quax-value`
+This results in the following schemas:
+
+- `topic1`: `(type1-key, type1-value)`, `(type2-key, type2-value)`
+- `topic2`: `(topic2-key, topic2-value)`
 
 ## Testing
 
@@ -176,4 +154,4 @@ To test the connector locally, change the config file to output to a local path
 metaphor kafka <config_file>
 ```
 
-Manually verify the output after the run finishes.
\ No newline at end of file
+Manually verify the output after the run finishes.
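+
+For reference, a config for a local test run might look like the following; the host, port, and output directory are illustrative placeholders for your environment:
+
+```yaml
+bootstrap_servers:
+  - host: localhost
+    port: 9092
+schema_registry_url: http://localhost:8081
+output:
+  file:
+    directory: /tmp/kafka_metadata
+```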