KafkaSinkConnector #10

Open

wants to merge 51 commits into base: KafkaV2SourceConnector-2

Changes from 19 commits

Commits (51)
98f3d08
some change
Feb 24, 2024
8838edd
add sink connector v2 implementation
Feb 27, 2024
1c7d9b5
changes
Feb 27, 2024
ac1a96f
update pom
Feb 27, 2024
c284b1f
Merge branch 'main' into kafkaV2SinkConnector-2
Feb 27, 2024
b1bc7ee
change to use customized schedulers
Feb 27, 2024
223d6d7
pom file update
Mar 1, 2024
a5116b2
update pom file
Mar 1, 2024
490d550
fix
Mar 1, 2024
32fc004
merge from main and resolve conflicts
Mar 1, 2024
9cc76f2
Merge branch 'main' into kafkaV2SinkConnector-2
Mar 12, 2024
1c5b5f0
resolve comments
Mar 12, 2024
faf25ed
add e2e test for sink connector
Mar 12, 2024
adc0235
fix test
Mar 12, 2024
3489467
remove kafka-integration test suites
Mar 12, 2024
2866916
upgrade bom version (#39412)
mssfang Mar 26, 2024
f512c9c
Increment versions for cosmos releases (#39411)
azure-sdk Mar 26, 2024
10d323b
Autorest Regen to update scope to include non-public clouds (#39224)
jairmyree Mar 26, 2024
056e0d9
Azure Monitor Query: Prepare for GA release (#39413)
srnagar Mar 27, 2024
3fae54e
[Automation] Generate Fluent Lite from sphere#package-2024-04-01 (#39…
azure-sdk Mar 27, 2024
15f7c58
Increment package versions for monitor releases (#39419)
azure-sdk Mar 27, 2024
b8a9d8c
Tracing: end HTTP spans when body is fully consumed, add missing meth…
lmolkova Mar 27, 2024
e6e6f21
Increment package versions for sphere releases (#39421)
azure-sdk Mar 27, 2024
00db29e
[Automation] Generate Fluent Lite from batch#package-2024-02 (#39424)
azure-sdk Mar 27, 2024
38cf774
Fix DeleteOptions on Public Ip Address (issue#38806) (#39096)
v-hongli1 Mar 27, 2024
7176df0
Fix Azure Remote Rendering tests by adding now missing dependencies a…
MichaelZp0 Mar 27, 2024
3c31ba3
Increment package versions for batch releases (#39426)
azure-sdk Mar 27, 2024
c8b9853
Update Azure AD Learn links in Identity (#39405)
scottaddie Mar 27, 2024
483e43c
Fix Tables Samples Issue (#38952)
faynef Mar 27, 2024
a6d7723
Sync eng/common directory with azure-sdk-tools for PR 7855 (#39324)
azure-sdk Mar 27, 2024
7eb17b0
Updated Network failure logging to warning (#39429)
kushagraThapar Mar 27, 2024
c5786ec
merge from original PR
Mar 27, 2024
7be04be
merge from source connector
Mar 27, 2024
9410009
revert unnecessary changes
Mar 27, 2024
ac4917a
Remove checked IOException from ReadValueCallback (#39431)
alzimmermsft Mar 27, 2024
029ceb7
Make azure-json transitive in azure-core module-info (#39432)
alzimmermsft Mar 27, 2024
5c8ca26
merge from latest source connector
Mar 27, 2024
f705170
fix compile
Mar 28, 2024
55ccc93
[Core-JDK-Vertx-Matrix] Added JDK and Vertx Http Clients dependencies…
mssfang Mar 28, 2024
2c2a540
Fix pr#36447 mgmt, fix samples for appservice (#39423)
v-hongli1 Mar 28, 2024
695f020
Fixed mgmt, support convenience API for publicNetworkAccess (#39357)
v-hongli1 Mar 28, 2024
688298d
fix (#39417)
ibrandes Mar 28, 2024
779ff89
Merge branch 'KafkaV2SourceConnector-2' into kafkaV2SinkConnector-3
Mar 28, 2024
5466792
Add Job id to fix sdl artifact name conflict (#39453)
weshaggard Mar 28, 2024
7ddd91a
Add missing call to get window handle to sample (#39460)
billwert Mar 28, 2024
f96205e
Merge to main after spring cloud azure 4.17.0 released (#39448)
Netyyyy Mar 28, 2024
bf1dad9
Stabilizing the WindowedSubscriber test that uses VTS time advancing,…
anuchandy Mar 29, 2024
e34af58
KafakV2SourceConnector (#39410)
xinlian12 Mar 29, 2024
41d49eb
merge from main and resolve conflicts'
Mar 29, 2024
4b486fd
refactor
Mar 29, 2024
8f819f9
resolve comments
Mar 29, 2024
@@ -319,6 +319,7 @@ the main ServiceBusClientBuilder. -->
<suppress checks="com.azure.tools.checkstyle.checks.GoodLoggingCheck"
files="[/\\]azure-cosmos-kafka-connect[/\\]"/>
<suppress checks="com.azure.tools.checkstyle.checks.ExternalDependencyExposedCheck" files="com.azure.cosmos.kafka.connect.CosmosDBSourceConnector"/>
<suppress checks="com.azure.tools.checkstyle.checks.ExternalDependencyExposedCheck" files="com.azure.cosmos.kafka.connect.CosmosDBSinkConnector"/>

<!-- Checkstyle suppressions for resource manager package -->
<suppress checks="com.azure.tools.checkstyle.checks.ServiceClientCheck" files="com.azure.resourcemanager.*"/>
1 change: 1 addition & 0 deletions eng/versioning/external_dependencies.txt
@@ -394,6 +394,7 @@ cosmos_org.scalastyle:scalastyle-maven-plugin;1.0.0
## Cosmos Kafka connector under sdk\cosmos\azure-cosmos-kafka-connect\pom.xml
# Cosmos Kafka connector runtime dependencies
cosmos_org.apache.kafka:connect-api;3.6.0
cosmos_com.jayway.jsonpath:json-path;2.9.0
# Cosmos Kafka connector tests only
cosmos_org.apache.kafka:connect-runtime;3.6.0
cosmos_org.testcontainers:testcontainers;1.19.5
2 changes: 1 addition & 1 deletion sdk/cosmos/azure-cosmos-kafka-connect/CHANGELOG.md
@@ -3,7 +3,7 @@
### 1.0.0-beta.1 (Unreleased)

#### Features Added
* Added Source connector. See [PR 39410](https://github.com/Azure/azure-sdk-for-java/pull/39410)
* Added Source connector. See [PR 39410](https://github.com/Azure/azure-sdk-for-java/pull/39410)

#### Breaking Changes

@@ -23,3 +23,16 @@
| `kafka.connect.cosmos.source.metadata.storage.topic` | `_cosmos.metadata.topic` | The name of the topic where the metadata are stored. The metadata topic will be created if it does not already exist, else it will use the pre-created topic. |
| `kafka.connect.cosmos.source.messageKey.enabled` | `true` | Whether to set the kafka record message key. |
| `kafka.connect.cosmos.source.messageKey.field` | `id` | The field to use as the message key. |

## Sink Connector Configuration
| Config Property Name | Default | Description |
|:---------------------------------------------------------------|:--------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `kafka.connect.cosmos.sink.database.name` | None | Cosmos DB database name. |
| `kafka.connect.cosmos.sink.containers.topicMap` | None | A comma delimited list of Kafka topics mapped to Cosmos containers. For example: topic1#con1,topic2#con2. |
| `kafka.connect.cosmos.sink.errors.tolerance` | `None` | Error tolerance level after exhausting all retries. `None` fails on error; `All` logs the error and continues. |
| `kafka.connect.cosmos.sink.bulk.enabled` | `true` | Flag to indicate whether Cosmos DB bulk mode is enabled for Sink connector. By default it is true. |
| `kafka.connect.cosmos.sink.bulk.maxConcurrentCosmosPartitions` | `-1` | Cosmos DB Item Write Max Concurrent Cosmos Partitions. If not specified, it is determined based on the number of the container's physical partitions, which would indicate every batch is expected to have data from all Cosmos physical partitions. If specified, it indicates from at most how many Cosmos physical partitions each batch contains data. This config can be used to make bulk processing more efficient when input data in each batch has been repartitioned to balance how many Cosmos partitions each batch needs to write to. This is mainly useful for very large containers (with hundreds of physical partitions). |
| `kafka.connect.cosmos.sink.bulk.initialBatchSize` | `1` | Cosmos DB initial bulk micro batch size - a micro batch will be flushed to the backend when the number of documents enqueued exceeds this size, or when the target payload size is met. The micro batch size is automatically tuned based on the throttling rate. By default the initial micro batch size is 1. Reduce this to avoid the first few requests consuming too many RUs. |
| `kafka.connect.cosmos.sink.write.strategy` | `ItemOverwrite` | Cosmos DB Item write Strategy: `ItemOverwrite` (using upsert), `ItemAppend` (using create, ignore pre-existing items i.e., Conflicts), `ItemDelete` (deletes based on id/pk of data frame), `ItemDeleteIfNotModified` (deletes based on id/pk of data frame if etag hasn't changed since collecting id/pk), `ItemOverwriteIfNotModified` (using create if etag is empty, update/replace with etag pre-condition otherwise, if document was updated the pre-condition failure is ignored) |
| `kafka.connect.cosmos.sink.maxRetryCount` | `10` | Cosmos DB max retry attempts on write failures for Sink connector. By default, the connector will retry on transient write errors for up to 10 times. |
| `kafka.connect.cosmos.sink.id.strategy` | `ProvidedInValueStrategy` | A strategy used to populate the document with an ``id``. Valid strategies are: ``TemplateStrategy``, ``FullKeyStrategy``, ``KafkaMetadataStrategy``, ``ProvidedInKeyStrategy``, ``ProvidedInValueStrategy``. Configuration properties prefixed with ``id.strategy`` are passed through to the strategy. For example, when using ``id.strategy=TemplateStrategy``, the property ``id.strategy.template`` is passed through to the template strategy and used to specify the template string to be used in constructing the ``id``. |
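As a sketch, a sink connector registration using a few of these properties might look like the following. The connector name, topics, database, and container names are hypothetical, and the account endpoint/credential properties (documented elsewhere in this config reference) are omitted here:

```json
{
  "name": "cosmosdb-sink",
  "config": {
    "connector.class": "com.azure.cosmos.kafka.connect.CosmosDBSinkConnector",
    "topics": "topic1,topic2",
    "kafka.connect.cosmos.sink.database.name": "SampleDatabase",
    "kafka.connect.cosmos.sink.containers.topicMap": "topic1#con1,topic2#con2",
    "kafka.connect.cosmos.sink.bulk.enabled": "true",
    "kafka.connect.cosmos.sink.write.strategy": "ItemOverwrite"
  }
}
```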
25 changes: 24 additions & 1 deletion sdk/cosmos/azure-cosmos-kafka-connect/pom.xml
@@ -47,8 +47,14 @@ Licensed under the MIT License.
--add-opens com.azure.cosmos.kafka.connect/com.azure.cosmos.kafka.connect=ALL-UNNAMED
--add-opens com.azure.cosmos.kafka.connect/com.azure.cosmos.kafka.connect.implementation=ALL-UNNAMED
--add-opens com.azure.cosmos.kafka.connect/com.azure.cosmos.kafka.connect.implementation.source=com.fasterxml.jackson.databind,ALL-UNNAMED
--add-opens com.azure.cosmos.kafka.connect/com.azure.cosmos.kafka.connect.implementation.sink=ALL-UNNAMED
--add-opens com.azure.cosmos.kafka.connect/com.azure.cosmos.kafka.connect.implementation.sink.idStrategy=ALL-UNNAMED
--add-opens com.azure.cosmos/com.azure.cosmos.implementation=ALL-UNNAMED
--add-opens com.azure.cosmos/com.azure.cosmos.implementation.routing=ALL-UNNAMED
--add-opens com.azure.cosmos/com.azure.cosmos.implementation.caches=ALL-UNNAMED
--add-opens com.azure.cosmos/com.azure.cosmos.implementation.faultinjection=ALL-UNNAMED
--add-opens com.azure.cosmos/com.azure.cosmos.implementation.apachecommons.lang=ALL-UNNAMED

--add-exports com.azure.cosmos/com.azure.cosmos.implementation.changefeed.common=com.azure.cosmos.kafka.connect
--add-exports com.azure.cosmos/com.azure.cosmos.implementation.feedranges=com.azure.cosmos.kafka.connect
--add-exports com.azure.cosmos/com.azure.cosmos.implementation.query=com.azure.cosmos.kafka.connect
@@ -83,6 +89,13 @@ Licensed under the MIT License.
<scope>provided</scope>
</dependency>

<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos-test</artifactId>
<version>1.0.0-beta.7</version> <!-- {x-version-update;com.azure:azure-cosmos-test;current} -->
<scope>test</scope>
</dependency>

<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-collections4</artifactId>
@@ -96,6 +109,11 @@ Licensed under the MIT License.
<scope>test</scope>
<version>1.10.0</version> <!-- {x-version-update;org.apache.commons:commons-text;external_dependency} -->
</dependency>
<dependency>
<groupId>com.jayway.jsonpath</groupId>
<artifactId>json-path</artifactId>
<version>2.9.0</version> <!-- {x-version-update;cosmos_com.jayway.jsonpath:json-path;external_dependency} -->
</dependency>

<dependency>
<groupId>org.apache.kafka</groupId>
@@ -238,6 +256,7 @@ Licensed under the MIT License.
<include>com.azure:*</include>
<include>org.apache.kafka:connect-api:[3.6.0]</include> <!-- {x-include-update;cosmos_org.apache.kafka:connect-api;external_dependency} -->
<include>io.confluent:kafka-connect-maven-plugin:[0.12.0]</include> <!-- {x-include-update;cosmos_io.confluent:kafka-connect-maven-plugin;external_dependency} -->
<include>com.jayway.jsonpath:json-path:[2.9.0]</include> <!-- {x-include-update;cosmos_com.jayway.jsonpath:json-path;external_dependency} -->
<include>org.sourcelab:kafka-connect-client:[4.0.4]</include> <!-- {x-include-update;cosmos_org.sourcelab:kafka-connect-client;external_dependency} -->
</includes>
</bannedDependencies>
@@ -322,6 +341,10 @@ Licensed under the MIT License.
<pattern>reactor</pattern>
<shadedPattern>${shadingPrefix}.reactor</shadedPattern>
</relocation>
<relocation>
<pattern>com.jayway.jsonpath</pattern>
<shadedPattern>${shadingPrefix}.com.jayway.jsonpath</shadedPattern>
</relocation>
</relocations>
<artifactSet>
<excludes>
@@ -459,7 +482,7 @@ Licensed under the MIT License.
</profile>
<profile>
<!-- integration tests, requires Cosmos DB Emulator Endpoint -->
<id>kafka-integration</id>
<id>kafka</id>
<properties>
<test.groups>kafka</test.groups>
</properties>
@@ -0,0 +1,63 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

package com.azure.cosmos.kafka.connect;

import com.azure.cosmos.kafka.connect.implementation.KafkaCosmosConstants;
import com.azure.cosmos.kafka.connect.implementation.sink.CosmosSinkConfig;
import com.azure.cosmos.kafka.connect.implementation.sink.CosmosSinkTask;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.sink.SinkConnector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
* A Sink connector that publishes topic messages to CosmosDB.
*/
public class CosmosDBSinkConnector extends SinkConnector {
private static final Logger LOGGER = LoggerFactory.getLogger(CosmosDBSinkConnector.class);

private CosmosSinkConfig sinkConfig;

@Override
public void start(Map<String, String> props) {
LOGGER.info("Starting the kafka cosmos sink connector");
this.sinkConfig = new CosmosSinkConfig(props);
}

@Override
public Class<? extends Task> taskClass() {
return CosmosSinkTask.class;
}

@Override
public List<Map<String, String>> taskConfigs(int maxTasks) {
LOGGER.info("Setting task configurations with maxTasks {}", maxTasks);
List<Map<String, String>> configs = new ArrayList<>();
for (int i = 0; i < maxTasks; i++) {
configs.add(this.sinkConfig.originalsStrings());
}

return configs;
}

@Override
public void stop() {
LOGGER.debug("Kafka Cosmos sink connector is stopped.");
}

@Override
public ConfigDef config() {
return CosmosSinkConfig.getConfigDef();
}

@Override
public String version() {
return KafkaCosmosConstants.CURRENT_VERSION;
}
}
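The `taskConfigs` method above fans the connector-level configuration out to every task: each of the `maxTasks` tasks receives an identical copy of the properties. A minimal standalone sketch of that pattern (class and method names here are illustrative, not part of the connector's API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the fan-out pattern used by SinkConnector.taskConfigs(int maxTasks):
// every task gets its own copy of the same connector-level properties.
public class TaskConfigFanOut {
    public static List<Map<String, String>> taskConfigs(Map<String, String> connectorProps, int maxTasks) {
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++) {
            // Copy so one task mutating its map cannot affect the others.
            configs.add(new HashMap<>(connectorProps));
        }
        return configs;
    }
}
```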
@@ -10,8 +10,8 @@
import com.azure.cosmos.implementation.Utils;
import com.azure.cosmos.implementation.apachecommons.lang.tuple.Pair;
import com.azure.cosmos.kafka.connect.implementation.CosmosClientStore;
import com.azure.cosmos.kafka.connect.implementation.CosmosConstants;
import com.azure.cosmos.kafka.connect.implementation.CosmosExceptionsHelper;
import com.azure.cosmos.kafka.connect.implementation.KafkaCosmosConstants;
import com.azure.cosmos.kafka.connect.implementation.KafkaCosmosExceptionsHelper;
import com.azure.cosmos.kafka.connect.implementation.source.CosmosSourceConfig;
import com.azure.cosmos.kafka.connect.implementation.source.CosmosSourceOffsetStorageReader;
import com.azure.cosmos.kafka.connect.implementation.source.CosmosSourceTask;
@@ -103,8 +103,8 @@ public ConfigDef config() {

@Override
public String version() {
return CosmosConstants.CURRENT_VERSION;
} // TODO[public preview]: how this is being used
return KafkaCosmosConstants.CURRENT_VERSION;
}

private List<Map<String, String>> getTaskConfigs(int maxTasks) {
Pair<MetadataTaskUnit, List<FeedRangeTaskUnit>> taskUnits = this.getAllTaskUnits();
@@ -322,22 +322,14 @@ private List<FeedRange> getFeedRanges(CosmosContainerProperties containerPropert
.getContainer(containerProperties.getId())
.getFeedRanges()
.onErrorMap(throwable ->
CosmosExceptionsHelper.convertToConnectException(
KafkaCosmosExceptionsHelper.convertToConnectException(
throwable,
"GetFeedRanges failed for container " + containerProperties.getId()))
.block();
}

private Map<String, String> getContainersTopicMap(List<CosmosContainerProperties> allContainers) {
Map<String, String> topicMapFromConfig =
this.config.getContainersConfig().getContainersTopicMap()
.stream()
.map(containerTopicMapString -> containerTopicMapString.split("#"))
.collect(
Collectors.toMap(
containerTopicMapArray -> containerTopicMapArray[1],
containerTopicMapArray -> containerTopicMapArray[0]));

Map<String, String> topicMapFromConfig = this.config.getContainersConfig().getContainerToTopicMap();
Map<String, String> effectiveContainersTopicMap = new HashMap<>();
allContainers.forEach(containerProperties -> {
// by default, we are using container id as the topic name as well unless customer override through containers.topicMap
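The diff above replaces inline parsing of the `containers.topicMap` setting (a comma-delimited list of `topic#container` pairs) with a helper on the config object, while keeping the fallback that a container without an explicit mapping uses its own id as the topic name. A simplified sketch of that resolution logic (not the connector's exact code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: build the effective container -> topic map from "topic#container"
// pairs, defaulting the topic name to the container id when unmapped.
public class ContainersTopicMap {
    public static Map<String, String> resolve(List<String> topicMapPairs, List<String> allContainers) {
        Map<String, String> fromConfig = new HashMap<>();
        for (String pair : topicMapPairs) {
            String[] parts = pair.split("#"); // parts[0] = topic, parts[1] = container
            fromConfig.put(parts[1], parts[0]);
        }
        Map<String, String> effective = new HashMap<>();
        for (String container : allContainers) {
            effective.put(container, fromConfig.getOrDefault(container, container));
        }
        return effective;
    }
}
```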
@@ -36,9 +36,9 @@ public static CosmosAsyncClient getCosmosClient(CosmosAccountConfig accountConfi

private static String getUserAgentSuffix(CosmosAccountConfig accountConfig) {
if (StringUtils.isNotEmpty(accountConfig.getApplicationName())) {
return CosmosConstants.USER_AGENT_SUFFIX + "|" + accountConfig.getApplicationName();
return KafkaCosmosConstants.USER_AGENT_SUFFIX + "|" + accountConfig.getApplicationName();
}

return CosmosConstants.USER_AGENT_SUFFIX;
return KafkaCosmosConstants.USER_AGENT_SUFFIX;
}
}

This file was deleted.
