
PROD-39429 Implement migrate sys func from new channel(Format V2) to old channel (V1) - Push to Main #751

Merged
11 commits merged into master on Nov 22, 2023

Conversation

sfc-gh-japatel (Collaborator) commented on Nov 16, 2023

This is PR #750, but pushed to main.

Copying it as is.

This is the long-term fix for potential data duplication introduced by the new channel name format.

  • It moves customers who may have onboarded to the new channel name format, introduced in version 2.1.0 (now de-listed), back to the old format.
  • We will stick to the old format for new customers running the connector for the first time.
  • Renamed channelName to channelNameFormatV1 and introduced channelNameFormatV2 (see the sketch after this list).
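
As a rough sketch of the intended flow (all class, method, and format details below are hypothetical stand-ins, not the connector's actual code):

// Sketch only: names and formats are assumptions made for illustration.
final class ChannelMigrationSketch {

  void openChannelWithOptionalMigration(
      boolean enableChannelOffsetTokenMigration,
      String tableName,
      String channelNameFormatV1,
      String channelNameFormatV2) {
    if (enableChannelOffsetTokenMigration) {
      // Ask the server to move the committed offset token from the V2-named channel
      // (introduced in 2.1.0, now de-listed) to the V1-named channel, so reopening the
      // channel under the old name does not re-ingest already committed offsets.
      migrateChannelOffsetToken(tableName, channelNameFormatV2, channelNameFormatV1);
    }
    // Both migrated and brand-new customers always open the channel under the V1 name.
    openChannel(tableName, channelNameFormatV1);
  }

  // Placeholders so the sketch stands on its own.
  void migrateChannelOffsetToken(String table, String sourceChannel, String destinationChannel) {}

  void openChannel(String table, String channelName) {}
}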

Added tests for TopicPartitionChannel.

Notes:

End-to-end tests:

  1. Use version 2.1.0.

  2. Use two connectors.

  3. The connectors use channel name format V2. (Screenshot: 2023-11-20, 4:54 PM)

  4. Stop the connectors.

  5. Replace the jar with 2.1.1 and restart.

  6. The migration runs, and only the old channel name format is visible. (Screenshot: 2023-11-20, 5:17 PM)

  7. Stop again and restart on 2.1.1. Nothing happens, and we get a valid response saying the channelNameFormatV2 channel doesn't exist. (Screenshot: 2023-11-20, 5:18 PM)

@sfc-gh-japatel marked this pull request as ready for review on November 16, 2023 at 23:39
@sfc-gh-japatel changed the title from "PROD-39429 migrate sys func push main" to "PROD-39429 Implement migrate sys func from new channel(Format V2) to old channel (V1) - Push to Main" on Nov 16, 2023
}
if (migrateOffsetTokenResultFromSysFunc == null) {
LOGGER.warn(
"No result found in Migrating OffsetToken through System Function for tableName:{},"
Contributor:
What does "no result found" mean? No destination channel or source channel found? Or both?

Contributor:
If this is not expected, should we throw an exception?

Collaborator Author:
This is not expected, and I thought we decided not to throw any exceptions: swallow everything and continue using the old channel.

I can see the concern, though. At this point it comes down to whether we want to halt ingestion or ignore the exception and keep moving forward with the old channel. Halting ingestion could be better: it tells us something is wrong, rather than continuing with the old channel with ramifications we don't understand.
WDYT, Toby?

Collaborator:
I agree that it's better to swallow this exception here, but we should have something to track the error: maybe an alert on the server side if we don't return an offset on the function call, or a new telemetry event similar to reportKafkaFatalError (if we aren't expecting too many hits on this error).

Collaborator Author:
Good point on both.
I think we should throw an exception if this is not expected. I can add a server-side incident and also report telemetry.

Collaborator Author:
For an unexpected exception, I am now throwing a runtime exception. PTAL! It's better to fail here, IMO. Thanks, folks!
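
For reference, a minimal sketch of failing fast on a missing result (variable names follow the quoted snippet; the exact exception type and message are assumptions, not necessarily what the PR ends up using):

// Sketch only: the real implementation may wrap this differently.
final class MigrationResultCheckSketch {

  static void validateMigrationResult(
      String migrateOffsetTokenResultFromSysFunc,
      String tableName,
      String sourceChannelName,
      String destinationChannelName) {
    if (migrateOffsetTokenResultFromSysFunc == null) {
      // Fail fast so a silent migration failure does not go unnoticed.
      throw new RuntimeException(
          String.format(
              "No result found while migrating offset token through the system function for"
                  + " table %s, sourceChannel %s, destinationChannel %s",
              tableName, sourceChannelName, destinationChannelName));
    }
  }
}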

"Migrating OffsetToken for a SourceChannel:{} in table:{} failed due to:{}",
sourceChannelName,
fullyQualifiedTableName,
e.getMessage());
Contributor:
getMessage might be null for some exceptions; could we do better?

Collaborator Author:
Might be worth logging the stack trace too; let me add that.

Collaborator Author:
Done!
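
A minimal, self-contained sketch of the stack-trace logging being discussed, assuming SLF4J; passing the Throwable as the final argument (with no matching placeholder) makes the logger print the full stack trace, which avoids relying on getMessage alone:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class MigrationLoggingSketch {
  private static final Logger LOGGER = LoggerFactory.getLogger(MigrationLoggingSketch.class);

  // Hypothetical wrapper; only here to give the try/catch something to guard.
  static void migrateWithLogging(
      Runnable migration, String sourceChannelName, String fullyQualifiedTableName) {
    try {
      migration.run();
    } catch (Exception e) {
      // The Throwable as the last argument makes SLF4J append the stack trace and avoids
      // relying on e.getMessage(), which can be null for some exceptions.
      LOGGER.warn(
          "Migrating OffsetToken for a SourceChannel:{} in table:{} failed",
          sourceChannelName,
          fullyQualifiedTableName,
          e);
    }
  }
}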

@@ -88,17 +88,17 @@ public void testSinkTaskInvalidRecord_InMemoryDLQ() throws Exception {
new TopicPartitionChannel(
mockStreamingClient,
topicPartition,
SnowflakeSinkServiceV2.partitionChannelKey(TEST_CONNECTOR_NAME, topicName, partition),
Contributor:
Do we have any tests that can actually test the end-to-end migration?

Collaborator Author:
All end-to-end tests will eventually go through this path, right? Since it defaults to true.

Collaborator:
It would be hard to test this scenario in e2e since we wouldn't be able to upgrade the version. Could we add an IT (maybe in SnowflakeSinkServiceV2IT) that explicitly opens and ingests to a channel with the V2 name, shuts it down, then creates a new sink and runs through the channel migration?

Not 100% sure this is possible, but maybe we can stick in a mock stub for the client to hard-code the channel name during open?

Collaborator Author:
Good idea to ingest through TopicPartitionChannel with V2 and then let it go through the migration. I will add IT tests.

Collaborator Author:
I think I see the issue: the IT tests are showing up in the other PR: https://github.com/snowflakedb/snowflake-kafka-connector/pull/750/files
I might have messed something up with the merge conflicts.

Collaborator Author:
Ported the IT tests here and added a few more tests in the connection IT tests.

sfc-gh-rcheng (Collaborator) left a comment:
Overall LGTM, but I would like an IT or e2e test that explicitly opens and ingests to a V2 channel and confirms that it is migrated later on.

public static final boolean ENABLE_CHANNEL_OFFSET_TOKEN_MIGRATION_DEFAULT = true;
public static final String ENABLE_CHANNEL_OFFSET_TOKEN_MIGRATION_DOC =
"This config is used to enable/disable streaming channel offset migration logic. If true, we"
+ " will migrate offset token from channel name format V2 to name format v1.";
Collaborator:
Nit: let's add something about how V2 is deprecated. Otherwise customers may disable this because V2 sounds fancier than V1.

Collaborator Author:
Good point, let me add that!
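
For illustration, the updated doc string might read roughly like this (the exact wording is an assumption, not the final text in the PR):

public static final String ENABLE_CHANNEL_OFFSET_TOKEN_MIGRATION_DOC =
    "This config is used to enable/disable streaming channel offset migration logic. If true, we"
        + " will migrate offset token from channel name format V2 to name format V1. Note that"
        + " the V2 channel name format is deprecated (it only shipped in a de-listed release),"
        + " so disabling this migration is not recommended.";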

@@ -278,6 +285,14 @@ public TopicPartitionChannel(

this.enableSchemaEvolution = this.enableSchematization && hasSchemaEvolutionPermission;

this.channelNameFormatV2 =
Collaborator:
Let's move this channelNameFormatV2 into the if statement? I don't think V2 is needed after the channel is migrated.

Collaborator Author:
It was private final, so I had to put it outside in the constructor. Let me think about whether this is needed as an instance variable (I had plans to modify it in tests but haven't added any IT tests).

Collaborator Author:
Done!
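
A small sketch of the scoping change being suggested, with hypothetical surrounding names and an assumed V2 derivation; the point is simply that the V2 name only matters while the migration runs, so it can be a local rather than a private final field:

// Sketch only: constructor shape, field names, and the V2 derivation are assumptions.
final class ChannelOpenSketch {
  private final String channelNameFormatV1;

  ChannelOpenSketch(
      String channelNameFormatV1,
      String connectorName,
      String tableName,
      boolean enableChannelOffsetTokenMigration) {
    this.channelNameFormatV1 = channelNameFormatV1;
    if (enableChannelOffsetTokenMigration) {
      // Computed as a local: once migration has run, the V2 name is never read again.
      String channelNameFormatV2 = connectorName + "_" + channelNameFormatV1; // assumed format
      migrateChannelOffsetToken(tableName, channelNameFormatV2, this.channelNameFormatV1);
    }
  }

  // Placeholder for the system-function call made elsewhere in the PR.
  private void migrateChannelOffsetToken(
      String tableName, String sourceChannelName, String destinationChannelName) {}
}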


sfc-gh-japatel (Collaborator Author):

> Overall LGTM, but I would like an IT or e2e test that explicitly opens and ingests to a V2 channel and confirms that it is migrated later on.

Sorry, the merge conflicts didn't show the IT tests.
I ported the IT tests from the other PR and added a few more connection service IT tests.
An e2e test involving two jars is going to be a bit difficult with our current infra, but let me think more on that.

sfc-gh-tzhang (Contributor) left a comment:
Left some comments, PTAL, otherwise LGTM!

Comment on lines +1052 to +1058
LOGGER.info(
"Migrate OffsetToken response for table:{}, sourceChannel:{}, destinationChannel:{}"
+ " is:{}",
tableName,
sourceChannelName,
destinationChannelName,
channelMigrateOffsetTokenResponseDTO);
Contributor:
This log might be confusing to customers. Could we only log when a migration is actually done? Or do we even need this client-side log, given that we have a bunch of server-side logging in place?

Collaborator Author:
This should be fine: it is logged only once per channel during open partition, not very often. I would like to keep it, since it helps us debug any customer issue.

sfc-gh-rcheng (Collaborator) left a comment:
Thanks for the change! The ITs are pretty cool, but one small request: try ingestion after the migration as well.


String migrateOffsetTokenResultFromSysFunc = null;
if (resultSet.next()) {
migrateOffsetTokenResultFromSysFunc = resultSet.getString(1 /*Only one column*/);
Collaborator:
Out of scope for this PR: do we have any guarantees that this DTO won't change on the server side? A comment, a test, or something server-side to warn against changes, since we now throw an exception on failure.

Collaborator Author:
Yeah, I think the JSON exception will help with that. I think I have a comment on the server side saying to be cautious about changing the interface, but let me try to mimic a changed response and see what happens. I could probably handle new fields in the object mapper.

Collaborator Author:
Moved this logic into its own method so that we can test it. If the response changes on the server side, which it shouldn't, these test methods will start failing.
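
A minimal sketch of parsing the system function's JSON result with Jackson, assuming a DTO shaped roughly like the response (the field names below are assumptions); configuring the mapper to ignore unknown properties is one way to tolerate new server-side fields:

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

final class MigrationResponseParsingSketch {

  // Assumed shape; the real ChannelMigrateOffsetTokenResponseDTO may differ.
  static final class ChannelMigrateOffsetTokenResponse {
    public long responseCode;
    public String responseMessage;
  }

  private static final ObjectMapper MAPPER =
      new ObjectMapper()
          // Tolerate fields the server adds later instead of failing deserialization.
          .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

  static ChannelMigrateOffsetTokenResponse parse(String migrateOffsetTokenResultFromSysFunc)
      throws JsonProcessingException {
    // A malformed or unexpected payload surfaces as a Jackson exception, which the caller
    // can rethrow as a runtime error, in line with the fail-fast discussion above.
    return MAPPER.readValue(
        migrateOffsetTokenResultFromSysFunc, ChannelMigrateOffsetTokenResponse.class);
  }
}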

@sfc-gh-japatel force-pushed the japatel-PROD-39429-migrate-sys-func-push-main branch from 0b2b4a6 to b9b1754 on November 21, 2023 at 22:51
@sfc-gh-japatel force-pushed the japatel-PROD-39429-migrate-sys-func-push-main branch from b9b1754 to e54b031 on November 21, 2023 at 23:22
@sfc-gh-japatel merged commit 6edd211 into master on Nov 22, 2023 (59 of 60 checks passed)
@sfc-gh-japatel deleted the japatel-PROD-39429-migrate-sys-func-push-main branch on November 22, 2023 at 05:03
khsoneji pushed a commit to confluentinc/snowflake-kafka-connector that referenced this pull request Dec 4, 2023
khsoneji pushed a commit to confluentinc/snowflake-kafka-connector that referenced this pull request Dec 4, 2023
khsoneji pushed a commit to confluentinc/snowflake-kafka-connector that referenced this pull request Dec 4, 2023
EduardHantig pushed a commit to streamkap-com/snowflake-kafka-connector that referenced this pull request Feb 1, 2024
sudeshwasnik pushed a commit to confluentinc/snowflake-kafka-connector that referenced this pull request Feb 16, 2024