-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-1859651 Change stage name prefix for tables matched with regex #1030
Open
sfc-gh-mbobowski
wants to merge
1
commit into
master
Choose a base branch
from
mbobowski-SNOW-1859651
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity - do you think this expression could be extended somehow? the initial expression i've came up in the ticket was kinda quick thinking, I'm wonder if there could be other/better approach to tackle that?
One idea - in case we'd figue out we need to extend this - do you think it would make sense to embed this regexp in configuration, provide reasonable default and have possiblity to change it at runtime for some extreme corner cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively I was thinking about simply detecting any of the special characters like
[
or]
. But I am not sure if it's better.If we want to be more defensive with this change I would create a feature flag instead of exposing the regex. The worst case for messing sth up is that cleaner will stop working - it's not good for sure but also not terrible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo having a Regex for finding if a String is a multi-matching Regex is not what we want to do. (not only because it is hard and error prone). I think that we should calculate resulting table for the whole set of topics on each rebalance to create a literal topic to table map for the actual set of topics — only then we know if a topic defined in
topic2table.map
should be a literal topic or a topic-regex, e.g. "topic.name" could be both, literal matching "topic.name" or a regex for "topic[a-z|A-Z|0-9|._-]name".Having a concrete mapping for the actual topics allows us to give the definite answer whether we have a
MANY_TOPICS_SINGLE_TABLE
case.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like this idea a lot to undwind all the topics up front to their final form rather than rely on regexes. There are two (at least) corner cases I'm worried about:
Potentially, we could consider one more option - Michal, correct me if i'm wrong on this one - the connector name is salted with timestamp on every deployment, right? if that's the case - i think we could simply enable the salting logic for file prefixes regardless of the topic2table configuration - the fact that connector's name is changing will also impact the stage prefix, so trying to keep the naming scheme backwards compatible is already invalidated - that's something i've noticed when analysing this cleaner issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a problem since we first unwind the topics, then check if we have duplicates.
This is a real problem if we cannot assume that each connector name is unique. Alternatively we could generate a UUID (or something) for each connector start that would not change on rebalances. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you checked the code and that's the logic - than ok. i thought it was running the init in a sequential loop per task.
the second part - this could be affecting the assignor policy and honestly - i doubt we should be doing that.
the more i think about it - the more i'm leaning toward enabling that topic salting regardless of the topic2table detected mode....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if the table name was resolved using topic2table map we should salt the file prefix with the topic — otherwise there could be a clash anyway (see example below). This should not break anything but some files could stay as leftovers on the Stage after an upgrade and potentially that data could be duplicated. No data loss though.
Example:
topics=topicA,topicB
ortopics.regex=^topic[A-Z]$
and
topic2table.map=topicB:topicA
In the above situation, despite having only a single, non-regex entry in
topic2table.map
we would have a clash on the cleaners. Reversing a regex to check where the clash could happen is not feasible.