SNOW-1694636: Schematization utils refactor (PART 3) - Refactor schema resolver #944

sfc-gh-wtrefon · 2024-10-03T14:20:01Z

Overview

SNOW-1694636

Pre-review checklist

sfc-gh-wtrefon · 2024-10-03T14:21:43Z

...va/com/snowflake/kafka/connector/internal/streaming/schemaevolution/TableSchemaResolver.java

+  private TableSchema getTableSchemaFromRecordSchema(
+      JsonNode recordNode, Set<String> columnNamesSet, SinkRecord record) {
+    Map<String, ColumnInfos> schemaMap = getFullSchemaMapFromRecord(record);
+    Map<String, ColumnInfos> columnsInferredFromSchema =
+        Streams.stream(recordNode.fields())
+            .map(ColumnValuePair::of)
+            .filter(pair -> columnNamesSet.contains(pair.getQuotedColumnName()))
+            .peek(
+                field -> {
+                  if (!schemaMap.containsKey(field.getColumnName())) {
+                    // only when the type of the value is unrecognizable for JAVA
+                    throw SnowflakeErrors.ERROR_5022.getException(
+                        "column: " + field.getColumnName() + " schemaMap: " + schemaMap);
+                  }
+                })
+            .map(
+                field ->
+                    Maps.immutableEntry(
+                        Utils.quoteNameIfNeeded(field.getQuotedColumnName()),
+                        schemaMap.get(field.getColumnName())))
+            .collect(
+                Collectors.toMap(
+                    Map.Entry::getKey, Map.Entry::getValue, (oldValue, newValue) -> newValue));
+    return new TableSchema(columnsInferredFromSchema);
+  }
+
+  private TableSchema getTableSchemaFromJson(JsonNode recordNode, Set<String> columnNamesSet) {
+    Map<String, ColumnInfos> columnsInferredFromJson =
+        Streams.stream(recordNode.fields())
+            .map(ColumnValuePair::of)
+            .filter(pair -> columnNamesSet.contains(pair.getQuotedColumnName()))
+            .map(
+                pair ->
+                    Maps.immutableEntry(
+                        pair.getQuotedColumnName(),
+                        new ColumnInfos(inferDataTypeFromJsonObject(pair.getJsonNode()))))
+            .collect(
+                Collectors.toMap(
+                    Map.Entry::getKey, Map.Entry::getValue, (oldValue, newValue) -> newValue));


Open to suggestions on how to make the refactoring nicer. Java 8 does not help; Kotlin would be perfect here ;(

In theory we could split schemaFromJson and schemaFromRecordSchema into two separate classes, but I wouldn't invest much more here, as this class is likely to be fully changed for the complex Iceberg types.

sfc-gh-akowalczyk · 2024-10-04T07:27:27Z

src/main/java/com/snowflake/kafka/connector/internal/streaming/schemaevolution/ColumnInfos.java

@@ -12,6 +12,11 @@ public ColumnInfos(String columnType, String comments) {
    this.comments = comments;
  }

+  public ColumnInfos(String columnType) {


Why plural form here?

you are the author of this class :D

and this is a constructor, so it has to be the class name

so that's my mistake, sorry about that 😄

sfc-gh-akowalczyk · 2024-10-04T07:52:29Z

...va/com/snowflake/kafka/connector/internal/streaming/schemaevolution/TableSchemaResolver.java

-    if (columnNames == null) {
+  public TableSchema resolveTableSchemaFromRecord(
+      SinkRecord record, List<String> columnsToInclude) {
+    if (columnsToInclude == null || columnsToInclude.isEmpty()) {
      return new TableSchema(new HashMap<>());


Suggested change

-      return new TableSchema(new HashMap<>());
+      return new TableSchema(ImmutableMap.of());

Why do we not use an immutable map here?
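
The difference behind this suggestion can be sketched with the JDK alone. The snippet below is an illustration under assumptions, not the connector's code: `TableSchemaSketch` is a hypothetical, simplified stand-in for `TableSchema`, column types are plain strings, and `Collections.emptyMap()` is used in place of Guava's `ImmutableMap.of()` so the example runs without extra dependencies. Both reject writes with `UnsupportedOperationException`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class EmptySchemaExample {

  // Hypothetical, simplified stand-in for TableSchema: it only holds the map it is given.
  static final class TableSchemaSketch {
    private final Map<String, String> columnInfos;

    TableSchemaSketch(Map<String, String> columnInfos) {
      this.columnInfos = columnInfos;
    }

    Map<String, String> getColumnInfos() {
      return columnInfos;
    }
  }

  // Mutable variant: any caller can later add entries to the "empty" schema.
  static TableSchemaSketch emptyMutable() {
    return new TableSchemaSketch(new HashMap<>());
  }

  // Immutable variant: the empty schema stays empty (JDK stand-in for ImmutableMap.of()).
  static TableSchemaSketch emptyImmutable() {
    return new TableSchemaSketch(Collections.emptyMap());
  }

  // Returns true if the schema's backing map refuses modification.
  static boolean rejectsWrites(TableSchemaSketch schema) {
    try {
      schema.getColumnInfos().put("\"C1\"", "VARCHAR");
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("HashMap rejects writes: " + rejectsWrites(emptyMutable()));
    System.out.println("emptyMap rejects writes: " + rejectsWrites(emptyImmutable()));
  }
}
```

With the mutable variant, a schema that was meant to signal "no columns" can be changed by accident after it is returned; the immutable variant fails fast at the point of the write.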

sfc-gh-akowalczyk · 2024-10-04T07:59:42Z

...va/com/snowflake/kafka/connector/internal/streaming/schemaevolution/TableSchemaResolver.java

+    private final String quotedColumnName;
+    private final JsonNode jsonNode;
+
+    public static ColumnValuePair of(Map.Entry<String, JsonNode> field) {


The name "of" suggests that we construct the object from the given value as-is. Here, however, we apply a conversion (extracting fields), so "from" would be more suitable.
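
A minimal sketch of the naming convention being discussed, under stated assumptions: `ColumnValuePairSketch` is hypothetical, the value is a plain `String` rather than a `JsonNode`, and the quoting rule (always wrap in double quotes) is a placeholder, not the connector's actual quoting logic.

```java
import java.util.AbstractMap;
import java.util.Map;

class ColumnValuePairSketch {
  private final String columnName;
  private final String quotedColumnName;
  private final String value;

  private ColumnValuePairSketch(String columnName, String quotedColumnName, String value) {
    this.columnName = columnName;
    this.quotedColumnName = quotedColumnName;
    this.value = value;
  }

  // "of": the arguments are the object's parts and are stored unchanged.
  static ColumnValuePairSketch of(String columnName, String quotedColumnName, String value) {
    return new ColumnValuePairSketch(columnName, quotedColumnName, value);
  }

  // "from": the argument is a different type that is converted into the object.
  // The quoting below is a placeholder assumption for illustration only.
  static ColumnValuePairSketch from(Map.Entry<String, String> field) {
    String name = field.getKey();
    return new ColumnValuePairSketch(name, "\"" + name + "\"", field.getValue());
  }

  String getColumnName() {
    return columnName;
  }

  String getQuotedColumnName() {
    return quotedColumnName;
  }

  String getValue() {
    return value;
  }

  public static void main(String[] args) {
    ColumnValuePairSketch pair = from(new AbstractMap.SimpleEntry<>("city", "Krakow"));
    System.out.println(pair.getColumnName() + " / " + pair.getQuotedColumnName());
  }
}
```

In a stream pipeline the call site would then read `.map(ColumnValuePairSketch::from)`, which signals that each entry is converted rather than wrapped.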

sfc-gh-akowalczyk

Only a few comments, all related to naming conventions.
The code looks much better with this refactoring applied.

sfc-gh-achyzy

Looks good, although consider replacing the peek operation on the stream.

sfc-gh-achyzy · 2024-10-04T08:57:47Z

...va/com/snowflake/kafka/connector/internal/streaming/schemaevolution/TableSchemaResolver.java

+        Streams.stream(recordNode.fields())
+            .map(ColumnValuePair::of)
+            .filter(pair -> columnNamesSet.contains(pair.getQuotedColumnName()))
+            .peek(


I'm not a fan of peek, as the docs explicitly state: "This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline".
Maybe it would be better to partition/group the elements into separate streams?

partitionBy looks ok
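
One way this could look, assuming the reply refers to `Collectors.partitioningBy` from the JDK. This is a sketch rather than the change that was merged: field names and column types are plain strings, `resolveTypes` is a hypothetical helper, and `IllegalArgumentException` stands in for the connector's `SnowflakeErrors.ERROR_5022` exception.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionInsteadOfPeek {

  // Splits field names by whether the schema knows them, fails if any are unknown,
  // and otherwise maps each known field to its type from the schema.
  static Map<String, String> resolveTypes(List<String> fieldNames, Map<String, String> schemaMap) {
    // partitioningBy always yields both the true and the false key.
    Map<Boolean, List<String>> byKnown =
        fieldNames.stream().collect(Collectors.partitioningBy(schemaMap::containsKey));

    List<String> unknown = byKnown.get(false);
    if (!unknown.isEmpty()) {
      // Stand-in for the connector's error; validation happens outside the pipeline.
      throw new IllegalArgumentException(
          "columns: " + unknown + " schemaMap: " + schemaMap);
    }

    return byKnown.get(true).stream()
        .collect(
            Collectors.toMap(name -> name, schemaMap::get, (oldValue, newValue) -> newValue));
  }

  public static void main(String[] args) {
    Map<String, String> schemaMap = new HashMap<>();
    schemaMap.put("id", "NUMBER");
    schemaMap.put("name", "VARCHAR");
    System.out.println(resolveTypes(Arrays.asList("id", "name"), schemaMap));
  }
}
```

Compared with throwing from inside `peek`, the validation here is an ordinary statement between two pipelines, so it does not rely on a side effect in an intermediate operation, and the error can report every unknown column at once.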

sfc-gh-wtrefon requested a review from a team as a code owner October 3, 2024 14:20

sfc-gh-wtrefon force-pushed the wtrefon/SNOW-1694636-refactor-schema-resolver branch from 869aa4e to 0b3b5a4 Compare October 3, 2024 14:22

Refactor schema resolver

5cd5762

sfc-gh-wtrefon force-pushed the wtrefon/SNOW-1694636-refactor-schema-resolver branch from 0b3b5a4 to 5cd5762 Compare October 3, 2024 14:31

sfc-gh-akowalczyk reviewed Oct 4, 2024

sfc-gh-akowalczyk approved these changes Oct 4, 2024

sfc-gh-achyzy approved these changes Oct 4, 2024

CR changes

948b8b9

sfc-gh-wtrefon force-pushed the wtrefon/SNOW-1694636-refactor-schema-resolver branch from 5e72906 to 948b8b9 Compare October 4, 2024 11:05

sfc-gh-wtrefon enabled auto-merge (squash) October 4, 2024 11:30

sfc-gh-wtrefon merged commit 3bf31ff into master Oct 4, 2024
78 of 80 checks passed

sfc-gh-wtrefon deleted the wtrefon/SNOW-1694636-refactor-schema-resolver branch October 4, 2024 12:43
