[Feature] Kafka CDC Ingestion supports the bson string format of collecting mongodb to kafka by debezium #4615

Open
2 tasks done
lizc9 opened this issue Dec 2, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

lizc9 commented Dec 2, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

The Kafka CDC action should support the following format of Kafka data collected from MongoDB via Debezium, whether or not the message includes a schema:

{
  "before": null,
  "after": "{\"_id\":{\"$oid\":\"64001c996f4de7ff3189d374\"},\"last_updated_at\":{\"$numberLong\":\"1732232838425\"},\"tags\":[\"completely\",\"pass\"],\"updated_by\":\"xxx\"}",
  "updateDescription": null,
  "source": {
    "version": "2.7.0.Final",
    "connector": "mongodb",
    "name": "datapipeline",
    "ts_ms": 1732644484000,
    "snapshot": "false",
    "db": "datapipeline",
    "sequence": null,
    "ts_us": 1732644484000000,
    "ts_ns": 1732644484000000000,
    "collection": "clips",
    "ord": 22,
    "lsid": null,
    "txnNumber": null,
    "wallTime": null
  },
  "op": "c",
  "ts_ms": 1732644484231,
  "transaction": null
}
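
A minimal sketch of handling both message shapes, assuming the schema-enabled variant uses the usual JSON converter layout with top-level "schema" and "payload" keys. The function name `unwrap_envelope` is hypothetical and is not part of Paimon or Debezium.

```python
import json


def unwrap_envelope(raw: str) -> dict:
    """Return the Debezium change event, with or without the schema wrapper."""
    message = json.loads(raw)
    # Assumption: with schemas enabled, the event is nested under "payload".
    if isinstance(message, dict) and "schema" in message and "payload" in message:
        return message["payload"]
    return message


# A shortened event in the shape shown above: "after" holds a BSON string.
event = {
    "before": None,
    "after": json.dumps({"_id": {"$oid": "64001c996f4de7ff3189d374"}}),
    "op": "c",
}
bare = json.dumps(event)
wrapped = json.dumps({"schema": {"type": "struct"}, "payload": event})

# Both shapes yield the same change event.
same_event = unwrap_envelope(bare) == unwrap_envelope(wrapped)
```

After unwrapping, the before/after fields are still strings and need a second parse.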

Solution

Add a debezium-bson format for the Kafka CDC action:

  1. Parse the BSON string in the before/after fields of the Kafka message.
  2. Extract Java objects and basic data types from the BSON values.
  3. Convert Java basic data types to strings, and Java objects to JSON strings.
  4. Support schema evolution: all fields are fixed to the STRING type, and _id is fixed as the primary key.
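
Steps 1 to 3 can be sketched roughly as follows. This is an illustration in Python using only the standard library, not the proposed Java implementation. The helper names (`unwrap`, `to_string`, `bson_string_to_row`) and the set of Extended JSON wrappers handled are assumptions.

```python
import json

# Extended JSON wrappers whose single value is unwrapped to a plain string.
# Assumption: this set is illustrative and not exhaustive.
SCALAR_WRAPPERS = {"$oid", "$numberLong", "$numberInt", "$numberDouble", "$numberDecimal"}


def unwrap(value):
    """Recursively replace single-key wrappers such as {"$oid": "..."}."""
    if isinstance(value, dict):
        if len(value) == 1:
            key, inner = next(iter(value.items()))
            if key in SCALAR_WRAPPERS:
                return str(inner)
        return {k: unwrap(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap(v) for v in value]
    return value


def to_string(value):
    """Basic types become strings; objects and arrays become compact JSON."""
    if value is None:
        return None
    if isinstance(value, str):
        return value
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (dict, list)):
        return json.dumps(value, separators=(",", ":"))
    return str(value)


def bson_string_to_row(bson_string: str) -> dict:
    """Parse the before/after string into a flat column -> string mapping."""
    document = json.loads(bson_string)
    return {name: to_string(unwrap(value)) for name, value in document.items()}


after = (
    r'{"_id":{"$oid":"64001c996f4de7ff3189d374"},'
    r'"last_updated_at":{"$numberLong":"1732232838425"},'
    r'"tags":["completely","pass"],"updated_by":"xxx"}'
)
row = bson_string_to_row(after)
```

Here `row` maps each top-level field of the sample document to a string, with the `tags` array kept as a JSON string.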

Expected Results:

  • Schema:

    Column          | DataType | Key
    _id             | STRING   | Primary Key
    last_updated_at | STRING   |
    tags            | STRING   |
    updated_by      | STRING   |

  • Records:

    RowKind | _id                      | last_updated_at | tags                  | updated_by
    +I      | 64001c996f4de7ff3189d374 | 1732232838425   | ["completely","pass"] | xxx
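
The schema evolution rule (every column is STRING, _id is the primary key) could look roughly like the sketch below. It assumes that evolution only ever appends newly seen fields. The names `evolve_schema` and `PRIMARY_KEYS` are hypothetical, and the second record is made up for illustration.

```python
# Assumption: the primary key is always the MongoDB document id.
PRIMARY_KEYS = ["_id"]


def evolve_schema(schema: dict, row: dict) -> dict:
    """Return a copy of schema extended with fields first seen in row."""
    evolved = dict(schema)
    for name in row:
        # Every column is typed STRING, so a new field never changes a type.
        evolved.setdefault(name, "STRING")
    return evolved


schema = {}
schema = evolve_schema(
    schema,
    {"_id": "64001c996f4de7ff3189d374", "last_updated_at": "1732232838425"},
)
# A later (made-up) record introduces two fields not seen before.
schema = evolve_schema(
    schema,
    {"_id": "64001c996f4de7ff3189d374", "tags": '["completely","pass"]', "updated_by": "xxx"},
)
```

Because every column is a string, a field that changes its BSON type between documents does not force a column type change in this sketch.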

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
lizc9 added the enhancement (New feature or request) label on Dec 2, 2024
lizc9 changed the title from "[Feature] Kafka CDC action supports the bson string format of collecting mongodb to kafka by debezium" to "[Feature] Kafka CDC Ingestion supports the bson string format of collecting mongodb to kafka by debezium" on Dec 4, 2024

lizc9 commented Dec 10, 2024

@JingsongLi Could you help me take a look?
