dm: add TCP connection IO traffic statistics to sync stage status in OpenAPI response #11742

Merged
merged 51 commits into from
Nov 28, 2024

River2000i
Contributor

@River2000i River2000i commented Nov 13, 2024

What problem does this PR solve?

Issue Number: close #11741 #11746

What is changed and how it works?

  1. Add fields in dm/proto/dmworker.proto and regenerate with make generate-protobuf
  2. Update the OpenAPI definition and regenerate with make dm_generate_openapi
  3. Initialize the IO counter and UUID for tasks created through OpenAPI

Check List

Tests

  • Unit test
  • Manual test (add detailed scripts or steps below)

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. area/dm Issues or PRs related to DM. area/engine Issues or PRs related to Dataflow Engine. contribution This PR is from a community contributor. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Nov 13, 2024
Contributor

ti-chi-bot bot commented Nov 13, 2024

Hi @River2000i. Thanks for your PR.

I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 13, 2024
@River2000i River2000i changed the title dm: add IO traffic statistics to sync stage status in OpenAPI response dm: add TCP connection IO traffic statistics to sync stage status in OpenAPI response Nov 13, 2024
@@ -182,13 +182,13 @@ type SubTaskConfig struct {
 // one go runtime.
 // IOTotalBytes is used build TCPConnWithIOCounter and UUID is used to as a
 // key to let MySQL driver to find the right TCPConnWithIOCounter.
-UUID string `toml:"-" json:"-"`
-IOTotalBytes *atomic.Uint64 `toml:"-" json:"-"`
+UUID string `toml:"uuid" json:"uuid"`
Contributor

I remember toml is used for passing task content, and json is used for logging. We don't need to log them?

Also, please add unit tests to ensure the atomic types implement these marshal functions without races.

Contributor Author

> I remember toml is used for passing task content, and json is used for logging. We don't need to log them?

Indeed, we don't need to add a json tag for either field. The json tag is used for logging and the OpenAPI response.

Contributor Author

> Also, please add unit tests to ensure the atomic types implement these marshal functions without races.

Nice catch! The atomic.Uint64 values can't be properly serialized to TOML. How about we manually copy the atomic value, as in b094cb4?

Contributor Author

Since atomic.Uint64 does not support the marshal functions for TOML, as used in https://github.com/pingcap/tiflow/blob/master/dm/master/scheduler/worker.go#L282

we need to avoid a nil pointer or zero value after converting the SubTaskConfig to TOML... @lance6716 Any suggestions for that? 🤔

@lance6716
Contributor

/ok-to-test

@ti-chi-bot ti-chi-bot bot added ok-to-test Indicates a PR is ready to be tested. and removed needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Nov 14, 2024
@River2000i
Contributor Author

/retest

@River2000i
Contributor Author

/retest-required

@River2000i
Contributor Author

@GMHDBJD @D3Hunter PTAL~


init_dump_data

# start dump task success
openapi_task_check "start_task_success" $task_name ""
run_dm_ctl_with_retry $WORK_DIR "127.0.0.1:$MASTER_PORT" \
"query-status $task_name" \
"\"stage\": \"Running\"" 1
Contributor Author

@River2000i River2000i Nov 26, 2024

In some cases the dump task finishes immediately, so the Running stage cannot be observed. test_dump_task() therefore only validates whether the task has finished. We need to validate the dump data once load tasks are supported.
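
A check like this can tolerate a task that finishes before the first status query by polling until the stage reaches any accepted value. The following is a minimal sketch in Python, not the helper used in this PR: `wait_for_stage` and the `get_stage` callable are hypothetical names standing in for a status query, and the stage strings passed in are illustrative.

```python
import time
from typing import Callable, Iterable


def wait_for_stage(
    get_stage: Callable[[], str],
    accepted: Iterable[str],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> str:
    """Poll get_stage() until it returns one of the accepted stages.

    get_stage is a hypothetical callable standing in for a status query.
    Returns the matching stage, or raises TimeoutError once the deadline passes.
    """
    accepted_set = set(accepted)
    deadline = time.monotonic() + timeout
    while True:
        stage = get_stage()
        if stage in accepted_set:
            return stage
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"stage stayed {stage!r}, wanted one of {sorted(accepted_set)}"
            )
        time.sleep(interval)
```

Passing both "Running" and "Finished" as accepted stages would let the same check succeed whether or not the short-lived Running state is still visible when the query arrives.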

@River2000i
Contributor Author

/retest

Contributor

@D3Hunter D3Hunter left a comment

rest lgtm

Comment on lines 419 to 421
max_dump_io_total_bytes=sys.maxsize,
min_io_total_bytes=0,
max_io_total_bytes=sys.maxsize
Contributor

The value sys.maxsize, on the other hand, reports the platform's pointer size, and that limits the size of Python's data structures such as strings and lists.

maybe use a more restricted range, such as a few KiB or MiB

Contributor Author

Set the max to 100 KiB. In the OpenAPI test, every sync task stays below 30 KiB.

Contributor

What about the minimum? I guess we will at least send some rows for DM to sync. We want to check that the bytes fall in a reasonable range, to confirm the metering is correct.

Contributor Author

@River2000i River2000i Nov 27, 2024

Sure. For now the minimum is ~3 KiB. I think we can set the lower bound to 2 KiB to keep the test stable, and set the maximum to 50 KiB.
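
The bounds agreed on here can be sketched as a small range check. This is an illustration only, not the assertion added in the PR: `io_bytes_in_range` and `KIB` are hypothetical names, and the defaults simply mirror the 2 KiB and 50 KiB figures from this thread.

```python
KIB = 1024  # bytes per KiB


def io_bytes_in_range(
    io_total_bytes: int,
    min_bytes: int = 2 * KIB,
    max_bytes: int = 50 * KIB,
) -> bool:
    """Return True when the reported counter lies in [min_bytes, max_bytes].

    The defaults are the bounds discussed in this review thread; they are
    assumptions tied to the current test data, not fixed limits.
    """
    if min_bytes > max_bytes:
        raise ValueError("min_bytes must not exceed max_bytes")
    return min_bytes <= io_total_bytes <= max_bytes
```

With these defaults, a counter of about 3 KiB passes, while a counter that stayed at 0 (nothing metered) or jumped to 100 KiB fails. A range of 0 to sys.maxsize would accept all three, which is why the tighter bounds make the test meaningful.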

@River2000i River2000i requested a review from D3Hunter November 27, 2024 05:38
Contributor

@D3Hunter D3Hunter left a comment

rest lgtm

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Nov 27, 2024
@D3Hunter
Contributor

/hold

Please fix the existing comments; you can unhold it yourself.

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 27, 2024
@River2000i
Contributor Author

/retest

2 similar comments
@River2000i
Contributor Author

/retest

@River2000i
Contributor Author

/retest

Contributor

@GMHDBJD GMHDBJD left a comment

LGTM

Contributor

ti-chi-bot bot commented Nov 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, GMHDBJD

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Nov 28, 2024
Contributor

ti-chi-bot bot commented Nov 28, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-11-27 06:23:24.984399899 +0000 UTC m=+617592.604054414: ☑️ agreed by D3Hunter.
  • 2024-11-28 02:40:38.504127318 +0000 UTC m=+690626.123781842: ☑️ agreed by GMHDBJD.

@River2000i
Contributor Author

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 28, 2024
@ti-chi-bot ti-chi-bot bot merged commit b909eff into pingcap:master Nov 28, 2024
26 checks passed
@River2000i River2000i deleted the featIoTotalBytes branch November 28, 2024 03:35
Labels
approved area/dm Issues or PRs related to DM. area/engine Issues or PRs related to Dataflow Engine. contribution This PR is from a community contributor. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. lgtm ok-to-test Indicates a PR is ready to be tested. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

dm: add TCP connect IO traffic statistics at sync stage status in OpenAPI response
4 participants