Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TPU Vote using quic -- client side implementation #3473

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

lijunwangs
Copy link

@lijunwangs lijunwangs commented Nov 5, 2024

Problem

TPU vote using QUIC.

Summary of Changes

Use Connection cache to encapsulate sending TPU vote via either QUIC or UDP.

Introduced a validator flag to control if to send TPU vote via QUIC -- the default is false -- meaning using UDP. The default should have no impacts on TPU vote sending

Tests:

Tests have been done to turn DEFAULT_VOTE_USE_QUIC to true -- meaning sending TPU votes via QUIC only.
This is done in test PR: #3571. Showing CI results are okay.

Further I disabled votes in gossips -- to ensure only sending votes via TPU vote using QUIC. The GCE cluster launched with this configuration is still functioning as expected. Metrics in connection_cache_vote_quic and quic_streamer_tpu_vote shows votes streams sent and received on the client and server side respectively.

Fixes #

core/src/validator.rs Outdated Show resolved Hide resolved
validator/src/cli.rs Outdated Show resolved Hide resolved
@lijunwangs lijunwangs changed the title Vote use quic.client TPU Vote using quic -- client side implementation Nov 11, 2024
Copy link

@bw-solana bw-solana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few minor things, but looks good overall!

core/src/validator.rs Outdated Show resolved Hide resolved
core/src/validator.rs Outdated Show resolved Hide resolved
validator/src/cli.rs Show resolved Hide resolved
gossip/src/cluster_info.rs Outdated Show resolved Hide resolved
core/src/validator.rs Outdated Show resolved Hide resolved
core/src/validator.rs Outdated Show resolved Hide resolved
core/src/validator.rs Outdated Show resolved Hide resolved
gossip/src/cluster_info.rs Outdated Show resolved Hide resolved
gossip/src/cluster_info.rs Outdated Show resolved Hide resolved
Vote using QUIC

send vote packet using the rigth sender

removed dup declared functions

rebase with master QuicServerParams

removed remove_tpu_vote

first part of sending votes using quic

use quic for vote on client side with connection cache

add debug messages

turn on quic for vote by default for testing
@lijunwangs
Copy link
Author

@behzadnouri @bw-solana Can you please take another look at this PR? Thanks!

Copy link

@bw-solana bw-solana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@behzadnouri behzadnouri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

Probably better to do some stress tests that the UDP path has not become accidentally any slower before merging. thanks.

Comment on lines +58 to +64
match client.send_data_async(buf) {
Ok(_) => Ok(()),
Err(err) => {
trace!("Ran into an error when sending vote: {err:?} to {tpu:?}");
Err(GossipError::SendError)
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably better:

client.send_data_async(buf).map_err(|err| { /* ... */ })

Comment on lines +126 to +135
let _ = send_vote_transaction(
cluster_info,
vote_op.tx(),
Some(tpu_vote_socket),
&connection_cache,
);
}
} else {
// Send to our own tpu vote socket if we cannot find a leader to send to
let _ = cluster_info.send_transaction(vote_op.tx(), None);
let _ = send_vote_transaction(cluster_info, vote_op.tx(), None, &connection_cache);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we introduce a metric/counter for the errors here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will address it in another PR: #3894

@lijunwangs
Copy link
Author

lgtm.

Probably better to do some stress tests that the UDP path has not become accidentally any slower before merging. thanks.

Will do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants