-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Orchestrator thread keepalive #524
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments and questions.
gas_adjustment: f64, | ||
msg_batch_size: usize, | ||
) { | ||
while let Some(messages) = rx.recv().await { | ||
while let Some(messages) = rx.lock().await.recv().await { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we creating a lock around the tokio receiver? The async channels should be able to be filled and drained without having to have synchronous locking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It comes down to satisfying the borrow checker while passing in the receiver to run_sender
inside of the loop inside of send_main_loop
.
To get the compiler to let me pass the receiver into each loop iteration (each cosmos sender thread restart) I have to wrap the receiver in an atomic reference counter so that I can clone without making a duplicate of the receiver. However, Arc<T>
doesn't have the DerefMut
trait, and the recv()
method requires that the value be a mutable reference. To get a mutable reference of the contained object in the Arc, I have to guarantee that one and only one mutable reference exists at a time, which is where the lock comes in.
@@ -105,6 +110,8 @@ pub async fn orchestrator_main_loop( | |||
} else { | |||
futures::future::join4(a, b, c, d).await; | |||
} | |||
|
|||
Ok(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is now returning a Result
, should any of the other calls within this function use the ?
operator as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except for metrics, which still has a naked unwrap on the metrics server result, the design of each function is such that it should only ever stop if it panics, and I believe I've addressed the library-level panic scenarios.
@@ -408,14 +456,3 @@ pub async fn eth_signer_main_loop( | |||
); | |||
} | |||
} | |||
|
|||
pub async fn check_for_eth(orchestrator_address: EthAddress, eth_client: EthClient) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was an existing issue around the orchestrator not functioning correctly if a tiny amount of eth had not been deposited into the delegate key wallet. Did that stem from this error, or is it a problem regardless and that's why this check existed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I removed this because it's a duplicate. There's another copy, the one that's actually used, in gravity_utils/connection_prep.rs
The only place not in gorc that it's used is when running the relayer
|
||
let a = send_main_loop( | ||
&contact, | ||
contact.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the benefit of cloning contact
rather than using the same instance as it was doing before? Is there a concern around multiple threads using it and account sequencing issues? I do wonder if cloning the object isn't just going to create multiple versions of e.g. previous account sequence state, rather than recreating a new Contact
when it's needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contact object doesn't maintain account state, but actually queries the required tx args every time it builds a transaction. See send_messages
, specifically the call to .get_message_args(...)
is where this happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, in regards to the real purpose of your question: it's another borrow checker/lifetime thing.
@@ -1,41 +0,0 @@ | |||
[package] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there's also a go.mod
for the test runner that we can drop as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The testnet directory in the root of the repo? Should I remove the whole thing?
@@ -304,7 +355,7 @@ pub async fn eth_signer_main_loop( | |||
} | |||
|
|||
// sign the last unsigned valsets | |||
match get_oldest_unsigned_valsets(&mut grpc_client, our_cosmos_address).await { | |||
match get_oldest_unsigned_valsets(&mut grpc_client, cosmos_address.clone()).await { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the cosmos address should probably be immutable in most places we're using it, do we need to be cloning this object or can we adjust to using a reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, turns out Address
implements copy, so neither .clone()
nor an explicit borrow are needed
No description provided.