rewrite some flaky zombienet polkadot tests to zombienet-sdk #6757
base: master
Conversation
Hi @alindima, I think this is failing because we need to hide the
Nice! Thanks Alin!
assert_para_throughput(
    &relay_client,
    15,
    [(2000, 40..46), (2001, 12..16)].into_iter().collect(),
These ranges don't make sense to me. 40 > 3 * 12… are we too strict for the elastic-scaling case or too lenient for the non-elastic-scaling case?
Also note: I absolutely had to look up the code of assert_para_throughput to understand what this is doing. Not a big deal, but e.g. using types like ParaId would have made it easier.
> These ranges don't make sense to me. 40 > 3 * 12… are we too strict for the elastic scaling case or too lenient for the non-elastic scaling case?
So these ranges mean that:
- We wait until we've reached 15 finalized blocks (after the first session change, and not counting blocks with session changes).
- Then we check that the number of backed blocks for para 2000 is within the 40..46 range (ideally it would be 45, since that's 3 * 15, but in reality CI performance is not that good).
- And we check that the number of backed blocks for para 2001 is within the 12..16 range (this para has only one assigned core, so it can get at most 15 blocks in; again, we allow some buffer for less performant hardware).
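For illustration, the check described above can be sketched as a plain-Rust helper. This is a hypothetical sketch, not the actual zombienet-sdk `assert_para_throughput` implementation; the function name, the `u32` para IDs, and the counting logic here are assumptions made for the example:

```rust
use std::collections::HashMap;
use std::ops::Range;

// Hypothetical sketch: for each para ID, check that the number of backed
// blocks observed over the measurement window falls within its expected range.
fn check_para_throughput(
    observed: &HashMap<u32, u32>,
    expected: &HashMap<u32, Range<u32>>,
) -> Result<(), String> {
    for (para_id, range) in expected {
        let count = observed.get(para_id).copied().unwrap_or(0);
        if !range.contains(&count) {
            return Err(format!(
                "para {para_id}: {count} backed blocks, expected {range:?}"
            ));
        }
    }
    Ok(())
}

fn main() {
    // Para 2000 has 3 cores: ideally 3 * 15 = 45 backed blocks over
    // 15 finalized relay blocks; 40..46 allows slack for slow CI machines.
    // Para 2001 has 1 core: at most 15, so 12..16 with the same slack.
    let expected: HashMap<u32, Range<u32>> =
        [(2000, 40..46), (2001, 12..16)].into_iter().collect();

    let healthy: HashMap<u32, u32> = [(2000, 44), (2001, 14)].into_iter().collect();
    assert!(check_para_throughput(&healthy, &expected).is_ok());

    let degraded: HashMap<u32, u32> = [(2000, 30), (2001, 14)].into_iter().collect();
    assert!(check_para_throughput(&degraded, &expected).is_err());
}
```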
> Also note: I absolutely had to look up the code of assert_para_throughput to understand what this is doing. Not a big deal, but e.g. using types like ParaId would have made it easier.
I can do that 👍🏻
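The readability suggestion could look something like this. Note this is a hypothetical newtype sketch for illustration only; polkadot-sdk defines its own `ParaId` type with more functionality:

```rust
use std::collections::HashMap;
use std::ops::Range;

// Hypothetical newtype for illustration; polkadot-sdk has its own ParaId.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ParaId(u32);

impl From<u32> for ParaId {
    fn from(id: u32) -> Self {
        ParaId(id)
    }
}

fn main() {
    // The call site now says what the integer keys mean,
    // instead of passing bare u32s:
    let expected: HashMap<ParaId, Range<u32>> =
        [(ParaId::from(2000), 40..46), (ParaId::from(2001), 12..16)]
            .into_iter()
            .collect();
    assert!(expected.contains_key(&ParaId(2000)));
    assert!(!expected.contains_key(&ParaId(9999)));
}
```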
.wait_for_finalized_success()
.await?;

log::info!("1 more core assigned to the parachain");
In the previous test we had assurance that adding additional cores worked because of the throughput assertion. Here we don't have that. I would add a check that the para indeed has two cores (and is still working with a throughput of 1).
ok, makes sense
done
// Assert the parachain finalized block height is also on par with the number of backed
// candidates.
assert_finalized_block_height(&para_node.wait_client().await?, 6..9).await?;
While I get that we don't want flaky tests, it is kind of concerning that we let tests pass if we produce fewer blocks than expected. We definitely need some tests, in some fashion, ensuring that we are not degrading.
> While I get that we don't want flaky tests, it is kind of concerning that we let tests pass if we produce fewer blocks than expected.
I mean, we've always done this in CI so far. The only way to fix this, IMO, is to invest more in the reliability and performance of the CI machines.
All GitHub workflows were cancelled due to the failure of one of the required jobs.
…into alindima/zombienet-sdk-rewrite
Will fix:
#6574
#6644
#6062