A0-4318: network flooding test #1864

Open: wants to merge 15 commits into base: main
Commits
1311d8e
added `flooding` routines for both alephbft and sync networks.
fixxxedpoint Nov 5, 2024
fae0ab4
added FINALIZATION_WAIT param for the `finalization` e2e-test
fixxxedpoint Nov 5, 2024
2a3ef60
new docker-compose definition for the network `flooding` tests
fixxxedpoint Nov 5, 2024
6163040
simplified docker-compose file for the network `flooding` test
fixxxedpoint Nov 5, 2024
69329ac
building e2e-tests with `--locked` file - helps to keep its Cargo.loc…
fixxxedpoint Nov 5, 2024
9941d6e
added the `network-flooding` tests to nightly e2e-pipeline
fixxxedpoint Nov 5, 2024
a3058bc
Revert "simplified docker-compose file for the network `flooding` test"
fixxxedpoint Nov 5, 2024
12cc048
explicit env vars TIMEOUT_MINUTES="5m" FINALIZATION_WAIT=1 <script> for…
fixxxedpoint Nov 5, 2024
6aa13bf
e2e-tests: finalization test was await `expected finalized block` + 1…
fixxxedpoint Nov 5, 2024
b04808a
halved network bit rates for the docker network flooding test (e2e ni…
fixxxedpoint Nov 5, 2024
675b051
halved (1Mib) rate-limit for the sync-network for the flooding test
fixxxedpoint Nov 5, 2024
fffdb25
halved rate-limit for the sync-network in flooding e2e-test - docker/…
fixxxedpoint Nov 6, 2024
77978f7
halved rate-limit for alephbft network for the flooding e2e test - no…
fixxxedpoint Nov 6, 2024
d934553
yamllint for .github/actions/run-e2e-test/action.yml .github/workflows…
fixxxedpoint Nov 27, 2024
f8a8312
rustfmt for clique/src/protocols/v1/mod.rs and finality-aleph/src/net…
fixxxedpoint Nov 27, 2024
8 changes: 4 additions & 4 deletions .github/actions/run-e2e-test/action.yml
@@ -93,8 +93,8 @@ runs:
     - name: Wait for the finalization before e2e test
       shell: bash
       run: |
-        export TIMEOUT_MINUTES="5m"
-        ./.github/scripts/run_e2e_test.sh -t finalization::finalization \
+        TIMEOUT_MINUTES="5m" FINALIZATION_WAIT=1 ./.github/scripts/run_e2e_test.sh \
+          -t finalization::finalization \
           -a aleph-e2e-client:latest

     - name: Run single e2e test
timorleph marked this conversation as resolved.
@@ -156,6 +156,6 @@ runs:
       if: inputs.check-finalization-after-test == 'true'
       shell: bash
       run: |
-        export TIMEOUT_MINUTES="5m"
-        ./.github/scripts/run_e2e_test.sh -t finalization::finalization \
+        TIMEOUT_MINUTES="5m" FINALIZATION_WAIT=1 ./.github/scripts/run_e2e_test.sh \
+          -t finalization::finalization \
           -a aleph-e2e-client:latest
5 changes: 5 additions & 0 deletions .github/scripts/run_e2e_test.sh
@@ -84,6 +84,11 @@ fi
 if [[ -n "${OUT_LATENCY:-}" ]]; then
   ARGS+=(-e OUT_LATENCY)
 fi
+
+if [[ -n "${FINALIZATION_WAIT:-}" ]]; then
+  ARGS+=(-e FINALIZATION_WAIT)
+fi
+
 timeout_duration="${TIMEOUT_MINUTES:-20m}"
 echo "Running e2e test ${TEST_CASES}"
 echo "Logs will be shown when tests finishes or after ${timeout_duration} timeout."
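Note: the script only forwards `FINALIZATION_WAIT` into the test container when it is set, so the consumer has to handle the unset case. A minimal Rust sketch of that consumer side, assuming the value is a count of blocks; `parse_finalization_wait`, `finalization_wait_from_env` and `DEFAULT_FINALIZATION_WAIT` are hypothetical names for illustration, not taken from the e2e-tests crate.

```rust
use std::env;

/// Hypothetical default used when the variable is absent (an assumption, not from the PR).
const DEFAULT_FINALIZATION_WAIT: u32 = 1;

/// Parses the raw value of `FINALIZATION_WAIT`; `None` means the variable was not set.
fn parse_finalization_wait(raw: Option<&str>) -> Result<u32, String> {
    match raw {
        None => Ok(DEFAULT_FINALIZATION_WAIT),
        Some(value) => value
            .trim()
            .parse::<u32>()
            .map_err(|e| format!("invalid FINALIZATION_WAIT {value:?}: {e}")),
    }
}

/// Reads the variable from the process environment and parses it.
fn finalization_wait_from_env() -> Result<u32, String> {
    parse_finalization_wait(env::var("FINALIZATION_WAIT").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the unset, valid and malformed cases easy to check without touching the process environment.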
2 changes: 1 addition & 1 deletion .github/workflows/_build-aleph-e2e-client.yml
@@ -47,7 +47,7 @@ jobs:
         run: |
           cd e2e-tests/
           rm -f target/release/deps/aleph_e2e_client*
-          cp $(cargo test --no-run --release --message-format=json | jq -r .executable | \
+          cp $(cargo test --no-run --release --locked --message-format=json | jq -r .executable | \
           grep aleph_e2e_client) target/release/aleph-e2e-client

       - name: Get binary artifact name
8 changes: 7 additions & 1 deletion .github/workflows/_build-aleph-node.yml
@@ -13,6 +13,12 @@ on:
         description: 'Set to true to build production binary, otherwise set to false'
         type: boolean
         required: true
+      additional-flags:
+        description: 'string consisting of all additional flags for the `cargo build` command,
+          e.g. --features <some_feature_name>'
+        type: string
+        required: false
+        default: ''
     outputs:
       artifact-name-binary:
         description: 'Name of artifact aleph-node binary'
@@ -49,7 +55,7 @@ jobs:
       - name: Build test aleph-node
         if: ${{ inputs.production != true }}
         run: |
-          cargo build --release -p aleph-node --features only_legacy
+          cargo build --release -p aleph-node --features only_legacy ${{ inputs.additional-flags }}

       - name: Build production aleph-node
         if: ${{ inputs.production == true }}
54 changes: 53 additions & 1 deletion .github/workflows/nightly-normal-session-e2e-tests.yml
@@ -40,6 +40,15 @@ jobs:
       # yamllint disable-line rule:line-length
       artifact-aleph-node-image: ${{ needs.build-production-aleph-node.outputs.artifact-name-image }}

+  build-aleph-node-with-network-flooding-test:
+    needs: [check-vars-and-secrets]
+    name: Build aleph-node with flooding routines for both sync-network and alephbft-network
+    uses: ./.github/workflows/_build-aleph-node.yml
+    with:
+      ref: ${{ github.ref }}
+      production: false
+      additional-flags: '--features network_flooding_test'
+
   build-chain-bootstrapper-production:
     name: Build chain-bootstrapper
     uses: ./.github/workflows/_build-chain-bootstrapper.yml
@@ -316,6 +325,48 @@ jobs:
             --chain-bootstrapper ../target/release/chain-bootstrapper \
             --testcase test_major_sync

+  run-network-flooding-test:
+    needs:
+      - build-production-aleph-node
+      - build-aleph-node-with-network-flooding-test
+      - build-aleph-e2e-client-image
+      - build-chain-bootstrapper-production
+    name: Run network flooding test
+    runs-on: ubuntu-20.04
+    steps:
+      - name: Checkout source code
+        uses: actions/checkout@v4
+
+      - name: Download node-flooding docker image
+        uses: actions/download-artifact@v4
+        with:
+          name: ${{ needs.build-aleph-node-with-network-flooding-test.outputs.artifact-name-image }}
+
+      - name: Load node-flooding docker image
+        shell: bash
+        run: |
+          docker load -i aleph-node.tar
+          docker tag aleph-node:latest aleph-node:flooding
+          docker image rm aleph-node:latest
+          mv aleph-node.tar aleph-node_flooding.tar
+
+      - name: Run e2e test
+        uses: ./.github/actions/run-e2e-test
+        env:
+          # more than one session
+          FINALIZATION_WAIT: 901
+        with:
+          test-case: finalization::finalization
+          # yamllint disable-line rule:line-length
+          artifact-aleph-node-image: ${{ needs.build-production-aleph-node.outputs.artifact-name-image }}
+          aleph-node-image-tag: aleph-node:latest
+          # yamllint disable-line rule:line-length
+          artifact-chain-bootstrapper-image: ${{ needs.build-chain-bootstrapper-production.outputs.artifact-name-image }}
+          compose-file: docker/docker-compose_network_flooding_test.yml
+          # yamllint disable-line rule:line-length
+          artifact-aleph-e2e-client-image: ${{ needs.build-aleph-e2e-client-image.outputs.artifact-name-image }}
+        timeout-minutes: 40
+
   run-force-reorg-test:
     needs:
       - build-production-aleph-node
@@ -381,7 +432,8 @@ jobs:
             run-e2e-sync-test-into_two_groups_one_with_quorum,
             run-force-reorg-test,
             run-major-sync-test,
-            run-e2e-no-quorum-without-high-out-latency]
+            run-e2e-no-quorum-without-high-out-latency,
+            run-network-flooding-test]
     name: Check nightly test suite completion
     if: ${{ !cancelled() }}
     runs-on: ubuntu-20.04
4 changes: 4 additions & 0 deletions bin/node/Cargo.toml
@@ -109,3 +109,7 @@ aleph-runtime-native = [
 only_legacy = [
     "finality-aleph/only_legacy"
 ]
+
+network_flooding_test = [
+    "finality-aleph/network_flooding_test"
+]
3 changes: 3 additions & 0 deletions clique/Cargo.toml
@@ -36,3 +36,6 @@ tokio = { workspace = true, features = [
 [dev-dependencies]
 aleph-bft-types = { workspace = true }
 aleph-bft-mock = { workspace = true }
+
+[features]
+network_flooding_test = []
63 changes: 63 additions & 0 deletions clique/src/protocols/v1/flooding.rs
@@ -0,0 +1,63 @@
+use std::time::Duration;
+
+use futures::{channel::mpsc, StreamExt};
+use parity_scale_codec::{Decode, Encode};
+use tokio::{io::AsyncWrite, time::timeout};
+
+use crate::{io::send_data, protocols::ProtocolError, Data, PublicKey};
+
+const HEARTBEAT_TIMEOUT: Duration = Duration::from_secs(5);
+const MAX_MISSED_HEARTBEATS: u32 = 4;
+
+#[derive(Debug, Clone, Encode, Decode)]
+enum Message<D: Data> {
+    Data(D),
+    Heartbeat,
+}
+
+/// Version of the `sending` part of the `clique-network` which tries to flood our network with messages. It attempts to send a
+/// given data in a looped manner, flooding a node with valid messages. Whenever new data is available, it tries to update the
+/// message it uses for flooding.
+pub async fn sending<PK: PublicKey, D: Data, S: AsyncWrite + Unpin + Send>(
+    mut sender: S,
+    mut data_from_user: mpsc::UnboundedReceiver<D>,
+) -> Result<(), ProtocolError<PK>> {
+    use Message::*;
+
+    let mut last_message = None;
+    loop {
+        let to_send = match data_from_user
+            .try_next()
+            .ok()
+            .flatten()
+            .map(Data)
+            .or_else(|| last_message.take())
+        {
+            Some(data) => {
+                let cloned_data = data.clone();
+                last_message = Some(data);
+                cloned_data
+            }
+            None => {
+                match timeout(HEARTBEAT_TIMEOUT, data_from_user.next()).await {
+                    Ok(maybe_data) => match maybe_data {
+                        Some(data) => {
+                            let data = Data(data);
+                            last_message = Some(data.clone());
+                            data
+                        }
+                        // We have been closed by the parent service, all good.
+                        None => return Ok(()),
+                    },
+                    _ => Heartbeat,
+                }
+            }
+        };
+        sender = timeout(
+            MAX_MISSED_HEARTBEATS * HEARTBEAT_TIMEOUT,
+            send_data(sender, to_send),
+        )
+        .await
+        .map_err(|_| ProtocolError::SendTimeout)??;
+    }
+}
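Note: the selection order in the flooding loop (fresh data first, otherwise repeat the last message) can be isolated in a synchronous sketch. This is an illustration only: a `VecDeque` stands in for the unbounded channel, `next_to_send` is a hypothetical helper that does not exist in the PR, and the timeout and heartbeat branch is left out.

```rust
use std::collections::VecDeque;

/// Picks the next payload to send: fresh data wins, otherwise the last payload is repeated.
/// Returns `None` only when nothing has ever been queued; the real loop would then wait
/// for data or fall back to a heartbeat.
fn next_to_send<D: Clone>(queue: &mut VecDeque<D>, last: &mut Option<D>) -> Option<D> {
    if let Some(fresh) = queue.pop_front() {
        // New data replaces the payload used for flooding.
        *last = Some(fresh);
    }
    // With no fresh data this re-sends the previous payload.
    last.clone()
}
```

Calling the helper repeatedly without queueing anything keeps returning the same payload, which is the behaviour that lets a single valid message saturate the connection.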
24 changes: 19 additions & 5 deletions clique/src/protocols/v1/mod.rs
@@ -1,7 +1,7 @@
-use futures::{
-    channel::{mpsc, oneshot},
-    StreamExt,
-};
+#[cfg(feature = "network_flooding_test")]
+mod flooding;
+
+use futures::channel::{mpsc, oneshot};
 use log::{debug, info, trace};
 use parity_scale_codec::{Decode, Encode};
 use tokio::{
@@ -10,7 +10,7 @@ use tokio::{
 };

 use crate::{
-    io::{receive_data, send_data},
+    io::receive_data,
     metrics::{Event, Metrics},
     protocols::{
         handshake::{v0_handshake_incoming, v0_handshake_outgoing},
@@ -41,11 +41,16 @@ async fn check_authorization<SK: SecretKey>(
         .map_err(|_| ProtocolError::NoParentConnection)
 }

+#[cfg(not(feature = "network_flooding_test"))]
 async fn sending<PK: PublicKey, D: Data, S: AsyncWrite + Unpin + Send>(
     mut sender: S,
     mut data_from_user: mpsc::UnboundedReceiver<D>,
 ) -> Result<(), ProtocolError<PK>> {
+    use futures::StreamExt;
     use Message::*;
+
+    use crate::io::send_data;
+
     loop {
         let to_send = match timeout(HEARTBEAT_TIMEOUT, data_from_user.next()).await {
             Ok(maybe_data) => match maybe_data {
@@ -64,6 +69,15 @@ async fn sending<PK: PublicKey, D: Data, S: AsyncWrite + Unpin + Send>(
     }
 }

+#[cfg(feature = "network_flooding_test")]
+async fn sending<PK: PublicKey, D: Data, S: AsyncWrite + Unpin + Send>(
+    sender: S,
+    data_from_user: mpsc::UnboundedReceiver<D>,
+) -> Result<(), ProtocolError<PK>> {
+    info!(target: "network-clique-flooder", "Starting the flooder for the Aleph-bft network.");
+    flooding::sending(sender, data_from_user).await
+}
+
 async fn receiving<PK: PublicKey, D: Data, S: AsyncRead + Unpin + Send>(
     mut stream: S,
     data_for_user: mpsc::UnboundedSender<D>,

Review comment (Contributor) on the `flooding::sending(sender, data_from_user).await` line:
Ugh, I kinda hate that our code now contains (optional) malicious parts within it. Nothing to do about it, just venting, resolve this comment.
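Note: the switch between the regular and the flooding `sending` relies on two definitions with the same name guarded by complementary `cfg` attributes, so exactly one is compiled. A minimal self-contained sketch of that pattern; the feature name comes from the PR, while `sending_mode` is a hypothetical function used only for illustration.

```rust
/// Compiled when the crate is built without the `network_flooding_test` feature.
#[cfg(not(feature = "network_flooding_test"))]
fn sending_mode() -> &'static str {
    "regular"
}

/// Compiled only when the crate is built with `--features network_flooding_test`.
#[cfg(feature = "network_flooding_test")]
fn sending_mode() -> &'static str {
    "flooding"
}
```

Because the guards are mutually exclusive, callers use `sending_mode()` unconditionally and the flooding variant never ends up in a binary built without the feature.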
71 changes: 71 additions & 0 deletions docker/docker-compose_network_flooding_test.yml
@@ -0,0 +1,71 @@
+services:
+  Node0:
+    extends:
+      file: docker-compose.base.yml
+      service: Node0
+    environment:
+      - PUBLIC_VALIDATOR_ADDRESS=Node0:30343
+      - ALEPHBFT_NETWORK_BIT_RATE=393216
+      - SUBSTRATE_NETWORK_BIT_RATE=786432
+
+  Node1:
+    extends:
+      file: docker-compose.base.yml
+      service: Node1
+    environment:
+      - PUBLIC_VALIDATOR_ADDRESS=Node1:30344
+      - BOOT_NODES=/dns4/Node0/tcp/30333/p2p/$BOOTNODE_PEER_ID
+      - ALEPHBFT_NETWORK_BIT_RATE=393216
+      - SUBSTRATE_NETWORK_BIT_RATE=786432
+
+  Node2:
+    extends:
+      file: docker-compose.base.yml
+      service: Node2
+    environment:
+      - PUBLIC_VALIDATOR_ADDRESS=Node2:30345
+      - BOOT_NODES=/dns4/Node0/tcp/30333/p2p/$BOOTNODE_PEER_ID
+      - ALEPHBFT_NETWORK_BIT_RATE=393216
+      - SUBSTRATE_NETWORK_BIT_RATE=786432
+
+  # Node3 should run the flooding routine.
+  Node3:
+    extends:
+      file: docker-compose.base.yml
+      service: Node3
+    image: aleph-node:flooding
+    environment:
+      - PUBLIC_VALIDATOR_ADDRESS=Node3:30346
+      - BOOT_NODES=/dns4/Node0/tcp/30333/p2p/$BOOTNODE_PEER_ID
+      - ALEPHBFT_NETWORK_BIT_RATE=393216
+      - SUBSTRATE_NETWORK_BIT_RATE=786432
+
+  Node4:
+    extends:
+      file: docker-compose.base.yml
+      service: Node4
+    environment:
+      - PUBLIC_VALIDATOR_ADDRESS=Node4:30347
+      - BOOT_NODES=/dns4/Node0/tcp/30333/p2p/$BOOTNODE_PEER_ID
+      - ALEPHBFT_NETWORK_BIT_RATE=393216
+      - SUBSTRATE_NETWORK_BIT_RATE=786432
+
+  Node5:
+    extends:
+      file: docker-compose.base.yml
+      service: Node5
+    environment:
+      - PUBLIC_VALIDATOR_ADDRESS=Node5:30348
+      - BOOT_NODES=/dns4/Node0/tcp/30333/p2p/$BOOTNODE_PEER_ID
+      - ALEPHBFT_NETWORK_BIT_RATE=393216
+      - SUBSTRATE_NETWORK_BIT_RATE=786432
+
+  Node6:
+    extends:
+      file: docker-compose.base.yml
+      service: Node6
+    environment:
+      - PUBLIC_VALIDATOR_ADDRESS=Node6:30349
+      - BOOT_NODES=/dns4/Node0/tcp/30333/p2p/$BOOTNODE_PEER_ID
+      - ALEPHBFT_NETWORK_BIT_RATE=393216
+      - SUBSTRATE_NETWORK_BIT_RATE=786432
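Note: the two rate limits are easier to compare as multiples of 1024. The sketch below only does that arithmetic on the values copied from the compose file; the unit behind the variables is not stated in this diff, so the result is reported as a plain "Ki" multiple, and `as_kibi` is a hypothetical helper.

```rust
/// Values copied from the compose file above.
const ALEPHBFT_NETWORK_BIT_RATE: u64 = 393_216;
const SUBSTRATE_NETWORK_BIT_RATE: u64 = 786_432;

/// Converts a rate into multiples of 1024 (Ki), or `None` if it is not a whole multiple.
fn as_kibi(rate: u64) -> Option<u64> {
    (rate % 1024 == 0).then(|| rate / 1024)
}
```

With these values the AlephBFT limit is 384 Ki and the Substrate limit is 768 Ki, so every node in this compose file gets a Substrate limit exactly twice its AlephBFT limit.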