
Inefficient Builds (15 mins) #461

Open · krisnova opened this issue Mar 19, 2023 · 5 comments

Labels
Good First Issue: New to the project? Looking for something to get started with?

@krisnova (Contributor)

I think there is opportunity for newcomers to the project to help with our current build times.

Right now our builds can take up to 15 minutes, and I believe there is a lot of repeated and duplicated work. The more compiling, downloading, and linking we do, the longer the builds take.

How do we get build times down to just a few seconds? Maybe less than 60 seconds for the entire project?

[screenshot]

@krisnova added the Good First Issue label on Mar 19, 2023
@krisnova changed the title from Inefficient Builds to Inefficient Builds (15 mins) on Mar 19, 2023
@krisnova (Contributor, Author)

Was able to get a 30 minute build.. ouch

@dmah42 (Contributor) commented Mar 20, 2023

naive attempt:

$ rustup default nightly
$ export RUSTFLAGS=-Zself-profile
$ make all

unfortunately this build fails due to tonic generating code that fails the nightly clippy checks. adding an #![allow(clippy::all)] to the "./proto/src/lib.rs" file helps get past that.

aer falls foul of some other clippy checks that would need judicious allow statements. in the meantime i'll see if i can get results just for auraed.
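
for reference, a hedged one-liner for the proto fix above (GNU sed; in practice you'd just edit the file by hand):

$ sed -i '1i #![allow(clippy::all)]' ./proto/src/lib.rs    # prepend the allow so nightly clippy skips the generated code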

$ cd auraed && cargo rustc --bin auraed -- -Zself-profile

works but only for the final crate. let's go deeper:

$ RUSTFLAGS="-Zself-profile=$(pwd)/profiles -Zself-profile-events=default,args" cargo build --bin auraed
$ crox --minimum-duration 10 --dir $(pwd)/profiles

oh look. a 2.4 GB JSON file for the chrome profiler to play with. which it rightly refused to load.

increasing the minimum duration to 3000 (3ms) helps. now it's only 23MB.
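
(aside: crox ships in the rust-lang/measureme repo; assuming a recent toolchain, something like this installs it, though the exact invocation may have drifted:)

$ cargo install --git https://github.com/rust-lang/measureme crox    # converts the -Zself-profile output into chrome trace format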

summary: auraed takes about 5.5 mins to build clean, with all but 47 secs of that being dependencies. worst offenders are tonic, netlink_proto, axum, and protobuf. i.e., nothing surprising.

so the slowness is likely down to the fact that we have:

  • ./aer/Cargo.toml
  • ./aer/macros/Cargo.toml
  • ./auraed/Cargo.toml
  • ./auraescript/Cargo.toml
  • ./auraescript/macros/Cargo.toml
  • ./client/Cargo.toml
  • ./client/macros/Cargo.toml
  • ./crates/proto-reader/Cargo.toml
  • ./crates/test-helpers/Cargo.toml
  • ./crates/test-helpers-macros/Cargo.toml
  • ./crates/validation/Cargo.toml
  • ./crates/validation/macros/Cargo.toml
  • ./ebpf/Cargo.toml
  • ./ebpf-shared/Cargo.toml
  • ./proto/Cargo.toml

each of which defines its own set of dependencies that need checking, maybe downloading, and maybe building. if each takes 2 mins, we're already over 30 mins.

naively i'd expect that if we merge these into a single crate we'd end up spending less time dealing with dependencies but having less flexibility to build things individually.
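
a cheap way to quantify the duplicated dependency work before restructuring anything (standard cargo commands; output will obviously vary by machine):

$ cargo tree --workspace --duplicates      # dependencies that get built more than once (multiple versions)
$ cargo build --workspace --timings        # per-crate timeline at target/cargo-timings/cargo-timing.html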

@future-highway (Contributor) commented Mar 21, 2023

> naively i'd expect that if we merge these into a single crate we'd end up spending less time dealing with dependencies but having less flexibility to build things individually.

The guidance for Rust is usually to break projects up into small crates since crates that don't depend on each other can be built in parallel.

By looking at an action that missed the cache and picking a single dependency (I'm picking serde under the assumption that it is a common dependency; it appears 44x in our Cargo.lock), we can get some hints on how Cargo reuses its output.

The following log lines appear for serde:

cargo build --target x86_64-unknown-linux-musl -p auraed
Downloaded serde v1.0.152
Compiling serde v1.0.152
...
cargo build  -p auraescript
Compiling serde v1.0.152
...
cargo build  -p aer
Compiling serde v1.0.152
...
cargo clippy --all-features --workspace --exclude auraed --exclude auraescript --exclude aer  -- -D clippy::all -D warnings
Checking serde v1.0.152
...
cargo clippy --target x86_64-unknown-linux-musl -p auraed --all-features -- -D clippy::all -D warnings
Checking serde v1.0.152
...
cargo clippy  -p auraescript --all-features -- -D clippy::all -D warnings
Checking serde v1.0.152
...
cargo clippy  -p aer --all-features -- -D clippy::all -D warnings
# serde does not get checked even though `aer` has a dependency on it
...
cargo install --target x86_64-unknown-linux-musl --path ./auraed --debug --force
Downloaded serde v1.0.158
Compiling serde v1.0.158
...
cargo install  --path ./auraescript --debug --force
Compiling serde v1.0.158

Notes:

  • cargo build does a single download regardless of target
  • cargo install won't use the download from cargo build so it downloads again
  • cargo build -p WHATEVER will recompile the dependency
  • cargo install -p WHATEVER will recompile the dependency
  • cargo clippy sometimes rechecks dependencies. My only guess is that it only caches the last target used, since it rechecked when we changed from the default target to the explicit x86 target and then back to the default, but it didn't recheck when aer ran, which would have been a second default-target check in a row (the auraescript check came right before it). A way to confirm this is sketched below.
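
One hedged way to confirm what is invalidating the cache is cargo's fingerprint logging (the log target below is the one cargo documents for debugging rebuilds; the output format is unstable, and =trace is even noisier):

CARGO_LOG=cargo::core::compiler::fingerprint=info cargo clippy --all-features --workspace
# logs why each unit is considered dirty (e.g. a changed target or changed flags)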

It looks like we can eliminate two cargo clippy calls by altering the other:

  • delete: cargo clippy -p auraescript --all-features -- -D clippy::all -D warnings (1m 02s)
  • delete: cargo clippy -p aer --all-features -- -D clippy::all -D warnings (30.56s)
  • update: cargo clippy --all-features --workspace --exclude auraed --exclude auraescript --exclude aer -- -D clippy::all -D warnings to cargo clippy --all-features --workspace --exclude auraed -- -D clippy::all -D warnings (should be less than an additional 1m 30s since there will be shared dependencies)

It is also possible that we can add cargo build --workspace --exclude auraed and eliminate the auraescript/aer individual builds.
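
Putting the proposal together, the CI steps would look roughly like this (a sketch of the changes above, not a tested pipeline):

# one clippy pass for everything built with the default target
cargo clippy --all-features --workspace --exclude auraed -- -D clippy::all -D warnings
# auraed keeps its own pass because of the musl target
cargo clippy --target x86_64-unknown-linux-musl -p auraed --all-features -- -D clippy::all -D warnings
# possibly: one build for everything except auraed, replacing the per-crate builds
cargo build --workspace --exclude auraed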

@BradLugo (Contributor)

TL;DR: We should discuss why, what, and when we run certain CI commands. More to the point, here are some potential places we can improve the build times:

  • Revisit how we are testing
  • Better understanding and implementation of dependency caching
  • Using a faster linker

A script for replicating the analysis from this post can be found at the bottom.


Analysis

From my initial investigation, it seems that the most time-consuming operations are dependency management (particularly on fresh builds, as you might expect), the tests, and linking.

Preliminary details

During this investigation, it became apparent that I didn't understand cargo's caching mechanism and future-highway's comment confirmed it. I couldn't figure out how to handle the cargo build cache across the aurae workspace, so I'll be using sccache.

This analysis was performed on a QEMU Ubuntu VM with 4 vCPUs from an i9-9980HK and 16 GB of memory allocated to the VM.

Results and notes

Cargo test timings

Results from cargo test --workspace --timings after a cargo clean (showing 30s+ units):
[timings screenshot]

Results from cargo test --workspace --timings with cached dependencies (showing 30s+ units):
[timings screenshot]

Results from RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=lld -Cincremental=false" cargo test --workspace --timings with cached dependencies (more on linkers in a moment, showing 10s+ units):
[timings screenshot]

Linker timings

Linking time with the default linker:

❯ rg '(link|total)' aer-default.txt
173:time:   0.158; rss:  173MB ->  173MB (   +0MB)      link_binary_check_files_are_writeable
174:time:  58.611; rss:  173MB ->  173MB (   +0MB)      run_linker
175:time:   0.162; rss:  173MB ->  173MB (   +0MB)      link_binary_remove_temps
176:time:  58.932; rss:  173MB ->  173MB (   +0MB)      link_binary
177:time:  58.937; rss:  173MB ->  114MB (  -59MB)      link_crate
178:time:  59.534; rss:  184MB ->  114MB (  -70MB)      link
179:time:  63.064; rss:   29MB ->   97MB (  +68MB)      total
❯ rg '(link|total)' auraed-default.txt
103:   Compiling netlink-packet-utils v0.5.1
112:   Compiling netlink-packet-core v0.4.2
130:   Compiling netlink-sys v0.8.4
144:   Compiling netlink-proto v0.10.0
151:   Compiling netlink-packet-route v0.13.0
166:   Compiling rtnetlink v0.11.0
220:time:   0.224; rss:  298MB ->  298MB (   +0MB)      link_binary_check_files_are_writeable
221:time: 152.282; rss:  298MB ->  295MB (   -4MB)      run_linker
222:time:   0.298; rss:  136MB ->  136MB (   +0MB)      link_binary_remove_temps
223:time: 152.823; rss:  298MB ->  136MB ( -162MB)      link_binary
224:time: 152.823; rss:  298MB ->  136MB ( -162MB)      link_crate
225:time: 153.706; rss:  338MB ->  136MB ( -202MB)      link
226:time: 165.519; rss:   29MB ->  104MB (  +76MB)      total
❯ rg '(link|total)' auraescript-default.txt
201:time:   0.058; rss:  155MB ->  155MB (   +0MB)      link_binary_check_files_are_writeable
202:time: 170.649; rss:  155MB ->  155MB (   +1MB)      run_linker
203:time:   0.071; rss:  146MB ->  146MB (   +0MB)      link_binary_remove_temps
204:time: 170.781; rss:  155MB ->  146MB (   -9MB)      link_binary
205:time: 170.781; rss:  155MB ->  146MB (   -9MB)      link_crate
206:time: 171.001; rss:  155MB ->  146MB (   -9MB)      link
207:time: 173.885; rss:   29MB ->   96MB (  +68MB)      total

Linking time with lld:

❯ rg '(link|total)' aer-lld.txt
171:time:   0.169; rss:  143MB ->  143MB (   +0MB)      link_binary_check_files_are_writeable
172:time:   9.499; rss:  143MB ->  143MB (   +0MB)      run_linker
173:time:   0.173; rss:  143MB ->  143MB (   +0MB)      link_binary_remove_temps
174:time:   9.844; rss:  143MB ->  143MB (   +0MB)      link_binary
175:time:   9.844; rss:  143MB ->  143MB (   +0MB)      link_crate
176:time:  10.830; rss:  143MB ->  143MB (   +0MB)      link
177:time:  13.903; rss:   29MB ->  102MB (  +73MB)      total
❯ rg '(link|total)' auraed-lld.txt
104:   Compiling netlink-packet-utils v0.5.1
110:   Compiling netlink-packet-core v0.4.2
131:   Compiling netlink-sys v0.8.4
147:   Compiling netlink-proto v0.10.0
152:   Compiling netlink-packet-route v0.13.0
165:   Compiling rtnetlink v0.11.0
219:time:   0.270; rss:  216MB ->  216MB (   +0MB)      link_binary_check_files_are_writeable
220:time:  19.198; rss:  216MB ->  216MB (   +0MB)      run_linker
221:time:   0.290; rss:  124MB ->  124MB (   +0MB)      link_binary_remove_temps
222:time:  19.767; rss:  216MB ->  124MB (  -92MB)      link_binary
223:time:  19.767; rss:  216MB ->  124MB (  -92MB)      link_crate
224:time:  21.418; rss:  281MB ->  124MB ( -157MB)      link
225:time:  27.115; rss:   29MB ->   95MB (  +67MB)      total
❯ rg '(link|total)' auraescript-lld.txt
200:time:   0.053; rss:  135MB ->  135MB (   +0MB)      link_binary_check_files_are_writeable
201:time:  16.251; rss:  135MB ->  135MB (   +1MB)      run_linker
202:time:   0.052; rss:  129MB ->  129MB (   +0MB)      link_binary_remove_temps
203:time:  16.358; rss:  135MB ->  129MB (   -6MB)      link_binary
204:time:  16.358; rss:  135MB ->  129MB (   -6MB)      link_crate
205:time:  16.706; rss:  135MB ->  129MB (   -6MB)      link
206:time:  19.145; rss:   29MB ->   95MB (  +67MB)      total

Linking time with mold:

❯ rg '(link|total)' aer-mold.txt
167:time:   0.173; rss:  143MB ->  143MB (   +0MB)      link_binary_check_files_are_writeable
168:time:   9.127; rss:  143MB ->  143MB (   +0MB)      run_linker
169:time:   0.206; rss:  113MB ->  113MB (   +0MB)      link_binary_remove_temps
170:time:   9.510; rss:  143MB ->  113MB (  -30MB)      link_binary
171:time:   9.510; rss:  143MB ->  113MB (  -30MB)      link_crate
172:time:  10.497; rss:  143MB ->  113MB (  -30MB)      link
173:time:  13.286; rss:   29MB ->  102MB (  +74MB)      total
❯ rg '(link|total)' auraed-mold.txt
104:   Compiling netlink-packet-utils v0.5.1
108:   Compiling netlink-packet-core v0.4.2
130:   Compiling netlink-sys v0.8.4
146:   Compiling netlink-proto v0.10.0
151:   Compiling netlink-packet-route v0.13.0
162:   Compiling rtnetlink v0.11.0
219:time:   0.275; rss:  213MB ->  213MB (   +0MB)      link_binary_check_files_are_writeable
220:time:  19.409; rss:  210MB ->  210MB (   +0MB)      run_linker
221:time:   0.339; rss:  120MB ->  120MB (   +0MB)      link_binary_remove_temps
222:time:  20.033; rss:  213MB ->  120MB (  -93MB)      link_binary
223:time:  20.033; rss:  213MB ->  120MB (  -93MB)      link_crate
224:time:  21.605; rss:  279MB ->  120MB ( -159MB)      link
225:time:  26.949; rss:   29MB ->   95MB (  +66MB)      total
❯ rg '(link|total)' auraescript-mold.txt
201:time:   0.054; rss:  135MB ->  135MB (   +0MB)      link_binary_check_files_are_writeable
202:time:  18.446; rss:  135MB ->  135MB (   +1MB)      run_linker
203:time:   0.064; rss:  135MB ->  135MB (   +0MB)      link_binary_remove_temps
204:time:  18.565; rss:  135MB ->  135MB (   +1MB)      link_binary
205:time:  18.565; rss:  135MB ->  135MB (   +1MB)      link_crate
206:time:  18.898; rss:  135MB ->  135MB (   +1MB)      link
207:time:  21.611; rss:   29MB ->   96MB (  +67MB)      total
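
To persist a linker choice instead of passing RUSTFLAGS every time, a hedged option is a workspace-level cargo config (the target triple below is an assumption based on the default Linux target; auraed's musl target would need its own entry):

mkdir -p .cargo
cat >> .cargo/config.toml <<'EOF'
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=lld"]
EOF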

An important note regarding linkers is that cargo creates a separate binary for each integration test file, and each of those binaries goes through the whole linking rigamarole. This is why I included lld in the cargo test timings above. One way to circumvent a large portion of this problem is to use a single test binary, i.e., move the tests into a subdirectory and pull them in from one entry point (sketched after these notes).

Another important note is that I couldn't get mold to work properly with the integration tests. Needs further investigation.
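
A minimal sketch of the single-test-binary layout (the file names are hypothetical; the point is that only files directly under tests/ become separate test binaries, while a subdirectory with its own main.rs builds as one target):

# hypothetical move; real test file names will differ
mkdir -p auraed/tests/it
git mv auraed/tests/cell_service.rs auraed/tests/it/cell_service.rs
git mv auraed/tests/observe.rs      auraed/tests/it/observe.rs
printf 'mod cell_service;\nmod observe;\n' > auraed/tests/it/main.rs   # single entry point -> one link step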

Better caching

As I mentioned in the beginning, I don't quite understand how cargo caches the dependencies across a workspace, but at this point, I suspect it doesn't. We might be able to figure out a better keying method for our current caching mechanism in our CI, or swap to sccache after better understanding how all these caches work.
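
A minimal sccache sketch for local experimentation (cache sizing and CI storage backends deliberately left out):

cargo install sccache            # or install via your package manager
export RUSTC_WRAPPER=sccache     # route rustc invocations through the cache
cargo build --workspace
sccache --show-stats             # hit/miss counts show whether the cache is actually helping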

Putting it all together

This isn't in the script below.

Here's the timing after implementing the caching with sccache, linking via lld, and combining the integration tests (showing all 5s+ units):
[timings screenshot]

Total time: 201.1s (3m 21.1s)

The "what," "why," and "when" in aurae's CI

A great constructive exercise we can do as part of this ticket is to discuss how we're using CI. I was watching Nova's stream when this ticket was made, and it seemed like the root problem was that Nova couldn't get a cargo check error in a timely manner. There's something to be said for having fast compilation times generally, but pragmatically, what exactly do we want from our CI, why do we want those things, and when do we want them?

cargo check is a great example to kick off this conversation:

  • What: As a developer, I want to check my code for potential compilation errors without performing an entire build
  • Why: Performing an entire build will almost certainly be more expensive than doing a cargo check
  • When: Only on pull requests. If cargo check errors are in the main branch, something has gone wrong; in any case, a complete build will catch it. (An illustrative split is sketched below.)
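
As an illustration only (the split between PR checks and main-branch builds is an assumption, not a description of our current CI), a fast PR gate could be as small as:

cargo check --workspace --all-features
cargo clippy --workspace --all-features -- -D clippy::all -D warnings
# full builds, tests, and the cargo install steps would move to merge/main-branch CI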

Script

Assumes you have followed the aurae build instructions, have export RUSTC_WRAPPER=sccache in the shell you're running this script in, and have the following installed:

  • sccache
  • clang
  • lld
  • mold

progs=(
    "aer"
    "auraed"
    "auraescript"
)

linkers=(
    "lld"
    "mold"
)

echo "INFO: Running analysis"
mkdir -p profiling

echo "INFO: Cleaning"
make clean-gen clean-crates
echo "INFO: Finished cleaning"

echo "INFO: First run of cargo test --workspace --timings (no cache)"
cargo test --workspace --timings
mv target/cargo-timings/cargo-timing.html profiling/cargo-timing-no-cache.html
echo "INFO: Done with first run of cargo test"

echo "INFO: Second run of cargo test --workspace --timings (using cache)"
cargo test --workspace --timings
mv target/cargo-timings/cargo-timing.html profiling/cargo-timing-with-cache.html
echo "INFO: Done with second run of cargo test"

echo "INFO: Third run of cargo test --workspace --timings (using lld and cache)"
RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=lld" cargo test --workspace --timings
mv target/cargo-timings/cargo-timing.html profiling/cargo-timing-lld-with-cache.html
echo "INFO: Done with third run of cargo test"

for prog in "${progs[@]}"; do
    echo "INFO: Running $prog + default"
    cargo +nightly rustc -p "$prog" --bins -- -Z time-passes &> "profiling/$prog-default.txt"
    echo "INFO: Done with $prog + default"
done
for linker in "${linkers[@]}"; do
    for prog in "${progs[@]}"; do
        echo "INFO: Running $prog + $linker"
        RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=$linker" cargo +nightly rustc -p "$prog" --bins -- -Z time-passes &> "profiling/$prog-$linker.txt"
        echo "INFO: Done with $prog + $linker"
    done
done

echo "INFO: Results for timings with no cache can be found at $(pwd)/profiling/cargo-timing-no-cache.html"
echo "INFO: Results for timings with cache can be found at $(pwd)/profiling/cargo-timing-with-cache.html"
for prog in "${progs[@]}"; do
    echo "INFO: Results for $prog + default time-passes can be found at $(pwd)/profiling/$prog-default.txt"
done
for linker in "${linkers[@]}"; do
    for prog in "${progs[@]}"; do
        echo "INFO: Results for $prog + $linker time-passes can be found at $(pwd)/profiling/$prog-$linker.txt"
    done
done

echo "INFO: View in browser via file://$(pwd)/profiling"

@dmah42 (Contributor) commented Jun 21, 2024

my builds are still slow, but i'm not sure what we can do. maybe we should think about breaking out the monorepo...
