Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) #953
-
Thanks a lot for the suggestion and all the information! I had heard of PGO before, but didn't know the results would be this impressive (the VM runtime of Fibonacci in the results you linked is about 30% faster), and I will give your article a read soon. So far, AFAICT, most users of Candy are working on the Candy compiler itself, so they need frequent (and therefore fast) recompilations. From my understanding, PGO would significantly increase the compilation time (since the training and optimization phases are added), so it would be more beneficial for binary releases of the Candy compiler, which people who only write Candy code can use. So we should definitely use PGO in our release workflows (as soon as we have them). Or does it also make sense to run the training phase once locally and then reuse that data for multiple PGO compilations of the Candy compiler while working on it?
-
Hi!
Recently I evaluated Profile-Guided Optimization (PGO) on multiple projects. The results are available here. According to the tests, PGO can help achieve better performance in many cases, including compilers and interpreters. Given this, I think trying to optimize the Candy lang tooling with PGO can be a good idea.
I already did some benchmarks on Candy "compiler/vm" and want to share my results.
Test environment
main branch on commit 7f99bf4e4202967bf341286e1b2ed71e4f81afc1
Benchmark
For benchmarking, I use these benchmarks. For PGO, I use the cargo-pgo tool. I got the Release results with the cargo bench command. The PGO training phase is done with cargo pgo bench, and the PGO optimization phase with cargo pgo optimize bench.
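The three runs described above can be sketched as a small script. This is only an illustration, assuming cargo-pgo is already installed and the script is started from the Candy repository root; the DRY_RUN variable and the run and pgo_bench_steps helpers are hypothetical names introduced here, not part of cargo-pgo. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the three benchmark runs (baseline, PGO training, PGO optimized).
# Assumption: cargo-pgo is installed and this runs from the repository root.
# DRY_RUN, run and pgo_bench_steps are illustrative names, not cargo-pgo features.
set -eu

# With DRY_RUN=1 (the default) commands are printed but not executed.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

pgo_bench_steps() {
    # 1. Baseline: regular Release benchmarks.
    run cargo bench
    # 2. Training phase: instrumented benchmarks that collect profiles.
    run cargo pgo bench
    # 3. Optimization phase: benchmarks rebuilt using the collected profiles.
    run cargo pgo optimize bench
}

pgo_bench_steps
```

Running it with DRY_RUN=0 executes the three commands in order; comparing the output of the first and the third run gives the Release vs. PGO numbers.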
Results
I got the following results:
According to the results, PGO measurably improves Candy VM performance in many cases.
Further steps
I can suggest the following action points:
Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.
Here are some examples of how PGO optimization is integrated into other projects:
configure script
I would be happy to answer your questions about PGO and PLO.