Specify sanitizer parameters in CMake #76
"We need to verify CMAKE_CXX_COMPILER_ID for g++ on macos is AppleClang."
Confirm what the compiler identification is for the default false g++ on Darwin.
Marking "Request changes" so this doesn't get landed prematurely.
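One way to check this locally is a throwaway project that only prints what CMake detects. The sketch below is an assumption about how one might verify it, not part of this PR; the project name `compiler_id_check` is hypothetical.

```cmake
# CMakeLists.txt for a throwaway check project (hypothetical, not part of this PR).
cmake_minimum_required(VERSION 3.20)
project(compiler_id_check LANGUAGES CXX)

# Print the compiler CMake resolved and how it identifies it.
message(STATUS "CMAKE_CXX_COMPILER:         ${CMAKE_CXX_COMPILER}")
message(STATUS "CMAKE_CXX_COMPILER_ID:      ${CMAKE_CXX_COMPILER_ID}")
message(STATUS "CMAKE_CXX_COMPILER_VERSION: ${CMAKE_CXX_COMPILER_VERSION}")
```

Configuring with something like `cmake -S . -B build -DCMAKE_CXX_COMPILER=g++` on the Darwin machine in question should show which ID the default g++ maps to.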
This comment was marked as resolved.
So, yes: |
Not sure whether picking the toolchain for Darwin or making the generic toolchain smarter would work better, but either should work? |
Wait, that can't work!
|
I was leaning toward having separate toolchain files plus a central toolchain dispatch logic based on compiler and platform. But then I realized there isn't enough variance in building across platforms and compilers, at least in exemplar, to warrant separate files. |
That is why I often use project_options |
That project looks fantastic! I can bring this up in the weekly sync and see if we want to use it. |
Ah I think tool chain file is executed before |
@camio I just realized this is still needed for CI; I can move it under the CI script. Basically, we would ideally want a CI script with the following matrix: compiler: [GNU, clang, vs code, MSVC] With this CMake script, we can let GitHub do its mix and match without manually entering all the permutations. But if we do it from a purely command-line interface, the CI script would not be extensible. I see other projects generate the test matrix with a Python script as an alternative, but I don't think we are at that step yet. |
I agree that gcc-14 isn't important, but AppleClang, being used by 99% of developers on MacOS, is critically important for a preset and CI target.
Nothing here prevents people from testing clang-19 builds on MacOS, but being a very niche use case, I don't think it belongs in either CI or our presets file. |
What do you mean here? vs code isn't a compiler.
Since each compiler has its own set of flags and options, attempting to abstract the idea of a "configuration" that applies to some compilers, but not others, is going to be difficult to maintain. If we did want such a thing, I still argue it doesn't belong in our CMakeLists.txt files as the decider of which C++ flags to use is the invoker of these scripts, be that a preset, toolchain, or command-line invocation. I don't think I fully understand what problem you're attempting to solve. What, at the end of the day, would you like our CI jobs to be? |
Sorry about my unclear communication, I am very sleep deprived from finals.
Ah, I mean xcode here. Let me try to explain myself again. This tool is a start at providing a sanitizer compatibility layer for all compilers that we support in CI and presets. It's a mapping of Basically, all the compilers have different support for sanitizers, but these sanitizers can generally be grouped into two groups (tsan and asan). It would be clean to declare on GitHub Actions that I want to generate the permutations of [compiler set: [clang...], c++ version set: [17, 20, ...], sanitizer set (cmake args): [default, tsan-set, asan-set]] (e.g. combination: [clang, 17, asan-set]), and have a subsequent script that determines what the ASan set means for each compiler (e.g. clang's asan set expands to Otherwise we will have to include all the combinations individually, which would lead to hell like: exemplar/.github/workflows/ci_tests.yml Lines 30 to 44 in d9145b5
And this is when there isn't a "different compilers need different commands" scenario yet. You can already see the pain starting in the MSVC CI PR. I have to include MSVC builds as extras (not part of the permutation); as a result, it only runs ASan on C++17 (while other compilers run ASan on [17..26]). Otherwise the matrix would be:
For MSVC-ASan to cover C++ 17..26. This is 64 lines of highly repetitive CI code to generate this matrix, of which 43 lines are added simply because MSVC uses a different options interface and has very restricted ASan support. This is not readable and thus highly error-prone. The same would be needed for Apple Clang once we add Apple Clang tests to CI, because the set of sanitizers it supports is different. These combinations would get worse if we want to add C++14 support as well. If we have a utility like what this PR provides, the matrix would be simplified to:
Where the only addition needed is:
and
This will make MSVC less of an exception and create less noise. If we create a Python script or a CI step that runs a bash script to generate what TSan/ASan is, then we are essentially rewriting this PR in another language. But writing this as a CMake utility has an advantage: this problem applies not only to CI (though it applies especially to CI), but also to presets. When adding a new compiler/toolchain, we have to specify what ASan is as we want If someone updates CI or the presets without updating the other (maybe because of the hundreds of lines of code in CI), we have a potential "it works on my machine but not on CI" issue, or vice versa. This would potentially be really hard to debug. I see the need for a simple CMake infrastructure, and I think this utility doesn't need to be included in the main CMake script. But I don't think the tradeoff of "having a minimal CMake script and offloading everything to CI" is reasonable in this specific instance. We have to deal with this complexity if we want to run sanitizers in CI. If we want a declarative way to generate the matrix, we will have to write a specific script to deal with sanitizer support, and a CMake script like the one included in this PR would be the ideal way to implement it. |
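To make the idea of a "sanitizer compatibility layer" concrete, here is a minimal sketch of such a mapping as a CMake function. It is not the code from this PR: the function name `beman_sanitizer_flags` and the set names `default`, `ASan`, and `TSan` are assumptions for illustration, and the flag choices are only one plausible expansion per compiler family.

```cmake
cmake_minimum_required(VERSION 3.20)

# Hypothetical helper: map (compiler id, sanitizer set) to compiler flags.
# Names and flag choices are illustrative assumptions, not the PR's code.
function(beman_sanitizer_flags out_var compiler_id sanitizer_set)
  set(flags "")
  if("${sanitizer_set}" STREQUAL "default")
    # No sanitizers requested: leave the flags empty.
  elseif("${compiler_id}" STREQUAL "MSVC")
    # MSVC uses a different option syntax and only offers address sanitizer here.
    if("${sanitizer_set}" STREQUAL "ASan")
      set(flags "/fsanitize=address")
    else()
      message(FATAL_ERROR "Sanitizer set '${sanitizer_set}' is not mapped for MSVC")
    endif()
  elseif("${compiler_id}" MATCHES "^(GNU|Clang|AppleClang)$")
    if("${sanitizer_set}" STREQUAL "ASan")
      set(flags "-fsanitize=address,undefined")
    elseif("${sanitizer_set}" STREQUAL "TSan")
      # Thread sanitizer cannot be combined with address sanitizer.
      set(flags "-fsanitize=thread")
    else()
      message(FATAL_ERROR "Unknown sanitizer set '${sanitizer_set}'")
    endif()
  else()
    message(FATAL_ERROR "No sanitizer mapping for compiler '${compiler_id}'")
  endif()
  set(${out_var} "${flags}" PARENT_SCOPE)
endfunction()

# Example: resolve one cell of the CI matrix.
beman_sanitizer_flags(resolved_flags "GNU" "ASan")
message(STATUS "GNU / ASan -> ${resolved_flags}")
```

Saved as a standalone file, this can be tried with `cmake -P <file>`. With a mapping like this, the CI matrix only has to name the sanitizer set, and each compiler's expansion lives in one place.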
@wusatosi, thanks for this explanation. I think I understand what you're going for now.
XCode is also an IDE. I think when you say XCode, you're trying to refer to the native MacOS compiler, which is called AppleClang.
This grouping is not well-defined. You mention that ASan is enabled on GCC with the What if the grouping has a different shape? GCC's thread sanitizer is unique in that it may not be used with other sanitizers. Say we create a flag configuration named The next question is where to put these flag groupings. They don't belong in our What if we create per-compiler toolchains that make use of a "BEMAN_FLAG_SET" variable? The toolchain for AppleClang could look something like this:

set(CMAKE_CXX_COMPILER /usr/bin/clang++)
set(CMAKE_C_COMPILER /usr/bin/clang)
if("RUNTIME_INSTRUMENTATION" IN_LIST BEMAN_FLAG_SET)
  # CMAKE_CXX_FLAGS is a space-separated string, so append with string(), not list().
  string(APPEND CMAKE_CXX_FLAGS " -fsanitize=address -fsanitize=pointer-compare -fsanitize=pointer-subtract -fsanitize=leak -fsanitize=undefined")
  # TODO: Error out if "MULTITHREAD_RUNTIME_INSTRUMENTATION" is also within BEMAN_FLAG_SET
endif()
if("ENFORCED_WARNINGS" IN_LIST BEMAN_FLAG_SET)
  string(APPEND CMAKE_CXX_FLAGS " -Wall -Wextra -Werror")
endif()
# ...

A similar file is created for each of our supported compilers. CMake can then be invoked with options like I believe this would enable us to have a simple CI matrix. These toolchains can potentially live in a separate repository that is shared by the CI specifications for all libraries. (Aside: If our CI configurations check out a particular commit id of the toolchain repository, it would facilitate a gradual migration to toolchain improvements.) A drawback of this approach is that our presets will repeat some of the information in our CI toolchain files. This is an acceptable tradeoff IMO to minimize the complexity of Beman build files. |
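The TODO in the toolchain sketch above could be handled with a guard like the following. This is only a sketch under assumptions: it treats the two instrumentation sets as mutually exclusive, and it assumes the project's `cmake_minimum_required` is new enough for `IN_LIST` (CMake 3.3 or later). It also forwards `BEMAN_FLAG_SET` to `try_compile()` projects, since a toolchain file is processed again for those and would otherwise not see the variable.

```cmake
# Fragment for a per-compiler toolchain file (sketch; flag-set names follow
# the comment above, the mutual-exclusion rule is an assumption).

# Forward BEMAN_FLAG_SET into try_compile() projects so that compiler
# checks are run with the same flag selection as the main build.
list(APPEND CMAKE_TRY_COMPILE_PLATFORM_VARIABLES BEMAN_FLAG_SET)

# Refuse to configure when both instrumentation sets are requested together.
if("RUNTIME_INSTRUMENTATION" IN_LIST BEMAN_FLAG_SET
   AND "MULTITHREAD_RUNTIME_INSTRUMENTATION" IN_LIST BEMAN_FLAG_SET)
  message(FATAL_ERROR
    "BEMAN_FLAG_SET may not contain both RUNTIME_INSTRUMENTATION and "
    "MULTITHREAD_RUNTIME_INSTRUMENTATION")
endif()
```

Failing at configure time keeps an invalid CI matrix cell from silently building with a conflicting sanitizer combination.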
Clang's memory sanitizer also doesn't play well with other sanitizers. It also has the unfortunate property of needing to be globally applied, that is, all the libraries that touch memory need to be built with msan, just like tsan, and for essentially the same reason. |
By this grouping I mean "the minimum number of distinct sanitizer sets that allows us to cover all the sanitizers". This was the original design goal for sanitizers on CI. Since the thread sanitizer usually conflicts with other sanitizers, this grouping comes down to TSan (thread sanitizer) and ASan (basically all other sanitizers, with the main interest being the address sanitizer; also, this is a good acronym for All-other-Sanitizers?). The documentation of these sets is included in this PR. exemplar/cmake/apply_santizers.cmake Lines 3 to 6 in 55c966c
I want to point to the documentation I included in the PR again: the intention for the sanitizer groups is only to accommodate the fact that we cannot enable all sanitizers at once. This is not an abstraction layer for general instrumentation-based tooling. This is to optimize the common case where we want all the compiler sanitizers we can have, not to perfectly define "threading instrumentation"; IMO that would be too much clutter for downstream. Let's say one day the address sanitizer doesn't work with the undefined behavior sanitizer anymore; at that point (and my intention is only at that point) we can include an extra set. In a sense, maybe I should just call the sanitizer groups "group a" and "group b" instead of "ASan" and "TSan" to avoid confusion.
Per-compiler toolchains were the original intention behind this PR (or what was originally planned to come after it). In the simplest case, all the options work generically across all the compilers; I don't see enough variation across the compilers in use to warrant creating various toolchain files in exemplar. I want to point out again that the intended audience for this utility is only CI and presets. This is not intended to be public facing. I think it is obvious that any specialization needed downstream (aside from turning off some sanitizers) will require the contributor to create special infrastructure for their respective repo. But I believe the common case for projects is that the default sanitizers are good enough without extra per-compiler configuration that deviates from the common Lines 32 to 33 in 55c966c
|
An important reason for keeping flags out of the core CI is to be buildable by package managers, which need to be in charge of the compilation and compilers being used in order to do their jobs. Different compilers with the same flags are sometimes as incompatible as the same compiler with different flags. If the package manager can supply its toolchain files and rely on us not messing with the request, it can mostly work out of the box. |
This PR proposes to add complexity to the top-level CMakeLists.txt and platform-specific flag selection. What I am proposing involves no changes to the top-level CMakeLists.txt file and no platform-specific flag selection. All complexity is moved to CI and the toolchains it uses. I disagree that my proposal goes against the minimal CMake approach. |
You may use a simpler solution like this: bemanproject/optional26#85 (comment) |
If that snippet you pointed to was included in the toolchain files and not included in the top-level CMakeLists.txt file, my concerns would be addressed. |
The toolchain files are not really needed. The compiler may be set in the environment The Or you may use this kind of |
Okay, I can try going back to implementing toolchain files |
@camio I implemented this using toolchain files, is this more what you are looking for? |
set(CMAKE_C_COMPILER gcc)
set(CMAKE_CXX_COMPILER g++)

if(BEMAN_BUILDSYS_SANITIZER STREQUAL "ASan")
Modeling sanitizers as a build type works better, either that or an entirely distinct toolchain, so that Thread and Memory can be uniformly applied to all packages. If everything in an address space isn't using msan or tsan, the reports they provide are broken, so you have to rebuild and relink the whole world consistently.
The UB and address sanitizers don't suffer from the same problems.
So something like (not tested!):
set(CMAKE_CXX_FLAGS_ASAN
"${CMAKE_CXX_FLAGS_RELWITHDEBINFO} -fsanitize=address,undefined,leak"
CACHE STRING
"C++ ASAN Flags"
FORCE
)
Also at -O0 there's often no undefined behavior emitted for the sanitizer to see, for the same reasons that debug builds seg fault less often.
(optional26 doesn't use the _INIT variables because it's copied from ancient sources before that rule was clarified. Above should be using the *_INIT vars)
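For reference, a version of the build-type idea using the `*_INIT` variables might look like the fragment below. This is an untested sketch: `Asan` is a hypothetical custom build type, the flags assume a GCC- or Clang-style driver, and it relies on CMake initializing the per-configuration flag cache entries from the matching `_INIT` variable for the configuration named in `CMAKE_BUILD_TYPE`, which is worth verifying against the CMake version in use.

```cmake
# Toolchain-file fragment (untested sketch). "Asan" is a hypothetical custom
# build type; the flags assume a GCC- or Clang-style compiler driver.
# The optimization level mirrors a RelWithDebInfo-style build so that the
# sanitizers have optimized code to instrument.
set(CMAKE_C_FLAGS_ASAN_INIT   "-O2 -g -fsanitize=address,undefined")
set(CMAKE_CXX_FLAGS_ASAN_INIT "-O2 -g -fsanitize=address,undefined")
```

The build type would then be selected at configure time, for example with `-DCMAKE_BUILD_TYPE=Asan` on a single-configuration generator. The flags are written out explicitly because `CMAKE_CXX_FLAGS_RELWITHDEBINFO` is not yet populated when the toolchain file is read.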
Thanks for the write-up.
Separate ASAN build targets seem a bit like overkill for exemplar's use case.
The design goal here isn't to have a full-fledged instrumentation-based analysis build system, but just a shorthand for "enable all flags for sanitizers".
Given that exemplar has no dependencies, and that the current recommendation for dependency management is to build with the dependency's source code instead of including the dependent library at link time, I don't think there's value in the added complication here.
Remember, though, exemplar doesn't do anything. Its entire purpose is to serve as a starting point and reference point for further work. Everything we've done is entirely overkill for providing ... checks notes ... std::identity.
Recommending building as part of the dependers source tree is a huge overstatement. We're making that possible, but it's still a terrible idea and does not scale to large systems. Getting to the point where we play well with package systems with public visibility is still on the todo list. (I haven't made it work with my internal one, but I know exactly how to.)
Ah, I see where you are coming from. I didn't think about exemplar as a dependency for other libraries (that is, having dependents) and was only commenting on its use of dependencies and on it as a standalone development library / CI test target.
I think you are right, there should be an ASAN target to produce an ASAN-enabled library so someone could link us as a dependency. I get what you are talking about. But I think this is more of a package/export issue, out of scope for this PR for now and, to be honest, outside of my skill tree for now.
Again, the main motivation here is just to simplify CI/workflow.
Honestly, I am tentatively waiting for someone to implement package export and do a quick write-up, so I can evaluate it and yoink it over (just like code coverage).
Could we delegate this suggestion to another PR? Let me know if I should add something or structure this toolchain in anticipation of this feature.
Figuring out better ergonomics for handling sanitizers (and fuzzers, coverage, and the rest of the laundry list) can be ongoing work. Getting sanitizers in CI is an immediate improvement. I would base the sanitizers on the release or relwithdebinfo profile in CI, as debug tends to not exercise any of the runtime problems that the sanitizers detect. |
This PR adopts @bretbrownjr 's suggestion in #44 (review).
This PR introduces toolchain files for supported platforms:
Updated CI and preset to use the new toolchain files.
Race with #82