bzl-gen-build is a polyglot modular BUILD generator for Bazel, implemented in Rust. It is designed so it is easy to pick and choose components, modify intermediate states (all JSON), and post-process or use the outputs for new purposes as it makes sense.
bzl-gen-build works by running the following phases for each file type:
- Extract classes/entities defined in source files and 3rdparty binaries
- Extract `import` statements and bzl-gen-build directives in comments
- Build a graph of usages and definition sites
- Generate Bazel targets
So far we have support for:
- Scala
- Java
- Python
- Protobuf
See the example/ directory for the complete setup.
./build_tools/lang_support/create_lang_build_files/delete_build_files.sh
./build_tools/lang_support/create_lang_build_files/regenerate_protos_build_files.sh
./build_tools/lang_support/create_lang_build_files/regenerate_python_build_files.sh
bazel test ...
These scripts call:
GEN_FLAVOR=protos
source "$REPO_ROOT/build_tools/lang_support/create_lang_build_files/bzl_gen_build_common.sh"
run_system_apps "build_tools/lang_support/create_lang_build_files/bazel_${GEN_FLAVOR}_modules.json" \
--no-aggregate-source \
--append
The script is set up to download bzl-gen-build from GitHub and run the bzl-gen-build system-driver-app with the appropriate commands.
Sometimes we need to help out bzl-gen-build by supplying directives. For example, you might need additional dependencies for tests or macros. We support a few flavors of directives, both in the Module Configuration files and inline in the source code.
// bzl_gen_build:runtime_ref:com.example.something.DefaultSomethingConfiguration
class SomeTest {
}
These are applied locally to the files they are attached to, and can alter the outcome/behavior of what the extract command above produces for the system.
- `ref`: adds a reference as if the current file referred to this entity.
- `unref`: removes a reference from this file; the extractor might believe this file depends on this reference, but we filter it out.
- `def`: adds a new definition that we can forcibly say comes from this file. Using this, either manually or via another tool, allows accounting for new types produced by Scala macros or Java plugins.
- `undef`: removes a definition from this file so it won't be seen as coming from here in the graph.
- `runtime_ref`: adds a new runtime reference; since things only needed at runtime usually cannot be seen from the source code, these can help indicate types/sources needed to run this in tests/deployment.
- `runtime_unref`: the dual of the above, though generally not used often.
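For example, in a Python source file the same comment form can carry these directives. This is only a sketch: the module path below is hypothetical, and it assumes the bzl_gen_build:<directive>:<value> style shown in the runtime_ref example above, written with Python's # comment syntax.

import importlib

# bzl_gen_build:ref:myproject.generated.schema
def load_schema():
    # The module is imported dynamically, so the extractor cannot discover this
    # dependency from an import statement; the ref directive above adds it anyway.
    return importlib.import_module("myproject.generated.schema")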
These are used to try to build extra links into the chain of dependencies.
- `link`: this has the form of connecting one entity to several others. That is, if target `A` depends on `com.foo.Bar`, and a link exists connecting `com.foo.Bar` to `com.animal.Cat, com.animal.Dog`, then when we see `com.foo.Bar` as a dependency of any target, such as `A`, it will act as if it also depends on `Cat` and `Dog`.
These directives are applied late: they alter the final printed build file, but are not considered during graph resolution.
- `manual_runtime_ref`: add a runtime dependency on the given target; that is, not an entity but an actual target addressable in the build.
- `manual_ref`: add a compile-time dependency on the given target; that is, not an entity but an actual target addressable in the build.
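For instance, assuming these manual directives can also be written in the inline comment form shown earlier (the target labels here are hypothetical):

# bzl_gen_build:manual_ref://src/main/python/myproject/vendored:client
# bzl_gen_build:manual_runtime_ref://tools/config:prod_settings

class ProdClient:
    pass

Because these are late-applying, the labels are copied into the printed build file as-is rather than resolved through the graph.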
Today there is only a single form of this, though more thought should probably go into it, including whether it should merge with the manual directives above. This is used to generate binary targets.
- `binary_generate: binary_name[@ target_value]`: this will generate a binary called `binary_name`, and optionally we pass in some information (such as a JVM class name) to the rule that generates the binary.
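As an illustration only (the binary name is hypothetical, this assumes the inline comment form shown earlier, and the optional @ target_value part is omitted since it is mainly described for JVM class names):

# bzl_gen_build:binary_generate: my_tool
def main():
    print("hello from my_tool")

if __name__ == "__main__":
    main()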
These run against target languages to generate a set of:
- classes/entities defined in a given language file (or files).
- classes/entities referred to by a given language file (or files).
- Inline directives in that language's comment format to be expressed to the system. (more details below on the directives)
This is an application that runs in multiple modes to try to connect together phases of the pipeline. You can run some, massage/edit/change the data, and run more as it makes sense.
This mode prepares the inputs to the system: it runs and caches the outputs of the extractors mentioned above to pull out the definitions, references, and directives. It can also optionally take a set of externally generated files already in this format - this is mostly used to account for running an external system to figure out 3rdparty definitions/references (in Bazel, this would often be an aspect).
This is a relatively simple app that maybe should be eliminated in the future. Its goal is to take the outputs from extract and trim them down to a smaller number of files (collapsing up a tree) containing just definitions. We do this so that in future phases, when we need to load everything, we can load all our definitions first and use them to trim the files as they are being loaded. Scala/Java files can have a lot of references, since reference extraction is often heuristic-based where we have limited insight (wildcard imports).
This phase resolves all of the links in the graph. It collapses nodes that have circular dependencies between them into a common ancestor. The output contains all of the final nodes, along with the sets of source nodes that were collapsed into them, and their dependencies.
This will print out all of the build files, performing any final application of directives as necessary.
Each target language is configured using a JSON file, for example build_tools/lang_support/create_lang_build_files/bazel_python_modules.json:
{
"configurations": {
"python": {
"file_extensions": ["py"],
"build_config": {
"main": {
"headers": [],
"function_name": "py_library",
"target_name_strategy": "source_file_stem"
},
"test": {
"headers": [],
"function_name": "py_test",
"target_name_strategy": "source_file_stem"
}
},
"main_roots": ["src/main/python"],
"test_roots": ["src/test/python"],
"path_directives": []
}
}
}
The above will parse all *.py files under src/main/python/ and src/test/python/ and generate targets under those directories.
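As a rough illustration of the output (the file path, target name, and dependency below are hypothetical, and the exact attributes bzl-gen-build emits may differ), a file such as src/main/python/myproject/util.py would get a target along these lines, named after the source file stem:

# src/main/python/myproject/BUILD.bazel (sketch)
py_library(
    name = "util",  # from target_name_strategy: source_file_stem
    srcs = ["util.py"],
    # deps come from the reference/definition graph; this label is hypothetical.
    deps = ["//src/main/python/myproject/common:helpers"],
)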
In some situations, like for Protocol Buffer schemas, we want to generate secondary rules for each primary rule. This can be configured as follows (a sketch of the resulting targets follows the configuration):
{
"configurations": {
"protos": {
"file_extensions": [
"proto"
],
"build_config": {
"main": {
"headers": [
{
"load_from": "@rules_proto//proto:defs.bzl",
"load_value": "proto_library"
}
],
"function_name": "proto_library"
},
"secondary_rules": {
"java": {
"headers": [],
"function_name": "java_proto_library",
"extra_key_to_list": {
"deps": [":${name}"]
}
},
"py": {
"headers": [
{
"load_from": "@rules_python//python:proto.bzl",
"load_value": "py_proto_library"
}
],
"function_name": "py_proto_library",
"extra_key_to_list": {
"deps": [":${name}"]
}
}
}
},
"main_roots": ["com"],
"test_roots": [],
"path_directives": []
}
}
}
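As a rough sketch of what this produces for a schema such as com/example/thing.proto (the target names are illustrative; the deps entries on the secondary rules come from the ${name} substitution in extra_key_to_list):

# com/example/BUILD.bazel (sketch)
load("@rules_proto//proto:defs.bzl", "proto_library")
load("@rules_python//python:proto.bzl", "py_proto_library")

proto_library(
    name = "thing_proto",  # illustrative primary rule name
    srcs = ["thing.proto"],
)

java_proto_library(
    name = "thing_proto_java",  # illustrative secondary rule name
    deps = [":thing_proto"],  # deps = [":${name}"] with ${name} replaced by the primary rule's name
)

py_proto_library(
    name = "thing_proto_py",  # illustrative secondary rule name
    deps = [":thing_proto"],
)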
Extracting definitions from 3rdparty libraries requires some setup, also demonstrated in the example/ repo.
GEN_FLAVOR=python
source "$REPO_ROOT/build_tools/lang_support/create_lang_build_files/bzl_gen_build_common.sh"
set -x
bazel query '@pip//...' | grep '@pip' > $TMP_WORKING_STATE/external_targets
CACHE_KEY="$(generate_cache_key $TMP_WORKING_STATE/external_targets $REPO_ROOT/WORKSPACE $REPO_ROOT/requirements_lock_3_9.txt)"
rm -rf $TMP_WORKING_STATE/external_files &> /dev/null || true
# try_fetch_from_remote_cache "remote_python_${CACHE_KEY}"
# if [ ! -d $TMP_WORKING_STATE/external_files ]; then
# log "cache wasn't ready or populated"
bazel run build_tools/bazel_rules/wheel_scanner:py_build_commands -- $TMP_WORKING_STATE/external_targets $TMP_WORKING_STATE/external_targets_commands.sh
chmod +x ${TMP_WORKING_STATE}/external_targets_commands.sh
mkdir -p $TMP_WORKING_STATE/external_files
if [[ -d $TOOLING_WORKING_DIRECTORY ]]; then
BZL_GEN_BUILD_TOOLS_PATH=$TOOLING_WORKING_DIRECTORY ${TMP_WORKING_STATE}/external_targets_commands.sh
else
BZL_GEN_BUILD_TOOLS_PATH=$BZL_BUILD_GEN_TOOLS_LOCAL_PATH ${TMP_WORKING_STATE}/external_targets_commands.sh
fi
# update_remote_cache "remote_python_${CACHE_KEY}"
# fi
The above calls py_build_commands to generate a Bash script, which then runs wheel_scanner on all the pip libraries and generates JSON files in a temp directory. In actual usage, you would cache these based on the $CACHE_KEY so this is done only once per dependency update.