Recently, I've been researching incremental builds for Rspack. The paper "Build Systems à la Carte: Theory and Practice" is cited in much of the material on incremental builds in other compilers, so I took some time to study it and found it quite interesting. It is also relevant to bundlers. This article briefly introduces the paper and then attempts to look at bundlers from the perspective of build systems.
Build system
A build system is a software system that automatically executes a series of repeatable tasks. Common ones include Make, Shake, and Bazel. They take source files as input and execute tasks according to task description files (such as a Makefile) to build executables.
There are also some less common ones. Excel takes cells as input, regards the formulas in specified cells as tasks and executes them to build the results of these cells. UI frameworks take props as input, regard Components as tasks and execute them to build new UI.
From this, we can identify some common concepts:
These concepts are quite universal, and their implementations are similar across build systems, so they are not what distinguishes one build system from another. The differences mainly come from the strategies adopted for the following two points: whether a task needs to be re-executed, and in what order tasks are executed.
These two points correspond to two relatively important concepts respectively: Rebuilder and Scheduler. Different build systems can be regarded as combinations of different Rebuilders and different Schedulers.
Scheduler
The Scheduler holds a Rebuilder and carries out a Build, deciding the order in which Tasks are executed.
Rebuilder
The Rebuilder holds a Task and decides whether the Task needs to be re-executed, that is, whether to use the cached result or the result of a fresh execution.
Build Systems
Build systems can be regarded as combinations of different Rebuilders and different Schedulers.
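First, let's introduce several common features that the build systems below are compared on: Minimality (a task is executed at most once per build, and only if it transitively depends on an input that changed), Early cutoff (if a re-executed task produces an unchanged result, the tasks that depend on it are not re-executed), and Dynamic dependencies (a task's dependencies may only become known while it is executing).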
Make
make = topological modTimeRebuilder
Make uses a Makefile to describe tasks. The dependencies among these tasks are declared up front, so they are static, and circular dependencies are not supported. Therefore, Make uses a topological scheduler to execute tasks in topological order.
The build information (Info) of Make is the file system itself, which records file modification times. Make uses these times to judge whether a task needs to be re-executed: if a file is older than any of the files it depends on, the task that produces it needs to be re-executed. Make effectively treats the file modification time as a dirty bit, so modTimeRebuilder is a kind of dirty bit rebuilder.
Of course, file modification times are often unreliable. For example, some programs update a file's modification time without changing its content, which leads to unnecessary re-execution of tasks.
Make achieves Minimality through modTimeRebuilder by skipping tasks that do not need to be executed. However, modTimeRebuilder also prevents Early cutoff: when a task is re-executed and writes its output file, the file's modification time changes even if the content does not, so the build cannot be cut off early. This also shows that, without Early cutoff, the set of executed tasks is not the smallest possible one. Therefore, Minimality is often relative.
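The combination `topological modTimeRebuilder` can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Make is implemented: modification times are simulated with integers in a dictionary instead of the real file system, and the names `mod_time_rebuilder`, `topological`, `rules`, and `mtimes` are made up for this example.

```python
from graphlib import TopologicalSorter

def mod_time_rebuilder(target, deps, mtimes):
    """Decide whether `target` is out of date, using modification times only."""
    if target not in mtimes:          # never built before
        return True
    return any(mtimes[dep] > mtimes[target] for dep in deps)

def topological(rebuilder):
    """Scheduler: visit targets in dependency order and ask `rebuilder` about each."""
    def build(rules, mtimes):
        graph = {target: deps for target, (deps, _action) in rules.items()}
        executed = []
        for target in TopologicalSorter(graph).static_order():
            if target not in rules:   # a source file, nothing to run
                continue
            deps, action = rules[target]
            if rebuilder(target, deps, mtimes):
                action()
                # Simulated clock: the output is now newer than everything else.
                mtimes[target] = max(mtimes.values()) + 1
                executed.append(target)
        return executed
    return build

make = topological(mod_time_rebuilder)

# Hypothetical two-step project: main.c -> main.o -> app
rules = {
    "main.o": (["main.c"], lambda: None),
    "app": (["main.o"], lambda: None),
}
mtimes = {"main.c": 1}

first = make(rules, mtimes)                   # everything is built
second = make(rules, mtimes)                  # nothing changed, nothing runs
mtimes["main.c"] = max(mtimes.values()) + 1   # "touch main.c", content unchanged
third = make(rules, mtimes)                   # both tasks run again
```

The third build re-runs `main.o` and then `app` even though no content changed, which is the missing Early cutoff described above.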
Excel
excel = restarting dirtyBitRebuilder
Excel describes tasks through formulas in cells. Some formulas have static dependency relationships, while others are dynamic. Therefore, it uses a restarting scheduler to execute tasks. It is worth noting that Excel records the final execution order for reference in the next build to reduce the overhead of restarting.
Excel uses a dirty bit rebuilder. Cells modified by users are marked as dirty, and tasks that depend on these cells are re-executed. Formulas with dynamic dependencies are marked as dirty in every build, so they are always recomputed. This sacrifices some performance to guarantee correctness.
Excel achieves Minimality for static dependencies but does not achieve Minimality for dynamic dependencies.
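The restarting scheduler and the recorded execution order can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the dirty bit is left out (every formula is recomputed), formulas are plain functions that receive a `fetch` callback, there are no circular references, and all names (`restarting_build`, `NotReady`, the cell names) are hypothetical.

```python
class NotReady(Exception):
    """Raised when a formula asks for a cell that has not been computed yet."""
    def __init__(self, key):
        super().__init__(key)
        self.key = key

def restarting_build(formulas, inputs, chain):
    """Compute every formula cell, trying cells in `chain` order.

    Returns (values, order, restarts); `order` is the order that actually
    worked and can be passed as `chain` to the next build.
    Assumes the formulas contain no circular references.
    """
    values = dict(inputs)
    pending = list(chain)
    order = []
    restarts = 0

    def fetch(key):
        if key not in values:
            raise NotReady(key)
        return values[key]

    while pending:
        key = pending.pop(0)
        try:
            values[key] = formulas[key](fetch)
            order.append(key)
        except NotReady as blocked:
            restarts += 1
            # Abort this cell and retry it right after the cell it needs.
            pending.remove(blocked.key)
            pending[:0] = [blocked.key, key]
    return values, order, restarts

# C1 has a dynamic dependency: which cell it reads depends on the value of A1.
formulas = {
    "C1": lambda fetch: fetch("B1") if fetch("A1") > 0 else fetch("B2"),
    "B1": lambda fetch: fetch("A2") * 2,
    "B2": lambda fetch: fetch("A2") + 1,
}
inputs = {"A1": 1, "A2": 10}

values, order, restarts = restarting_build(formulas, inputs, ["C1", "B1", "B2"])
_, _, restarts_again = restarting_build(formulas, inputs, order)
```

The first build has to abort `C1` once because `B1` is not ready yet. Reusing the recorded `order` as the chain for the second build avoids that restart.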
Bazel
bazel = restarting ctRebuilder
Bazel also uses a restarting scheduler to execute tasks, and it has an optimization mechanism to avoid the overhead of restarting.
Bazel uses ctRebuilder to support cloud caching and remote task execution.
Shake
shake = suspending vtRebuilder
Shake uses vtRebuilder, a rebuilder based on verifying traces. When tasks are being executed, it tracks the dependencies of each task and records them. The next time a task is about to be executed, if its dependencies haven't changed, the execution is skipped.
Moreover, if the current task is skipped, or is re-executed but produces the same result, the tasks that depend on it don't need to be executed either, since their dependencies haven't changed. This is how Shake achieves both Minimality and Early cutoff.
Since Shake tracks dependencies when tasks are being executed and doesn't need to define them statically in advance, it also supports Dynamic dependencies.
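A minimal Python sketch of `suspending vtRebuilder` may make this more concrete. It is not Shake's implementation: the class name `Shake`, the `digest` helper, and the two example tasks are assumptions for illustration, and the trace only stores hashes of the dependencies a task actually fetched during its last run.

```python
import hashlib

def digest(value):
    """Content hash used to compare values across builds."""
    return hashlib.sha256(repr(value).encode()).hexdigest()

class Shake:
    def __init__(self, tasks):
        self.tasks = tasks    # key -> function(fetch) -> value
        self.store = {}       # key -> current value, kept between builds
        self.traces = {}      # key -> [(dependency, digest seen in the last run)]

    def build(self, target, inputs):
        self.store.update(inputs)
        done, executed = set(), []

        def fetch(key):
            # Suspending scheduler: bring the dependency up to date first,
            # then resume the task that asked for it.
            if key in self.tasks and key not in done:
                bring_up_to_date(key)
            return self.store[key]

        def bring_up_to_date(key):
            trace = self.traces.get(key)
            # Verifying trace: skip the task if every recorded dependency
            # still has the digest it had during the last execution.
            if trace is not None and all(digest(fetch(dep)) == seen for dep, seen in trace):
                done.add(key)
                return
            used = []
            def tracking_fetch(dep):
                value = fetch(dep)
                used.append((dep, digest(value)))   # dependencies are discovered at run time
                return value
            self.store[key] = self.tasks[key](tracking_fetch)
            self.traces[key] = used
            executed.append(key)
            done.add(key)

        bring_up_to_date(target)
        return self.store[target], executed

tasks = {
    "stripped": lambda fetch: fetch("main.c").split("//")[0].rstrip(),
    "compiled": lambda fetch: "OBJ(" + fetch("stripped") + ")",
}
shake = Shake(tasks)
result1, ran1 = shake.build("compiled", {"main.c": "int x; // old comment"})
result2, ran2 = shake.build("compiled", {"main.c": "int x; // new comment"})
result3, ran3 = shake.build("compiled", {"main.c": "int y;"})
```

In the second build only `stripped` runs: its output is unchanged, so `compiled` is cut off early. In the third build the stripped source really changes, so both tasks run.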
Cloud Shake
cloudShake = suspending ctRebuilder
Cloud Shake adds cloud caching on top of Shake. The difference is that the Rebuilder changes from vtRebuilder to ctRebuilder.
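The reason a constructive trace enables cloud caching is that it stores the task result itself next to the dependency hashes, whereas a verifying trace only stores hashes. The Python sketch below illustrates the idea under simplifying assumptions: dependencies are declared statically to keep it short (Cloud Shake itself supports dynamic dependencies), the "cloud" is a shared dictionary, and `ct_build`, `cloud_cache`, and the example tasks are hypothetical names.

```python
import hashlib

def digest(value):
    return hashlib.sha256(repr(value).encode()).hexdigest()

def ct_build(key, tasks, store, cloud_cache, executed):
    """Bring `key` up to date, reusing a cached result when a matching trace exists."""
    if key not in tasks:                 # plain input
        return store[key]
    deps, run = tasks[key]
    dep_values = [ct_build(dep, tasks, store, cloud_cache, executed) for dep in deps]
    # Constructive trace: (task, dependency digests) -> stored result
    trace = (key, tuple(digest(value) for value in dep_values))
    if trace in cloud_cache:
        store[key] = cloud_cache[trace]  # "download" instead of executing
    else:
        store[key] = run(*dep_values)
        cloud_cache[trace] = store[key]  # "upload" so other machines can reuse it
        executed.append(key)
    return store[key]

tasks = {
    "lib": (["lib.src"], lambda src: src.upper()),
    "app": (["lib", "app.src"], lambda lib, src: lib + "+" + src),
}
cloud_cache = {}

# Machine A builds from scratch and fills the shared cache.
store_a, ran_a = {"lib.src": "lib", "app.src": "main"}, []
result_a = ct_build("app", tasks, store_a, cloud_cache, ran_a)

# Machine B has the same sources but an empty local store.
store_b, ran_b = {"lib.src": "lib", "app.src": "main"}, []
result_b = ct_build("app", tasks, store_b, cloud_cache, ran_b)
```

Machine B obtains the same result without executing any task, because every trace it computes is already present in the shared cache.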
Buck2
buck2 = suspending ctRebuilder
One of the core developers of Buck2 is the author of Shake and also one of the authors of the paper "Build Systems à la Carte: Theory and Practice".
Buck2 is similar to Cloud Shake: it supports Dynamic dependencies and achieves Minimality and Early cutoff. It also supports cloud caching and natively supports remote task execution.
Buck2 has also implemented its own incremental computation engine: DICE.
Bundlers
A bundler can be understood as a build system plus part of the task descriptor. A build system doesn't care what specific tasks do: users provide that through task description files, and the build system only takes care of executing them. Early task runners like Gulp and Grunt were closer to build systems. Developers used them to arrange the file-processing logic by hand and treated the task runner as the build system. Similarly, Turborepo doesn't care about the task logic and only executes tasks, and it also calls itself a build system.
The bundler itself describes a part of the task logic, such as how to build modules, how to split chunks, and how to perform optimizations, etc. Then the remaining parts are provided by user configurations and plugins, and they are combined to form a complete task descriptor.
There are also some differences between the tasks of the bundler and the build system:
In addition, if we take the Build defined in the build system as the standard, the Build of the bundler is actually divided into two types:
These two types of Build also result in two different kinds of Info, namely memory cache and persistent cache. These two kinds of Info can not only be used separately but also be mixed and used according to specific scenarios.
Webpack/Parcel/Rollup/esbuild
passBasedBundler = foreach ctRebuilder
In traditional pass-based bundlers, both the execution order (Scheduler) of tasks and whether to execute them (Rebuilder) are different in each pass. Each pass uses the task execution order and the execution strategy suitable for this stage according to the task logic of this stage. For example, in webpack:
In pass-based bundlers, the cache realizes Minimality for the bundler. However, since the tasks among different passes are unaware of each other, the tasks between passes cannot achieve Early cutoff, resulting in excessive tasks that still need cache verification. This is often the reason why pass-based bundlers are slow: the failure to achieve Early cutoff leads to a lack of Minimality.
Turbopack
turbopack = suspending ctRebuilder
Unlike traditional pass-based bundlers, Turbopack doesn't emphasize individual compilation stages (passes) from start to finish. Instead, it's closer to query-based. It defines tasks and obtains task results through queries. Especially in a development (Dev) environment, for example, when compiling a web page with index.js as the entry point, the logic of Turbopack is:
The logic of the traditional pass-based bundler is:
Compared to pass-based bundlers, Turbopack will only focus on the part of tasks that need to be executed to obtain the query results, and other irrelevant tasks will not be executed. Especially in the Dev environment, there will not be a complete ModuleGraph and ChunkGraph. In the Production environment, some methods will still be used to aggregate into a complete graph to perform global optimizations on the complete ModuleGraph / ChunkGraph.
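The query-based idea can be sketched in Python as follows. This is not Turbopack's API or turbo tasks: `query_module`, the `imports` table, and the file names other than index.js are assumptions for illustration, the "compile" step is only logged, and the import graph is assumed to be acyclic.

```python
def query_module(path, imports, cache, compiled_log):
    """Return the modules needed to serve `path`, compiling only what is missing."""
    if path not in cache:
        compiled_log.append(path)      # stands in for the real compile step
        needed = [path]
        for dep in imports[path]:
            for module in query_module(dep, imports, cache, compiled_log):
                if module not in needed:
                    needed.append(module)
        cache[path] = needed
    return cache[path]

# Hypothetical project with two entry points that share one module.
imports = {
    "index.js": ["a.js", "shared.js"],
    "a.js": ["shared.js"],
    "shared.js": [],
    "admin.js": ["shared.js", "b.js"],
    "b.js": [],
}
cache, compiled_log = {}, []

page = query_module("index.js", imports, cache, compiled_log)
compiled_after_first = list(compiled_log)

query_module("admin.js", imports, cache, compiled_log)
compiled_after_second = list(compiled_log)
```

Querying index.js never touches admin.js or b.js, and the later query for admin.js reuses the cached result for shared.js instead of compiling it again.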
The underlying incremental computation engine of Turbopack, namely turbo tasks, is the build system that drives Turbopack. Build system concepts such as task, scheduler, and rebuilder are all implemented in turbo tasks. The upper layer of Turbopack describes the specific tasks of the bundler on top of turbo tasks. From this perspective, the incremental computation engine itself is a kind of build system. Buck2 has the same structure: its incremental computation engine DICE covers the core functions of the build system, and Buck2 runs the tasks described by users as DICE tasks on top of it.
Turbopack is uniformly based on turbo tasks as a whole and uses the combination of suspending and ctRebuilder to achieve overall Minimality and Early cutoff.
Vite
vite = suspending vtRebuilder
Although Vite itself doesn't perform bundling, Vite will still continuously execute tasks during development, which conforms to the definition of a build system. Vite doesn't package multiple modules but compiles individual modules instead. So the task logic of Vite is actually quite simple, that is, to compile modules. Vite compiles a module only when the browser makes a request for it. A request will be initiated only when the browser doesn't hit the cache. The order of the requests is the order of module imports, which is also determined by the browser. So it can be seen that Vite utilizes the browser's ESM module system as part of its own build system, belonging to the combination of suspending and vtRebuilder.
Utilizing the browser's ESM module system will make its own implementation much simpler, but the browser's ESM module system itself isn't implemented with the goal of being a build system. Compared to a real build system, it will bring many limitations, such as:
Rspack
incrementalRspack = foreach dirtyBitAndCtRebuilder
Rspack itself also belongs to the pass-based bundler. However, in order to optimize the performance of Hot Module Replacement (HMR) from O(project) to O(change), Rspack has introduced affected-based incremental. Briefly speaking, affected-based incremental will collect changes in various stages, and subsequent stages will calculate the tasks that may be affected based on the collected changes, so that only these affected tasks will be re-executed, reducing the number of task executions.
From the perspective of the build system, affected-based incremental introduces a new Rebuilder on top of the original build system of the pass-based bundler. It makes tasks in different stages aware of each other through the collected changes, so that Early cutoff can be applied to tasks in subsequent stages. By adding Early cutoff, Rspack gets closer to Minimality. This approach is closer to self-adjusting computation:
Find the affected inputs according to the changes and re-execute the corresponding tasks as dirty inputs. This implementation is less intelligent compared to incremental computation, but it is a relatively simple and effective way.
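As a rough Python sketch of this idea (not Rspack's actual implementation), consider a hypothetical two-pass pipeline in which the first pass records which modules really changed and the second pass only re-runs for those. The pass names, the `state` layout, and `incremental_compile` are assumptions for illustration.

```python
def incremental_compile(sources, changed_files, state):
    """Run two passes, re-executing only the tasks affected by `changed_files`."""
    executed = []

    # Pass 1: re-parse the changed files and collect the modules whose
    # parsed result is really different from the previous build.
    affected = set()
    for name in sorted(changed_files):
        parsed = sources[name].split("//")[0].strip()   # toy "parser": drop comments
        executed.append(("parse", name))
        if state["parsed"].get(name) != parsed:
            state["parsed"][name] = parsed
            affected.add(name)

    # Pass 2: code generation is re-run only for the affected modules.
    for name in sorted(affected):
        state["code"][name] = "/* " + name + " */ " + state["parsed"][name]
        executed.append(("codegen", name))
    return executed

state = {"parsed": {}, "code": {}}
sources = {"a.js": "let a = 1 // one", "b.js": "let b = 2"}

cold = incremental_compile(sources, {"a.js", "b.js"}, state)

sources["a.js"] = "let a = 1 // still one"
comment_only = incremental_compile(sources, {"a.js"}, state)

sources["b.js"] = "let b = 3"
real_change = incremental_compile(sources, {"b.js"}, state)
```

A comment-only edit stops after the first pass because nothing is reported as affected, while a real change to b.js re-runs code generation for b.js only.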
Summary
Many bundlers have claimed to be the next-generation bundlers. However, from the perspective of the task execution of the underlying build systems, most of them are basically no different and lack many excellent features that have existed in build systems for a long time. Many of these excellent features can be incorporated into bundlers: