perf: reintroduce binary encoding of forbid clauses #91
+1,335
−809
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR reintroduces binary encoding of forbid multiple instance clauses previously introduced in #37.
Description
The PR is split into two, the first commit removes
RefCell
usage by splitting some functions that take mutable arguments out of theSolver
struct. The second commit introduces binary encoding of the clauses.Part of this refactor is the
VariableMap
. The solver no longer uses solvables as the variables in the SAT process but instead a new typeVariableId
is introduced. A variable is allocated for each encountered solvable but variables can also represent other semantic concepts. This is used in binary encoding which requires additional helper variables. TheVariableMap
both allocates the variables and also retains the semantic origin of the variables which is useful when reporting conflicts or for logging purposes.Just like in #37 there are still some issues with reporting. I have concluded that this is more a bug in how we construct the reports than an issue with binary encoding itself. I will attempt to fix this in a separate PR. The tests of
rattler
still succeed but the reporting is formatted slightly differently.I choose to implement binary encoding over other encodings because of the way we construct the clauses in resolvo and its simplicity. Binary encoding requires only slight modification to the SAT rules whereas other encodings often require multiple different SAT rules. We also construct the rules iteratively, so we don't know the number of candidates upfront.
Performance
There are two areas in which we observe an improvement with this PR: memory usage and overall performance. Binary encoding only allocates
O(n log2(n))
clauses compared to theO(n^2)
in the current implementation. When there are a large number of candidates this saves an enormous amount of memory. As an example:pyarrow
can have 4000+ candidates, we would allocate 16.000.000+ clauses which quickly adds up! With binary encoding this is reduced to roughly just 50.000. Given that a clause uses roughly 24 bytes of memory that reduces the memory usage by 365MB for justpyarrow
alone.The significantly lower number of clauses also require less operations when propagating decisions through the solver, improving the overall performance of the solver. As can be seen in the benchmarks below 80% of the benchmark cases can now be solved in less than a second (compared to the 58% currently on main).