perf: reintroduce binary encoding of forbid clauses #91

baszalmstra · 2024-12-31T15:46:17Z

This PR reintroduces binary encoding of forbid multiple instance clauses previously introduced in #37.

Description

The PR is split into two, the first commit removes RefCell usage by splitting some functions that take mutable arguments out of the Solver struct. The second commit introduces binary encoding of the clauses.

Part of this refactor is the VariableMap. The solver no longer uses solvables as the variables in the SAT process but instead a new type VariableId is introduced. A variable is allocated for each encountered solvable but variables can also represent other semantic concepts. This is used in binary encoding which requires additional helper variables. The VariableMap both allocates the variables and also retains the semantic origin of the variables which is useful when reporting conflicts or for logging purposes.

Just like in #37 there are still some issues with reporting. I have concluded that this is more a bug in how we construct the reports than an issue with binary encoding itself. I will attempt to fix this in a separate PR. The tests of rattler still succeed but the reporting is formatted slightly differently.

I choose to implement binary encoding over other encodings because of the way we construct the clauses in resolvo and its simplicity. Binary encoding requires only slight modification to the SAT rules whereas other encodings often require multiple different SAT rules. We also construct the rules iteratively, so we don't know the number of candidates upfront.

Performance

There are two areas in which we observe an improvement with this PR: memory usage and overall performance. Binary encoding only allocates O(n log2(n)) clauses compared to the O(n^2) in the current implementation. When there are a large number of candidates this saves an enormous amount of memory. As an example: pyarrow can have 4000+ candidates, we would allocate 16.000.000+ clauses which quickly adds up! With binary encoding this is reduced to roughly just 50.000. Given that a clause uses roughly 24 bytes of memory that reduces the memory usage by 365MB for just pyarrow alone.

The significantly lower number of clauses also require less operations when propagating decisions through the solver, improving the overall performance of the solver. As can be seen in the benchmarks below 80% of the benchmark cases can now be solved in less than a second (compared to the 58% currently on main).

ruben-arts · 2025-01-06T08:39:51Z

Additional information out of the datasets:

Summary for base_timings.csv:
- Average Solve Time: 1.40 seconds (mean of all durations)
- Median Solve Time: 0.47 seconds (middle value when sorted)
- Standard Deviation: 3.26 seconds (spread of durations around the mean)
- Minimum Solve Time: 0.00 seconds (shortest solve duration)
- Maximum Solve Time: 50.00 seconds (longest solve duration, capped at 50s)
- 25th Percentile: 0.14 seconds (25% of solves were faster than this)
- 75th Percentile: 2.03 seconds (75% of solves were faster than this)
- Number of Outliers: 3 (solves capped at 50s)

Summary for timings.csv:
- Average Solve Time: 0.83 seconds (mean of all durations)
- Median Solve Time: 0.35 seconds (middle value when sorted)
- Standard Deviation: 2.95 seconds (spread of durations around the mean)
- Minimum Solve Time: 0.00 seconds (shortest solve duration)
- Maximum Solve Time: 50.00 seconds (longest solve duration, capped at 50s)
- 25th Percentile: 0.11 seconds (25% of solves were faster than this)
- 75th Percentile: 0.89 seconds (75% of solves were faster than this)
- Number of Outliers: 3 (solves capped at 50s)

Comparison between the datasets:
- Average Solve Time: 'timings.csv' was 1.68 times faster than 'base_timings.csv'
- Median Solve Time: 'timings.csv' was 1.33 times faster than 'base_timings.csv'
- 25th Percentile: 'timings.csv' was 1.22 times faster than 'base_timings.csv'
- 75th Percentile: 'timings.csv' was 2.28 times faster than 'base_timings.csv'
- Outliers: 'timings.csv' had 0 fewer solves capped at 50s

wolfv · 2025-01-06T08:50:44Z

Nice data-sciencing @ruben-arts ! :) Those timings look great

baszalmstra added 2 commits December 31, 2024 12:35

refactor: remove refcell usage

2c20a6c

refactor: introduce binary encoding of at most one constraints

9627cb5

baszalmstra assigned wolfv Dec 31, 2024

baszalmstra mentioned this pull request Jan 1, 2025

refactor: binary encoding of at most one clauses #37

Closed

Merge branch 'main' into refactor/variable_map

401772c

baszalmstra mentioned this pull request Jan 1, 2025

High memory usage #86

Open

ruben-arts mentioned this pull request Jan 2, 2025

Pixi eats all my memory and then dies prefix-dev/pixi#2458

Open

2 tasks

baszalmstra added 2 commits January 2, 2025 13:18

Merge remote-tracking branch 'upstream/main' into refactor/variable_map

ac092e0

add brackets

a5e4e3e

baszalmstra merged commit f5b1b73 into mamba-org:main Jan 2, 2025
13 checks passed

baszalmstra mentioned this pull request Jan 2, 2025

chore: release v0.8.5 #83

Merged

BrewTestBot mentioned this pull request Jan 6, 2025

pixi 0.39.5 Homebrew/homebrew-core#203323

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: reintroduce binary encoding of forbid clauses #91

perf: reintroduce binary encoding of forbid clauses #91

baszalmstra commented Dec 31, 2024

ruben-arts commented Jan 6, 2025

wolfv commented Jan 6, 2025

perf: reintroduce binary encoding of forbid clauses #91

perf: reintroduce binary encoding of forbid clauses #91

Conversation

baszalmstra commented Dec 31, 2024

Description

Performance

ruben-arts commented Jan 6, 2025

wolfv commented Jan 6, 2025