Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return incompatibility from conflict & Allow backtracking to before a specific package #36

Closed
wants to merge 3 commits into from

Conversation

konstin
Copy link
Member

@konstin konstin commented Dec 12, 2024

Background reading: astral-sh/uv#8157

I'll link the PR on the uv side with all the details once it's up, it contains a walkthrough.

This adds two features to pubgrub:

Return incompatibility from conflict: Whenever we either discard a version due to its dependencies or perform conflict resolution, we return the last conflict that led to discarding them. In uv, we use this to re-prioritize and backtrack when a package decision accumulated to many conflicts. Using the incompatibilities directly from pubgrub makes this efficient.

Allow backtracking to before a specific package: Once we identified a pair of conflicting packages, we want to switch their priorities. This requires backtracking to before the decision for the earlier of the two packages was made. On the uv side, we change the priorities to make sure we take a different path on the next attempt. We allow attempting to backtrack on packages that were not decided yet to avoid the caller from making the duplicative check: At the time when the see incompatibilities caused by them, they are not necessarily decided.

Best reviewed commit-by-commit.

Copy link

codspeed-hq bot commented Dec 12, 2024

CodSpeed Performance Report

Congrats! CodSpeed is installed 🎉

🆕 6 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.

Detected benchmarks

  • backtracking_disjoint_versions (2.3 s)
  • backtracking_ranges (2 s)
  • backtracking_singletons (4.3 s)
  • large_case_u16_NumberVersion.ron (21.4 ms)
  • sudoku-easy (3.8 ms)
  • sudoku-hard (3.9 ms)

@konstin konstin changed the title Return Id<DP::P> over &ID<DP::P> Return incompatibility from conflict & Allow backtracking to before a specific package Dec 12, 2024
Whenever we either discard a version due to its dependencies or perform conflict resolution, we return the last conflict that led to discarding them. In uv, we use this to re-prioritize and backtrack when a package decision accumulated to many conflicts.
This allows discarding a previously made decision if it turned out to be a bad decision, even if all options with this decisions have not yet been rejected.

We allow attempting to backtrack on packages that were not decided yet to avoid the caller from making the duplicative check.
Allow introspecting which packages are to be decided. uv currently uses this for (trace) logging in the resolver.
@konstin konstin force-pushed the konsti/backtrack-package branch from 036aab4 to 251e93b Compare December 13, 2024 11:38
@konstin konstin closed this Dec 16, 2024
konstin added a commit that referenced this pull request Dec 16, 2024
This PR is the child of #36 and
pubgrub-rs#291, providing an
implementation that works for both cargo and uv. Upstream PR:
pubgrub-rs#298.

Specifically, we use the returned incompatibility in
astral-sh/uv#9843, but not
`PackageResolutionStatistics`.

---

Whenever we either discard a version due to its dependencies or perform
conflict resolution, we return the last conflict that led to discarding
them.

In cargo, we use this information for prioritization, which speeds up
resolution (`cargo run -r -- -m pub --with-solana --filter
solana-archiver-lib -t 16` goes from 90s to 20s on my machine).

Configurations that are noticeably slower for the solana test case:
* All incompatibilities unit propagation
* Only the last root cause in unit propagation
* No incompatibilities from unit propagation
* No incompatibilities from `add_version`
* Only affect counts (without culprit counts)
* Backtracking with the same heuristic as
astral-sh/uv#9843 (backtracking once after
affected hits 5)

In uv, we use this to re-prioritize and backtrack when a package
decision accumulated to many conflicts. Since we have our own solver
loop, we add the incompatibility to our own tracking instead.

Built on pubgrub-rs#291

## Benchmarks

Main:

```
        index commit hash: 82086e46740d7a9303216bfac093e7268a95121f
        index commit time: 2024-11-30T18:18:14Z
               index size: 32
          solana in index: 32
             Pub CPU time:  1215.49s ==  20.26min
           Cargo CPU time: skipped
Cargo check lock CPU time: skipped
  Pub check lock CPU time: skipped
                Wall time:    80.58s ==   1.34min
```

With pubgrub-rs#291:

```
        index commit hash: 82086e46740d7a9303216bfac093e7268a95121f
        index commit time: 2024-11-30T18:18:14Z
               index size: 32
          solana in index: 32
             Pub CPU time:   467.73s ==   7.80min
           Cargo CPU time: skipped
Cargo check lock CPU time: skipped
  Pub check lock CPU time: skipped
                Wall time:    34.76s ==   0.58min
```

This PR:

```
        index commit hash: 82086e46740d7a9303216bfac093e7268a95121f
        index commit time: 2024-11-30T18:18:14Z
               index size: 32
          solana in index: 32
             Pub CPU time:   271.79s ==   4.53min
           Cargo CPU time: skipped
Cargo check lock CPU time: skipped
  Pub check lock CPU time: skipped
                Wall time:    20.17s ==   0.34min
```
konstin added a commit to astral-sh/uv that referenced this pull request Dec 16, 2024
Background reading: #8157
Companion PR: astral-sh/pubgrub#36
Requires for test coverage: astral-sh/packse#230

When two packages A and B conflict, we have the option to choose a lower
version of A, or a lower version of B. Currently, we determine this by
the order we saw a package (assuming equal specificity of the
requirement): If we saw A before B, we pin A until all versions of B are
exhausted. This can lead to undesirable outcomes, from cases where it's
just slow (sentry) to others cases without lower bounds where be
backtrack to a very old version of B. This old version may fail to build
(terminating the resolution), or it's a version so old that it doesn't
depend on A (or the shared conflicting package) anymore - but also is
too old for the user's application (fastapi). #8157 collects such cases,
and the `wrong-backtracking` packse scenario contains a minimized
example.

We try to solve this by tracking which packages are "A"s, culprits, and
"B"s, affected, and manually interfering with project selection and
backtracking. Whenever a version we just chose is rejected, we give the
current package a counter for being affected, and the package it
conflicted with a counter for being a culprit. If a package accumulates
more counts than a threshold, we reprioritize: Undecided after the
culprits, after the affected, after packages that only have a single
version (URLs, `==<version>`). We then ask pubgrub to backtrack just
before the culprit. Due to the changed priorities, we now select package
B, the affected, instead of package A, the culprit.

To do this efficiently, we ask pubgrub for the incompatibility that
caused backtracking, or just the last version to be discarded (due to
its dependencies). For backtracking, we use the last incompatibility
from unit propagation as a heuristic. When a version is discarded
because one of its dependencies conflicts with the partial solution, the
incompatibility tells us the package in the partial solution that
conflicted.

We only backtrack once per package, on the first time it passes the
threshold. This prevents backtracking loops in which we make the same
decisions over and over again. But we also changed the priority, so that
we shouldn't take the same path even after the one time we backtrack (it
would defeat the purpose of this change).

There are some parameters that can be tweaked: Currently, the threshold
is set to 5, which feels not too eager with so me of the conflicts that
we want to tolerate but also changes strategies quickly. The relative
order of the new priorities can also be changed, as for each (A, B) pair
the priority of B is afterwards lower than that for A. Currently,
culprits capture conflict for the whole package, but we could limit that
to a specific version. We could discard conflict counters after
backtracking instead of keeping them eternally as we do now. Note that
we're always taking about pairs (A, B), but in practice we track
individual packages, not pairs.

A case that we wouldn't capture is when B is only introduced to the
dependency graph after A, but I think that would require cyclical
dependency for A and B to conflict? There may also be cases where
looking at the last incompatibility is insufficient.

Another example that we can't repair with prioritization is
urllib3/boto3/botocore: We actually have to check all the newer versions
of boto3 and botocore to identify the version that allows with the older
urllib3, no shortcuts allowed.

```
urllib3<1.25.4
boto3
```

All examples I tested were cases with two packages where we only had to
switch the order, so I've abstracted them into a single packse case.

This PR changes the resolution for certain paths, and there is the risk
for regressions.

Fixes #8157

---

All tested examples improved.

Input fastapi:
```text
starlette<=0.36.0
fastapi<=0.115.2
```

```
# BEFORE
$ uv pip --no-progress compile -p 3.11 --exclude-newer 2024-10-01 --no-annotate debug/fastapi.txt
annotated-types==0.7.0
anyio==4.6.0
fastapi==0.1.17
idna==3.10
pydantic==2.9.2
pydantic-core==2.23.4
sniffio==1.3.1
starlette==0.36.0
typing-extensions==4.12.2

# AFTER
$ cargo run --profile fast-build --no-default-features pip compile -p 3.11 --no-progress --exclude-newer 2024-10-01 --no-annotate debug/fastapi.txt 
annotated-types==0.7.0
anyio==4.6.0
fastapi==0.109.1
idna==3.10
pydantic==2.9.2
pydantic-core==2.23.4
sniffio==1.3.1
starlette==0.35.1
typing-extensions==4.12.2
```


Input xarray:
```text
xarray[accel]
```

```
# BEFORE
$ uv pip --no-progress compile -p 3.11 --exclude-newer 2024-10-01 --no-annotate debug/xarray-accel.txt
bottleneck==1.4.0
flox==0.9.13
llvmlite==0.36.0
numba==0.53.1
numbagg==0.8.2
numpy==2.1.1
numpy-groupies==0.11.2
opt-einsum==3.4.0
packaging==24.1
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
scipy==1.14.1
setuptools==75.1.0
six==1.16.0
toolz==0.12.1
tzdata==2024.2
xarray==2024.9.0

# AFTER
$ cargo run --profile fast-build --no-default-features pip compile -p 3.11 --no-progress --exclude-newer 2024-10-01 --no-annotate debug/xarray-accel.txt
bottleneck==1.4.0
flox==0.9.13
llvmlite==0.43.0
numba==0.60.0
numbagg==0.8.2
numpy==2.0.2
numpy-groupies==0.11.2
opt-einsum==3.4.0
packaging==24.1
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
scipy==1.14.1
six==1.16.0
toolz==0.12.1
tzdata==2024.2
xarray==2024.9.0
```


Input sentry: The resolution is identical, but arrived at much faster:
main tries 69 versions (sentry-kafka-schemas: 63), PR tries 12 versions
(sentry-kafka-schemas: 6; 5 times conflicting, then once the right
version).

```text
python-rapidjson<=1.20,>=1.4
sentry-kafka-schemas<=0.1.113,>=0.1.50
```

```
# BEFORE
$ uv pip --no-progress compile -p 3.11 --exclude-newer 2024-10-01 --no-annotate debug/sentry.txt
fastjsonschema==2.20.0
msgpack==1.1.0
python-rapidjson==1.8
pyyaml==6.0.2
sentry-kafka-schemas==0.1.111
typing-extensions==4.12.2

# AFTER
$ cargo run --profile fast-build --no-default-features pip compile -p 3.11 --no-progress --exclude-newer 2024-10-01 --no-annotate debug/sentry.txt
fastjsonschema==2.20.0
msgpack==1.1.0
python-rapidjson==1.8
pyyaml==6.0.2
sentry-kafka-schemas==0.1.111
typing-extensions==4.12.2
```


Input apache-beam
```text
# Run on Python 3.10
dill<0.3.9,>=0.2.2
apache-beam<=2.49.0
```

```
# BEFORE
$ uv pip --no-progress compile -p 3.10 --exclude-newer 2024-10-01 --no-annotate debug/apache-beam.txt
  × Failed to download and build `apache-beam==2.0.0`
  ╰─▶ Build backend failed to determine requirements with `build_wheel()` (exit status: 1)

# AFTER
$ cargo run --profile fast-build --no-default-features pip compile -p 3.10 --no-progress --exclude-newer 2024-10-01 --no-annotate debug/apache-beam.txt
apache-beam==2.49.0
certifi==2024.8.30
charset-normalizer==3.3.2
cloudpickle==2.2.1
crcmod==1.7
dill==0.3.1.1
dnspython==2.6.1
docopt==0.6.2
fastavro==1.9.7
fasteners==0.19
grpcio==1.66.2
hdfs==2.7.3
httplib2==0.22.0
idna==3.10
numpy==1.24.4
objsize==0.6.1
orjson==3.10.7
proto-plus==1.24.0
protobuf==4.23.4
pyarrow==11.0.0
pydot==1.4.2
pymongo==4.10.0
pyparsing==3.1.4
python-dateutil==2.9.0.post0
pytz==2024.2
regex==2024.9.11
requests==2.32.3
six==1.16.0
typing-extensions==4.12.2
urllib3==2.2.3
zstandard==0.23.0
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant