feat: qsort with proven bounds and correctness proof #5346

lyphyser · 2024-09-14T17:18:12Z

This changes qsort and qpartition to use array accesses that take proofs of the index being in-bound, and also makes qsort a total function by providing a proof of termination.

Assuming an efficient compiler, it should speed it up, and also it's the first step to writing a proof that qsort actually sorts the input (which requires totality and should be much easier if accesses are in-bounds).

I haven't tested performance though, but if this regresses performance, then the compiler needs to be improved.

Behavior should be the same, except for qpartition being changed when low/high are out of bounds, to clip high to the array size and return low if low >= (clipped)high, including when the array is empty where it would previously return 0 instead of low. This is necessary to make qsort terminate even with out-of-bounds inputs and make qpartion work correctly and efficiently even with out-of-bounds indices.

Of course, another approach could be to change the qpartition and qsort APIs to take Fins instead of Nats and/or inequality hypotheses, which would simplify the code, but breaks compatibility. A further potential option could be to move Vector to core from batteries and use it as the fundamental array type, reimplementing Array as a wrapper on top of Vector, which would also simplify the code since there would be no need to prove that every swap and set leaves the array size unchanged.

It also uncovers some problems with the split tactic, that seems to fail for no reason, requiring the manual by_cases+simp only combo instead in one case and a useless let in another case, and also with closure captures, where the "loop" closure seems to unfold hi and capture the variables of the resulting expression rather than hi itself, unless its definition is hidden with have as the patch does.

leanprover-community-mathlib4-bot · 2024-09-14T20:59:48Z

Mathlib CI status (docs):

❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 5eea8355baa64e6fda4e3874a0f45649d4c27633 --onto f989520d2b17ec2fe987a69fee44a5272be1ff10. (2024-09-14 20:59:48)
✅ Mathlib branch lean-pr-testing-5346 has successfully built against this PR. (2024-09-16 02:05:47) View Log
💥 Mathlib branch lean-pr-testing-5346 build failed against this PR. (2024-09-16 11:38:19) View Log
💥 Mathlib branch lean-pr-testing-5346 build failed against this PR. (2024-09-17 19:04:47) View Log
💥 Mathlib branch lean-pr-testing-5346 build failed against this PR. (2024-09-18 09:18:44) View Log
💥 Mathlib branch lean-pr-testing-5346 build failed against this PR. (2024-09-18 10:59:52) View Log
💥 Mathlib branch lean-pr-testing-5346 build failed against this PR. (2024-09-18 15:25:51) View Log
💥 Mathlib branch lean-pr-testing-5346 build failed against this PR. (2024-09-18 18:18:13) View Log
❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase a6830f90ab365e14ccb7ca31201de37f8c1e978c --onto d8e0fa425b3225fc0c35c07247ecb11b49bb00ed. (2024-09-20 14:00:27)

Only breaks users if they specify bounds explicitly and omega can't solve the goals automatically.

Returning a pair unfortunately result in a heap allocation, while a specialized function call doesn't. However, this unfortunately seems to result in code duplication, so the next commit will completely remove qpartition and inline it into qsort, which fixes that.

This seems to produce the best C code, since it avoids both temporary allocations and code duplication of specialized functions

lyphyser · 2024-09-15T01:33:08Z

I made some major changes to the code, first with API changes to add bounds to the APi to remove remaining bounds checking code, then with a change that replaces returning a pair in qpartition with passing a lambda, to eliminate the temporary allocation for the pair and then the current version that completely inlines qpartition into qsort, that eliminates an unnecessary function and solves code duplication that would previously happen with specialization.

The current code generates C code that seems almost perfect, with the exception that loop should ideally be inlined into sort (but if I add @[inline], the Lean doesn't inline it and doesn't even specialize it anymore), which the C compiler might do on its own; and also that if that were done, the second recursive call in sort could be tail recursive (although the first is not tail recursive of course so it doesn't matter much).

The qpartition function is completely removed, which obviously would break any user, but there don't seem to be any in either Lean or Mathlib. If needed, the removed code for qpartition could be readded.

Obviously the commits should be squashed when merging, but I left them unsquashed so it's possible to see the history.

… high

digama0 · 2024-09-15T10:29:16Z

See also #3933

…perties) Complete, except for the fact that antisymmetry and transitivity should only be required for elements in the array range, not any value of the type. Also needs tidying and refactoring.

lyphyser · 2024-09-16T00:57:28Z

Added a proof of correctness as well, compiles and proves both that the output is ordered and that it is a permutation in the range and constant outside (as an inductive, not a bijection currently). Needs some tidying up though and improvements.

lyphyser · 2024-09-18T09:44:00Z

I added proofs for the special cases of lawful <= (transitive total) and < (weakly linear asymmetric), which completes the proofs, and properly split the file into many.

Now both the algorithm and proof should be in a mergeable shape.

…<= special cases This guarantees that our TransGen definition is not degenerate

kim-em · 2024-09-20T06:10:08Z

Could all the instances which are breaking the test become local instances?

kim-em · 2024-09-20T06:11:29Z

Once you've fixed the conflicts, let's run !bench.

Could you also prepare a PR to the lean-pr-testing-5346 branch of Batteries which adapts to these changes.

…-bounds

lyphyser · 2024-09-20T13:32:13Z

Fixed the conflicts.

All the instances are for locally defined typeclasses, so they shouldn't break anything I think.

I think what is breaking Mathlib Sqrt is this new simp lemma:
@[simp] theorem add_div_two_lt_right: (n + m) / 2 < m ↔ n < m:= by

And I think the reason is that the "(k + n / k) / 2 < k" in the if at line 33 is no longer in simp normal norm with the new lemma, so if the lemma is applied before the hypothesis, the hypothesis is not applied.

So I guess one can either remove the @[simp] or in Mathlib add a "try simp at h" before the simpa or change the if to n / k < k

lyphyser · 2024-09-21T17:04:06Z

Made leanprover-community/batteries#951 to adapt Batteries to this change.

Note that while this pull request currently renames "lt" to "f" (because it can now be a <= instead of a < and "lt" seems misleading), there are other possible approaches, such as:

Leaving it at "lt" for compatibility even if a <= now works as well
Calling it "r" but keeping it as an A -> A -> Bool rather the A -> A -> Prop that is customary for relations
Calling it "r" and changing it to A -> A -> Prop with a DecidableRel instance

Also, it may be reasonable to swap the order of the arguments so the comparison function is first and the array second (the advantage of this is that it's better from a currying perspective and it's still possible to have the function last by using dot notation, the disadvantage is that it breaks compatibility and the current order might be preferable sometimes)

kim-em · 2024-09-23T04:39:38Z

Calling it "r" but keeping it as an A -> A -> Bool rather the A -> A -> Prop that is customary for relations

Let's do this for now. We actively thinking about the design of comparators, but it will be easy for us to follow up if/when we decide to change things.

kim-em · 2024-09-23T04:39:58Z

CI is still breaking. I'd like to run !bench when it is ready.

kim-em · 2024-09-24T01:05:33Z

Once CI succeeds here, you'll need to rebase onto origin/nightly-with-mathlib again, so it's possible to fix Batteries+Mathlib.

lyphyser · 2024-09-30T22:03:00Z

Unfortunately I'm unable to give this much more attention, so this pull request is probably going to either have to be postponed or have someone else champion and merge it. It may also be possible to just merge the algorithm and not the proof.

Overall I think it's in a reasonable state, with the remaining possible problems being relative lack of generalization, performance, and naming:

Some index predicates proofs are slow to compile because they use automated brute force case splitting; a faster approach is possible I think but requires generalization
The index predicates could be generalized to work on arbitrary index sets instead of specific interval kinds, which could also replace the variable tuples with just one index
Several names probably would be better changed to better adhere to conventions

Also, it may be better to generalize it from just arrays to array-like things, and thus figure out some way to unify this with lists, both in proof and algorithm.

There's also the problem that QuickSort alone is not really an appropriate sort algorithm for a general purpose language library (although it's a great building block for one), because it has O(n^2) worst case scenario, so a more sophisticated approach is normally undertaken - see for example the recent Rust merges of driftsort and ipnsort; a more minimal solution would be to either use a guaranteed median/quantile algorithm to choose the pivot so it's guaranteed to be in the middle K% and/or switch to mergesort or heapsort when some call depth threshold is reached with the threshold chosen so that it's guaranteed to be O(n log n). Writing a verified implementation of those would probably be a significant undertaking.

lyphyser added 2 commits September 14, 2024 18:54

Change qsort to use proven in-bounds accesses

9c8dfb9

fix whitespace

5e5aff7

lyphyser requested a review from kim-em as a code owner September 14, 2024 17:18

lyphyser added 3 commits September 14, 2024 20:40

replace by_cases with dsimp only + split

aaa369d

import tactics

9b91779

correct imports so that it builds in Init/

3c06e7f

github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Sep 14, 2024

lyphyser added 4 commits September 15, 2024 00:09

minor change qsort API to provide bounds as hypotheses

b518d33

Only breaks users if they specify bounds explicitly and omega can't solve the goals automatically.

inline qpartition into qsort

3ebba57

This seems to produce the best C code, since it avoids both temporary allocations and code duplication of specialized functions

add comment, simplify the code

ec4f463

lyphyser added 2 commits September 15, 2024 01:37

move swap into else branch, since it's a no-op in the if because i ==…

8b554c7

… high

rename and rearrange code

bd990c5

add initial proof that qsort sorts (both ordering and permutation pro…

5319c45

…perties) Complete, except for the fact that antisymmetry and transitivity should only be required for elements in the array range, not any value of the type. Also needs tidying and refactoring.

lyphyser changed the title ~~Qsort with proven bounds~~ Qsort with proven bounds and correctness proof Sep 16, 2024

leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request Sep 16, 2024

Update lean-toolchain for testing leanprover/lean4#5346

a6823be

leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Sep 16, 2024

Update lean-toolchain for testing leanprover/lean4#5346

0dbed93

leanprover-community-bot added the builds-mathlib CI has verified that Mathlib builds against this PR label Sep 16, 2024

lyphyser added 4 commits September 16, 2024 10:56

add size_dite

54b3f48

State the nat lemmas as iffs with clean proofs

2e5b27b

rename theorems

00dc080

rename nat lemmas to follow mathlib naming convention

52101f7

leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request Sep 16, 2024

Trigger CI for leanprover/lean4#5346

4a23cd3

leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Sep 16, 2024

Trigger CI for leanprover/lean4#5346

16d95c6

leanprover-community-bot added breaks-mathlib This is not necessarily a blocker for merging: but there needs to be a plan and removed builds-mathlib CI has verified that Mathlib builds against this PR labels Sep 16, 2024

lyphyser requested a review from digama0 as a code owner September 18, 2024 09:42

Add doc comment for qsort

bc17d4c

lyphyser force-pushed the qsort-with-proven-bounds branch from 9801f91 to bc17d4c Compare September 18, 2024 10:02

leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request Sep 18, 2024

Trigger CI for leanprover/lean4#5346

20a7f69

leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Sep 18, 2024

Trigger CI for leanprover/lean4#5346

1dcc395

lyphyser added 6 commits September 18, 2024 10:24

weaken hypotheses on Completion

eeec6f1

add proof that ISortOf of TransGen is equal to ISortOf of f in < and …

d4257e1

…<= special cases This guarantees that our TransGen definition is not degenerate

use <-> for props instead of =

87917b8

style

6bc8d68

style

f4a2a24

rename eq to iff

e8582e4

leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request Sep 18, 2024

Trigger CI for leanprover/lean4#5346

164e52c

leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Sep 18, 2024

Trigger CI for leanprover/lean4#5346

4fc3afa

lyphyser added 3 commits September 18, 2024 15:15

rename

8496bbb

more eq to iff

ba46fb7

more ISortOf results

4dc2182

leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request Sep 18, 2024

Trigger CI for leanprover/lean4#5346

f977043

leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Sep 18, 2024

Trigger CI for leanprover/lean4#5346

a41019c

Merge remote-tracking branch 'upstream/master' into qsort-with-proven…

cbcbb3d

…-bounds

kim-em added the awaiting-author Waiting for PR author to address issues label Sep 24, 2024

github-actions bot added the stale label Oct 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: qsort with proven bounds and correctness proof #5346

feat: qsort with proven bounds and correctness proof #5346

lyphyser commented Sep 14, 2024

leanprover-community-mathlib4-bot commented Sep 14, 2024 •

edited

Loading

lyphyser commented Sep 15, 2024 •

edited

Loading

digama0 commented Sep 15, 2024

lyphyser commented Sep 16, 2024

lyphyser commented Sep 18, 2024

kim-em commented Sep 20, 2024

kim-em commented Sep 20, 2024

lyphyser commented Sep 20, 2024 •

edited

Loading

lyphyser commented Sep 21, 2024 •

edited

Loading

kim-em commented Sep 23, 2024

kim-em commented Sep 23, 2024

kim-em commented Sep 24, 2024

lyphyser commented Sep 30, 2024

feat: qsort with proven bounds and correctness proof #5346

Are you sure you want to change the base?

feat: qsort with proven bounds and correctness proof #5346

Conversation

lyphyser commented Sep 14, 2024

leanprover-community-mathlib4-bot commented Sep 14, 2024 • edited Loading

lyphyser commented Sep 15, 2024 • edited Loading

digama0 commented Sep 15, 2024

lyphyser commented Sep 16, 2024

lyphyser commented Sep 18, 2024

kim-em commented Sep 20, 2024

kim-em commented Sep 20, 2024

lyphyser commented Sep 20, 2024 • edited Loading

lyphyser commented Sep 21, 2024 • edited Loading

kim-em commented Sep 23, 2024

kim-em commented Sep 23, 2024

kim-em commented Sep 24, 2024

lyphyser commented Sep 30, 2024

leanprover-community-mathlib4-bot commented Sep 14, 2024 •

edited

Loading

lyphyser commented Sep 15, 2024 •

edited

Loading

lyphyser commented Sep 20, 2024 •

edited

Loading

lyphyser commented Sep 21, 2024 •

edited

Loading