Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: let DependencyProvider prioritize dependencies #50

Merged
merged 1 commit into from
Nov 16, 2020
Merged

Conversation

Eh2406
Copy link
Member

@Eh2406 Eh2406 commented Oct 23, 2020

This follows up on #40.

The first commit explores pick_package approach as described in #40 (comment).
The second commit explores the make_decision approach.
The third commit uses a BTreeMap to remove Hash req on OfflineDependencyProvider

Beadback is definitely appreciated.

Edit: somewhat accidentally closes #18

Copy link
Member

@mpizenberg mpizenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally like the complete control given back to the dependency provider with make_decision. The usage of the BTree is also a good idea to remove the hash constraint. The trick with Borrow to enforce a correct package is also really nice.

There are few oddities though in package docs. Names that do not make a lot of sense anymore like the "reverse" terminology for the dependency provider. And a few other things that I found weird at first read like the error when the version is in the term (should be the opposite?).

src/solver.rs Outdated Show resolved Hide resolved
src/solver.rs Outdated Show resolved Hide resolved
src/solver.rs Show resolved Hide resolved
@Eh2406
Copy link
Member Author

Eh2406 commented Oct 24, 2020

I force pushed to fix the commit names. I also addressed @mpizenberg's comments. I have 2 remaining concerns:

  1. The module documentation still refers to list_available_versions. I could use some help describing the new method.
  2. I added a new trait Constraints to hide Term is this a good idea? Alternatively we can make Term public with only the one method available, or as we are filtering to positive terms we can convert to a Range which is already public.

@Eh2406 Eh2406 force-pushed the priorities branch 2 times, most recently from 8357220 to 8ae3d66 Compare October 24, 2020 16:06
@mpizenberg
Copy link
Member

Alternatively we can make Term public with only the one method available, or as we are filtering to positive terms we can convert to a Range which is already public

Good point! Since potential packages are only picked if there is at least one positive derivation, the intersection is necessarily a positive term. Using a Range in the public argument type would make more sense from the point of view of the dependency provider!

@Eh2406
Copy link
Member Author

Eh2406 commented Oct 25, 2020

switched to Range. I (at this time of night) could not get the lifetimes to work for &Range so used Borrow<Range<V>>

src/solver.rs Outdated Show resolved Hide resolved
Copy link
Member

@mpizenberg mpizenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, with maybe some changes to some doc comments. Waiting on others reviews

src/solver.rs Outdated Show resolved Hide resolved
@Eh2406
Copy link
Member Author

Eh2406 commented Oct 26, 2020

with maybe some changes to some doc comments

What did you have in mind? Also:

The module documentation still refers to list_available_versions. I could use some help describing the new method.

@aleksator
Copy link
Member

Hopefully I'll be able to look at PRs tomorrow or the day after.

fn make_decision<T: Borrow<P>, U: Borrow<Range<V>>>(
&self,
packages: impl Iterator<Item = (T, U)>,
) -> Result<(T, Option<V>), Box<dyn Error>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to provide a default implementation for this method with the heuristic we were using previously.

This way we still give control to the users who want it, but provide the option to leave the way we pick packages as the library suggests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to keep the list_available_versions? Because without that I don't know how to build a default implementation. It was removed as we otherwise don't need it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should keep list_available_versions in the interface. Maybe we could have a free function (or collection of free functions) that could help. Something like pick_first_strategy, pick_lowest_number_of_versions_strategy etc. The names are terrible, but that's just for the sake of the example

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was sceptical, but gave it a try. And it worked out really well! Take a look and let me know what you think.

Copy link
Member

@aleksator aleksator Nov 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Eh2406

Do you want to keep the list_available_versions?

@mpizenberg

I don't think we should keep list_available_versions in the interface.

Why not? What if our interface is 3 functions:

  1. make_decision()
  2. list_available_versions()
  3. get_dependencies()

The first one has default implementation in terms on list_available_versions, no need to change it right away for the algorithm to work but the ability to customize it is there.

The idea is somewhat similar to what iterators to with the next() function. You define it and get some others for free. Here we would implement 1) in terms of 2).

The only not so pretty thing about that is if the user comes up with a way to redefine make_decision without using list_available_versions, but the API still requires it. In that case they could leave it unimplemented!. Is it an API smell? Hmmm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think "avoid the breaking change" justifies that smell given our user base

My motivation is not avoiding breaking changes, but rather the easiness of API usage.

Before this PR it was crystal clear what the algorithm needed from the user to function (from the user's perspective):

  1. A way to get dependencies for a given package version.
  2. A way to get all versions for the package.

All makes sense why it is needed and how to implement it.

Now:

  1. Unchanged.
  2. Hm, actually you need to implement part of our algorithm to use our library.

How would people do that? What if they don't want to understand how exactly we do the decision making?

The only practical solution is to copy paste the code from OfflineDependencyProvider.
I'm afraid that requiring every user to do that is too much of an ergonomic hit.

Does anyone else share my concerns?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get your point about ease of use. However, I think people not wanting to use OfflineDependencyProvider and willing to define their own dependency provider are willing to read the implementation of an 11-lines function doing nothing more than a function call to a helper function. Also there is an even simpler 6-lines example implementation in the guide on the section regarding custom implementation of a dependency provider.

If we get more users, and some of them point make_decision as being difficult to implement, I think I'd rather have a two different traits with different ease/control compromises (a bit like the sandbox/element/document/application variants of an elm app) than having a single trait with an ambiguous API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we get more users, and some of them point make_decision as being difficult to implement, I think I'd rather have a two different traits with different ease/control compromises (a bit like the sandbox/element/document/application variants of an elm app) than having a single trait with an ambiguous API.

Actually that's what my girlfriend suggested as well! 😆 I was pretty impressed because she's sophomore (2nd year in the uni) and is doing that as a way to change her profession, meaning she has very little experience in programming yet.

In any case, doing that is out of scope of this PR and needs a separate discussion; I'm totally okay merging this version!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad we had this discussion as this is an interesting design problem I haven't had to think of before, thank you @Eh2406 and @mpizenberg.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have a wise girlfriend @aleksator ahah, keep her! XD

src/solver.rs Outdated Show resolved Hide resolved
tests/proptest.rs Outdated Show resolved Hide resolved
@aleksator
Copy link
Member

This does simplify the code, but the performance characteristics are different. In the first case, we only need to browse terms for any positive before computing intersection. In the refactor we always compute the intersection. If benchmarking showcases similar performances, it's worth it. Otherwise let's leave it for another round of performance adjustments.

On the other hand, we don't iterate over not intersected derivations every time.

I run the benchmarks a few times before submitting the change.
Before the change it has completed in 1.65ms on my machine, after the change in 1.67ms. I'm not sure if this is spurious or not, since when I benchmarked something else before I was getting much larger variations without any code changes.

@Eh2406
Copy link
Member Author

Eh2406 commented Oct 28, 2020

When I ran the benchmarks from #34, criterion was fairly sure there is a small regression comparing ecaeb39 with acc391e. It is made smaller if we eagerly do the intersections and go ahead and do away with derivations_not_intersected_yet entirely. But criterion thinks it is still a 3-5% regression.
It is a code complexity vs perf trade off, I'd be ok either way.

@aleksator
Copy link
Member

When I ran the benchmarks from #34, criterion was fairly sure there is a small regression comparing ecaeb39 with acc391e. It is made smaller if we eagerly do the intersections and go ahead and do away with derivations_not_intersected_yet entirely. But criterion thinks it is still a 3-5% regression.
It is a code complexity vs perf trade off, I'd be ok either way.

I've restored "if conditions" and still simplified the code a little with an early return.

src/lib.rs Outdated Show resolved Hide resolved
tests/proptest.rs Outdated Show resolved Hide resolved
@Eh2406 Eh2406 force-pushed the priorities branch 2 times, most recently from e852586 to eb3d094 Compare November 12, 2020 21:48
@mpizenberg
Copy link
Member

The last commit makes the code cleaner by using min_by but I think it doubles the amount of calls to list_available_versions so don't hesitate to revert that one. It's weird that there is not function like min_with that would take a F: Fn(&Self::Item) -> impl Ord or similar.

@mpizenberg mpizenberg force-pushed the priorities branch 2 times, most recently from fbf8ff4 to 47c055f Compare November 13, 2020 22:59
@Eh2406
Copy link
Member Author

Eh2406 commented Nov 13, 2020

Dose min_by_key work here?

@mpizenberg
Copy link
Member

Just after that comment I realized the function exists XD and is called min_by_key.

@aleksator
Copy link
Member

I've reviewed the PR so far, great work introducing the ability to customize the lib's behavior.

The one thing I wanted to discuss about the API change is a default implementation for make_decision. We can leave that discussion for a potential separate PR if you'd like.

@aleksator aleksator force-pushed the priorities branch 2 times, most recently from 2944c58 to 15a3bcd Compare November 14, 2020 20:25
tests/proptest.rs Show resolved Hide resolved
tests/proptest.rs Show resolved Hide resolved
struct OldestVersionsDependencyProvider<P: Package, V: Version>(OfflineDependencyProvider<P, V>);

impl<P: Package, V: Version> DependencyProvider<P, V> for OldestVersionsDependencyProvider<P, V> {
fn make_decision<T: std::borrow::Borrow<P>, U: std::borrow::Borrow<Range<V>>>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about naming this [pick/choose]_next_package?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could have been possible. make_decision fits well with PubGrub terminology since this step is called "decision making"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. The question is who do we optimize the easiness to understand for:

  1. library authors
  2. users

I think make_decision optimizes for us, while naming function based on what it does optimizes for users.

What if we use descriptive function name from the user perspective and put a link to "decision making" in the function docs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't know, tough naming call. If you prefer explicitness from the user point of view, maybe choose_package_version_within is the most descriptive name? I don't mind if you go for it, but can you also update the guide then please (PR #45).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with choose_package_version; how does it look now?

I haven't yet read the guide as I'm focusing on PR reviews first. Will update once I get there 🙂

@Eh2406
Copy link
Member Author

Eh2406 commented Nov 15, 2020

Using the benchmark of solving all versions in elm-packages.ron, criterion thinks this PR makes a 19% improvement from 2.2sec -> 1.8sec.
Solving for all versions in the checked in large_case.ron criterion thinks this PR makes a 23% improvement.

@aleksator
Copy link
Member

@Eh2406 That's a lot! Must be usage of iterators instead of allocating new vectors for list_available_versions that was done before?

@Eh2406 @mpizenberg
No blockers from me anymore, let's merge this if you agree with the naming changes in the latest commit discussed in #50 (comment).
Feel free to resolve and merge.

There is also an interesting discussion on future possible API changes, where @mpizenberg thought of 2 separate traits for customizability/ease of use: #50 (comment)
This is in no way a blocker for this PR, just something we can keep in mind and possibly implement in the future if we desire to do so. I thought it's pretty clever 😄

src/solver.rs Outdated Show resolved Hide resolved
@Eh2406
Copy link
Member Author

Eh2406 commented Nov 15, 2020

Must be usage of iterators instead of allocating new vectors for list_available_versions that was done before?

I think so. Allocating a BTreeSet (in versions) and Vec (in list_available_versions) just to get a count for all available packages for each decision adds up.

@mpizenberg is better at naming than I, so I will leave final approval to him.

@aleksator aleksator changed the title let dependency provider priorities packages feat!: let DependencyProvider prioritize dependencies Nov 15, 2020
Copy link
Member

@mpizenberg mpizenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright congrats all on finalizing that PR!

@mpizenberg
Copy link
Member

@Eh2406 There is a conflict and I'm in the middle of some uncommitted things from the old dev branch for my analysis of elm packages. Could you do the merge? Don't hesitate to squash everything before rebase since that means only 1 conflictual commit.

@Eh2406
Copy link
Member Author

Eh2406 commented Nov 16, 2020

Did a squash and a rebase. I think nothing was lost in that, but just in case I will leave it for one of you to merge.

@aleksator
Copy link
Member

There was a conflict because of my merged PR; fixed.

@aleksator aleksator merged commit 3d8fd9d into dev Nov 16, 2020
@aleksator aleksator deleted the priorities branch November 16, 2020 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants