-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#221 Add percentileAll and medianAll #385
Conversation
@@ -941,8 +941,118 @@ else if (percentile == 1.0) | |||
} | |||
|
|||
/** | |||
* Get a {@link Collector} that calculates the common prefix of a set of strings. | |||
* CS304 Issue link: https://github.com/jOOQ/jOOL/issues/221 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I ask what CS304 means? I'll remove it again, but I'm curious. Is this some CS=Computer Science university class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should apologize for these comments. We add these PRs because it is our software engineering course homework to make contributions on some open project(╯﹏╰)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, just curious. I'll remove the comments. Another question, what made you choose jOOλ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mainly because this project is easy to learn, and it seems to be an important project because it's popular. Actually we have also considered JOOQ, but it's too big.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Yes, it's much harder to make meaningful contributions to jOOQ - especially because the tests are not open source.
return Seq.seq(result); | ||
} | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you checked, is there any refactoring potential to have common logic between percentileBy()
and percentileAllBy()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, at least the collection to the accumulator seems to be the same, the difference being the finisher implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the similar structure of percentileBy
and percentileAllBy
too, but I didn't find a proper way to refactor these codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, no worries. I'll refactor it. Thanks for the effort!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should do the trick:
private static <T, U, R> Collector<T, List<Tuple2<T, U>>, R> percentileCollector(
Function<? super T, ? extends U> function,
Comparator<? super U> comparator,
Supplier<R> onEmpty,
Function<T, R> onSingle,
Function<List<Tuple2<T, U>>, R> finisher
) {
return Collector.of(
() -> new ArrayList<>(),
(l, v) -> l.add(tuple(v, function.apply(v))),
(l1, l2) -> {
l1.addAll(l2);
return l1;
},
l -> {
int size = l.size();
if (size == 0)
return onEmpty.get();
else if (size == 1)
return onSingle.apply(l.get(0).v1);
l.sort(Comparator.comparing(t -> t.v2, comparator));
return finisher.apply(l);
}
);
}
Leading to e.g.
public static <T, U> Collector<T, ?, Optional<T>> percentileBy(double percentile, Function<? super T, ? extends U> function, Comparator<? super U> comparator) {
if (percentile < 0.0 || percentile > 1.0)
throw new IllegalArgumentException("Percentile must be between 0.0 and 1.0");
if (percentile == 0.0)
return minBy(function, comparator);
else if (percentile == 1.0)
return collectingAndThen(maxAllBy(function, comparator), s -> s.findLast());
else
return percentileCollector(
function,
comparator,
Optional::empty,
Optional::of,
l -> Optional.of(l.get(percentileIndex(percentile, l.size())).v1)
);
}
Where
private static int percentileIndex(double percentile, int size) {
// x.5 should be rounded down
return (int) -Math.round(-(size * percentile + 0.5)) - 1;
}
return maxAllBy(function, comparator); | ||
|
||
return Collector.of( | ||
() -> new ArrayList<Tuple2<? extends T, U>>(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this covariance needed here?
|
||
List<T> result = new ArrayList<>(); | ||
U value = l.get(right).v2; | ||
// Search the first value equal to the value at the position of PERCENTILE by binary search |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice thinking!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps use Collections.binarySearch()
instead, though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know this API before. It's obviously better to use Java standard library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll refactor it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JDK method finds a random index if multiple values have the same index, so we can't use that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Collections.binarySearch
will not return the first element equal to the target item. Part of the source codes of Collections.binarySearch
attached.
while (low <= high) {
int mid = (low + high) >>> 1;
Comparable<? super T> midVal = get(i, mid);
int cmp = midVal.compareTo(key);
if (cmp < 0)
low = mid + 1;
else if (cmp > 0)
high = mid - 1;
else
return mid; // key found
}
Item reverse() { | ||
return new Item(-this.val); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll merge this, but I'll probably get rid of this class. I'm not sure what value it provides compared to just using Integer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind. I see the point now. I'll just move the local class out of the method for better reuse.
Thanks again for your contribution. Note that if you plan on making more contributions, it will be sufficient to provide an implementation for the |
|
||
for (; left < size && comparator.compare(l.get(left).v2, value) == 0; left++) | ||
result.add(l.get(left).v1); | ||
return Seq.seq(result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The auxiliary result
list is not necessary. We could use l
again here and add the comparator predicate as a Seq.filter()
. Not sure though if we want to keep a reference to the possibly large array list, though... Edge cases...
Additional changes here: d41bd54 |
And here: 1bdd69b |
PS: Feel free to add yourself as |
#221
Unlike
percentileBy
andmedianBy
, the functionpercentileAllBy
andmedianAllBy
will retrieve all the values of the specific percentile.The order of the output elements is the same as the order of those elements in the input
Seq
.I add a lot of test cases for the two functions, and all the previous and new tests are passed.
Though there is already unsubmitted commits related to issue #221 in May 2020, it seems that there were some bugs with his codes.