Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#221 Add percentileAll and medianAll #385

Merged
merged 2 commits into from
May 21, 2021
Merged

Conversation

JingHuaMan
Copy link
Contributor

@JingHuaMan JingHuaMan commented May 20, 2021

#221

Unlike percentileBy and medianBy, the function percentileAllBy and medianAllBy will retrieve all the values of the specific percentile.

The order of the output elements is the same as the order of those elements in the input Seq.

I add a lot of test cases for the two functions, and all the previous and new tests are passed.

Though there is already unsubmitted commits related to issue #221 in May 2020, it seems that there were some bugs with his codes.

@@ -941,8 +941,118 @@ else if (percentile == 1.0)
}

/**
* Get a {@link Collector} that calculates the common prefix of a set of strings.
* CS304 Issue link: https://github.com/jOOQ/jOOL/issues/221
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask what CS304 means? I'll remove it again, but I'm curious. Is this some CS=Computer Science university class?

Copy link
Contributor Author

@JingHuaMan JingHuaMan May 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should apologize for these comments. We add these PRs because it is our software engineering course homework to make contributions on some open project(╯﹏╰)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, just curious. I'll remove the comments. Another question, what made you choose jOOλ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mainly because this project is easy to learn, and it seems to be an important project because it's popular. Actually we have also considered JOOQ, but it's too big.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Yes, it's much harder to make meaningful contributions to jOOQ - especially because the tests are not open source.

return Seq.seq(result);
}
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked, is there any refactoring potential to have common logic between percentileBy() and percentileAllBy()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, at least the collection to the accumulator seems to be the same, the difference being the finisher implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the similar structure of percentileBy and percentileAllBy too, but I didn't find a proper way to refactor these codes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, no worries. I'll refactor it. Thanks for the effort!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should do the trick:

private static <T, U, R> Collector<T, List<Tuple2<T, U>>, R> percentileCollector(
    Function<? super T, ? extends U> function,
    Comparator<? super U> comparator,
    Supplier<R> onEmpty,
    Function<T, R> onSingle,
    Function<List<Tuple2<T, U>>, R> finisher
) {
    return Collector.of(
        () -> new ArrayList<>(),
        (l, v) -> l.add(tuple(v, function.apply(v))),
        (l1, l2) -> {
            l1.addAll(l2);
            return l1;
        },
        l -> {
            int size = l.size();

            if (size == 0)
                return onEmpty.get();
            else if (size == 1)
                return onSingle.apply(l.get(0).v1);

            l.sort(Comparator.comparing(t -> t.v2, comparator));
            return finisher.apply(l);
        }
    );
}

Leading to e.g.

public static <T, U> Collector<T, ?, Optional<T>> percentileBy(double percentile, Function<? super T, ? extends U> function, Comparator<? super U> comparator) {
    if (percentile < 0.0 || percentile > 1.0)
        throw new IllegalArgumentException("Percentile must be between 0.0 and 1.0");

    if (percentile == 0.0)
        return minBy(function, comparator);
    else if (percentile == 1.0)
        return collectingAndThen(maxAllBy(function, comparator), s -> s.findLast());
    else
        return percentileCollector(
            function,
            comparator,
            Optional::empty,
            Optional::of,
            l -> Optional.of(l.get(percentileIndex(percentile, l.size())).v1)
        );
}

Where

private static int percentileIndex(double percentile, int size) {

    // x.5 should be rounded down
    return (int) -Math.round(-(size * percentile + 0.5)) - 1;
}

return maxAllBy(function, comparator);

return Collector.of(
() -> new ArrayList<Tuple2<? extends T, U>>(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this covariance needed here?


List<T> result = new ArrayList<>();
U value = l.get(right).v2;
// Search the first value equal to the value at the position of PERCENTILE by binary search
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice thinking!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps use Collections.binarySearch() instead, though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know this API before. It's obviously better to use Java standard library.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll refactor it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JDK method finds a random index if multiple values have the same index, so we can't use that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Collections.binarySearch will not return the first element equal to the target item. Part of the source codes of Collections.binarySearch attached.

while (low <= high) {
      int mid = (low + high) >>> 1;
      Comparable<? super T> midVal = get(i, mid);
      int cmp = midVal.compareTo(key);

      if (cmp < 0)
          low = mid + 1;
      else if (cmp > 0)
            high = mid - 1;
      else
            return mid; // key found
}

Item reverse() {
return new Item(-this.val);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge this, but I'll probably get rid of this class. I'm not sure what value it provides compared to just using Integer

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind. I see the point now. I'll just move the local class out of the method for better reuse.

@lukaseder lukaseder merged commit 2a232d9 into jOOQ:main May 21, 2021
@lukaseder
Copy link
Member

Thanks again for your contribution. Note that if you plan on making more contributions, it will be sufficient to provide an implementation for the jool module only, the jool-java-8 module is created automatically using scripts that aren't on this github repository.


for (; left < size && comparator.compare(l.get(left).v2, value) == 0; left++)
result.add(l.get(left).v1);
return Seq.seq(result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The auxiliary result list is not necessary. We could use l again here and add the comparator predicate as a Seq.filter(). Not sure though if we want to keep a reference to the possibly large array list, though... Edge cases...

@lukaseder
Copy link
Member

Additional changes here: d41bd54

@lukaseder
Copy link
Member

And here: 1bdd69b

@lukaseder
Copy link
Member

PS: Feel free to add yourself as @author to the Agg class

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants