Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design feedback #19

Open
HadrienG2 opened this issue Mar 12, 2023 · 6 comments
Open

Design feedback #19

HadrienG2 opened this issue Mar 12, 2023 · 6 comments

Comments

@HadrienG2
Copy link

HadrienG2 commented Mar 12, 2023

You sollicited feedback on this crate in rust-lang/rust#80094, but that did not seem like the right place to give it (off-topic wrt the original issue, which might eventually be closed if we all agree that array::zip is not the ideal solution to our problems), so let me give it here.

One think I really like about the general approach of this crate in that it focuses on safe interfaces first, which should be what users most often interact with. And the proposed interfaces look reasonable from the perspective of people using the iterator.

However, one thing I'm less sure about is that the abstraction is based on a struct that is constructed from a dynamic-sized iterator, rather than a trait in its own right like the Iterator trait, of which the current IteratorFixed struct would become the array::IntoIter special case.

I think that with this design, you may lose opportunities for optimizations where implementors of a prospective IteratorFixed trait could make use of the fact that most methods consume the entire iteration stream in order to specialize their code for this situation. These optimizations would only be lost if the IntoIterator bridge to dynamic-sized iteration were called, but not in the typical case where people want to eventually collect the results into a fixed-size collection.

Which leads me to this other point that in the current approach, third-party implementations of FromIteratorFixed for collections other than arrays (e.g. perfect hash maps) have no choice but to work by converting the iterator to a dynamic-sized iterator and unsafely assuming that this iterator has N items. Perhaps there is indeed no other way, but making every fixed-sized collection implementation require unsafe code feels like an undesirable outcome.

TL;DR: Unsafe conversion from dynamic-sized iterator and safe conversion into a dynamic-sized iterator that the caller must make unsafe assertions about seems like an imperfect core API to build upon, and it would be great if we could somehow come up with something better.

On an unrelated note, I also get the impression that some of the methods of the original Iterator that are currently excluded might actually exist, just in a runtime-checked + unsafe-unchecked form where the user asserts that the collection filter will yield M elements. This would not make sense for specialized filters (e.g. adding such an assertion would transform take_while(), into take()), but I could see a use case for general filters like filter() in situations where the user can assert that there will be M outputs, but cannot tell where these outpus lie in the data stream.

@usbalbin
Copy link
Owner

Thank you for the feedback :)

TL;DR: Unsafe conversion from dynamic-sized iterator and safe conversion into a dynamic-sized iterator that the caller must make unsafe assertions about seems like an imperfect core API to build upon, and it would be great if we could somehow come up with something better.

Yes I fully agree. I have not managed to come up with some way to make possible without dynamic iterators under the hood.

I suppose the problem is that we need some abstract way of describing how to get exactly all N of the elements out of the collection. I also assume we prefer to do this one element at a time i.e. not going through an array due to the performance concerns from array_zip?

To me this sounds a lot like Iterator, however there is that exactly all N part...

Just a (not fully thought out) thought, what if we have something roughly like

type StaticIterator<const N: usize> {
    type Item;
    
    /// Get the next item from the iterator.
    /// 
    /// Calling this function consumes the iterator returning both the rest of the iterator and the next element
    fn next(self) -> (Skip<1, Self>, Self::Item);
}

/// We can not call next() when there are no more elements since that type does not implement StaticIterator
impl<const SKIP: usize, const N: usize> StaticIterator<N - SKIP> for Skip<SKIP, N> where N > SKIP {
    ...
}

Not sure how one would actually iterate over such an iterator though preferably avoiding recursion. There is also the potential problem of us having to move around the entire collection on every call to next. Perhaps an assoc type Tail could reduce some of that problem.

type StaticIterator<const N: usize> {
    type Item;

    /// A type which represents the rest of the iterator, default is `Skip<1, Self>`
    /// however this can be overriden. For example an array may want to have `Tail = [T; N - 1]`
    /// while something like a `Vec3` might have `Vec2` etc.
    type Tail = Skip<1, Self>;
    
    fn next(self) -> (Self::Tail, Self::Item);
}

However we would now probably have a problem with !Copy types forcing the use of something like MaybeInitialized in the impl.

Again, this is just a random thought.

@HadrienG2
Copy link
Author

HadrienG2 commented Mar 12, 2023

I have spent some time thinking about this "one element at a time" design before, and think it would not be workable with good API ergonomics and reasonable compile-time overhead. You bring some good additional points re. needing to move data around and recursion making API ergonomics even worse, so I think in the end we agree that there is no way around the basic interface being some kind of "pull all elements at once or panic trying" primitive.

In the same thread @the8472 discussed some possible approaches in this vein, on which I would have additional comments.

  • I do not fully understand option 1, an API/usage mockup might help here.
  • Option 2 seems pretty similar to what you have in this crate (dynamic Iterator with an additional unsafe contract), but perhaps with stronger optimization guarantees due to the underlying UncheckedIterator API being lower level (often, we can coerce trusted Iterators into optimizing well via unwrap_unchecked, but I can totally imagine situations where this would not work out).
  • Option 3 reads as "implement everything in one module/crate to leverage knowledge of implementation internals", which seems hard to make work with a design that's usable by the outside world. I would like to see an example of how an external crate (e.g. phf) could implement this variant of the FixedIterator concept.

@the8472
Copy link

the8472 commented Mar 12, 2023

I do not fully understand option 1, an API/usage mockup might help here.

Something I tried to implement for regular iterators here: https://github.com/rust-lang/rust/pull/93243/files#diff-596a93132dff3041d643dc1176a7eea92312455a4f1e8ed102e8cea8c4f35189R49-R61

HackMD writeup: https://hackmd.io/CN5eBvNdRySbQbPf-zp1Ww

(often, we can coerce trusted Iterators into optimizing well via unwrap_unchecked, but I can totally imagine situations where this would not work out).

Yeah, unwrap_unchecked doesn't always optimize perfectly.

Option 3 reads as "implement everything in one module/crate to leverage knowledge of implementation internals", which seems hard to make work with a design that's usable by the outside world.

I haven't worked out how to translate the java approach into rust types. But yes, interop with 3rd-party implementations would be awkward and less optimized because it'll either require some kind of rewrapping/translation between the types or a complete custom implementation of the trait and evaluator. And the latter thing would be difficult to do in a forward-compatible way. Java has more flexibility here with virtual dispatch, their default methods can simply return a different type than custom implementations do.

The advantage of such an approach would be that it allows more powerful optimizations, potentially at compile-time if the combinator-tree-building can be made const. That's maybe not worth it for simple things like map or skip. But if someone were to add bulk operations like sorting, SIMD, matrix operations etc. then it might become worthwhile to optimize the execution. well... .zip().map().collect() can already warrant a different approach than .zip().collect()

I would like to see an example of how an external crate (e.g. phf) could implement this variant of the FixedIterator concept.

I'm not sure though what phf factors into custom fixed iterators? Currently it's based on slices and not arrays. If it were array-based and didn't provide any custom iterator combinators it could just created a fixediter from arrays.

@HadrienG2
Copy link
Author

HadrienG2 commented Mar 12, 2023

Re. phf, I was trying to come up with an example of a container...

  • Whose size is known at compile time
  • Whose backing store is more complex than an array of values (e.g. for a compile time hash-map you would probably want to expose iteration over (key, value) pairs and building from an iterator of this sort as the ideal API)
  • Whose implementation is complex enough that it would not belong to std.

I think we need a motivating use case of this sort in order to have a good discussion of what the fixed-size iterator API should look like, because in my mind there are really two separate audiences of users we need care about:

  1. Everyday iterator end users who want to iterate over std and third-party collections and do stuff with the iterators (for loops, collecting into collections...).
  2. Crate authors who make collections (or other kinds of iterables?) and want to expose iterators over them or build them from iterators.

I think the current iter_fixed design would already address the needs of the first audience quite well, and it's the story for the second audience that needs a little more ironing out.

@the8472
Copy link

the8472 commented Mar 12, 2023

Whose backing store is more complex than an array of values (e.g. for a compile time hash-map you would probably want to expose iteration over (key, value) pairs and building from an iterator of this sort as the ideal API)

It already has a [(K, V)] internally. But let's assume it didn't, then the most naive impl would be taking the key array, turning it into an iterator and then mapping it via a lookup function that gets the values and then you'll have [(&K, &V); T] again. If it somehow doesn't have arrays at all but still a fixed size you could start with array::from_fn and then turn that into an iterator?

I was actually talking about a 3rd audience: Authors who want to write extension traits providing new combinators or final output methods for fixed iterators. I think that is the most difficult part, at least if one wants things to optimize well.

@the8472
Copy link

the8472 commented Mar 12, 2023

On an unrelated note, I also get the impression that some of the methods of the original Iterator that are currently excluded might actually exist, just in a runtime-checked + unsafe-unchecked form where the user asserts that the collection filter will yield M elements. This would not make sense for specialized filters (e.g. adding such an assertion would transform take_while(), into take()), but I could see a use case for general filters like filter() in situations where the user can assert that there will be M outputs, but cannot tell where these outpus lie in the data stream.

The output could also be padded to a fixed size. That's what some of the SIMD ops do, they fill lanes and have a fallback value for the rest or selectively take some values from input A and the rest from input B.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants