Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add async iterable<T> type to WebIDL #1397

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

lucacasonato
Copy link
Member

@lucacasonato lucacasonato commented Mar 25, 2024

This commit lifts the async iterator processing logic from the
ReadableStreamFromIterable operation in the Streams spec, to a new WebIDL type
async iterable<T>.

This will clean up the streams spec, and enable a change in the Fetch spec to
directly allow taking async iterable as request/response bodies, without users
having to pass them to ReadableStream.from first.

(See WHATWG Working Mode: Changes for more details.)


Preview | Diff

This commit lifts the async iterator processing logic from the `ReadableStreamFromIterable` operation in the Streams spec, to a new WebIDL type `async iterable<T>`.

This will clean up the streams spec, and enable a change in the Fetch spec to directly allow taking async iterable as request/response bodies, without having to pass them to `ReadableStream.from` first.
@saschanaz
Copy link
Member

Has there been discussion somewhere publicly? 👀

@lucacasonato
Copy link
Member Author

@saschanaz No, not of any specifics. The background here is this issue: whatwg/fetch#1291 (comment). @annevk mentioned we'd need to figure out how to make this work IDL wise. This is my attempt at that (it enables updating the BodyInit overload in Fetch from:

typedef (Blob or BufferSource or FormData or URLSearchParams or USVString) XMLHttpRequestBodyInit;
typedef (ReadableStream or XMLHttpRequestBodyInit) BodyInit;

to

typedef (Blob or BufferSource or FormData or URLSearchParams or USVString) XMLHttpRequestBodyInit;
typedef (ReadableStream or async iterable<BufferSource> or XMLHttpRequestBodyInit) BodyInit;

@saschanaz
Copy link
Member

Hmm, I see. BTW please include the full PR template: https://github.com/whatwg/webidl/blob/main/PULL_REQUEST_TEMPLATE.md

@annevk
Copy link
Member

annevk commented Mar 25, 2024

I wonder how this ends up working in practice. Say you have Blob or async iterable<T> or DOMString. If you pass in a Blob with @@asyncIterable support, how is it handled? I guess @@asyncIterable comes later, probably before toString?

cc @petervanderbeken @natechapin

@petervanderbeken
Copy link

petervanderbeken commented Mar 25, 2024

What is being proposed exactly, to what WebIDL type would it be converted? A reference to an Iterator Record?

@lucacasonato
Copy link
Member Author

lucacasonato commented Mar 25, 2024

I wonder how this ends up working in practice. Say you have Blob or async iterable or DOMString. If you pass in a Blob with @@asyncIterable support, how is it handled? I guess @@asyncIterable comes later, probably before toString?

@annevk I have specified this behaviour in an update to the overload resolution algorithm in this PR already. The gist is: an async iterable<T> has the same behaviour for distinct types as sequence<T> (which one could think of as the eager synchronous version of async iterator<T>).

I have updated the graphic in the spec that explains this too:
image

In practice:

  1. async iterator<T> can be distinguished from all types but object and sequence<T>
  2. As with sequence<T>, a matching interface is considered more important than a platform object that implements the iterable, or async iterable protocol. So specifically for the BodyInit with Blob case: nothing changes.
  3. As with sequence<T>, primitives like strings, even if they have @@iterator on their prototype, are not considered for async iterator<T> overloads, and are also not valid async iterator<T> values. async iterator<T> requires values are Type(V) == Object, in addition to having either @@iterator or @@asyncIterator.

What is being proposed exactly, to what WebIDL type would it be converted? A reference to an Iterator Record?

@petervanderbeken I am proposing the addition of an async iterable<T> type, which mirrors sequence<T> in many ways, but unlike it is:

  • lazy: it can be possibly infinite length, with the JS -> IDL conversion not requiring synchronous transformation of all values of the underlying iterable. Instead values are lazily iterated off the underlying iterator, converted from JS to IDL, and yielded to IDL callers, as they request them.
  • asynchronous: values can be produced asynchronously

If you imagine a matrix of JS stream primitives, with asynchronicity on one axis, and eagerness on the other, this fills out the square opposite of sequence<T>.

sync async
eager JS: Array<T>; IDL: sequence<T> JS: Promise<Array<T | Promise<T>>>; async sequence<T> (doesn't exist)
lazy JS: { @@iterator: Iterator<T> }; IDL: iterable<T> (doesn't exist) JS: { @@asyncIterator: AsyncIterator<T> }; IDL: async iterable<T> (this PR)

The use case for this type are Web APIs that take "streams" of values (of unknown length), that they then process in some way. A real Web API that does this right now is ReadableStream.from(). It takes objects that are iterables as a Web IDL any type, and performs manual processing of this iterable. For one API, this is fine, but there are other proposed APIs that would also take async iterables as arguments:

Introduction of this async iterable<T> simplifies all of these, because the entirety of the iterable iteration logic is now pulled into Web IDL, where all callers can share it.

@saschanaz
Copy link
Member

What is being proposed exactly, to what WebIDL type would it be converted? A reference to an Iterator Record?

Per https://whatpr.org/webidl/1397/402a886...14f1bdb.html#js-async-iterable the result indeed seem to be iterator record. (Adding because I don't think the comment above actually answers this.)

@annevk
Copy link
Member

annevk commented Mar 26, 2024

This design sounds good to me. It seems the Build is currently failing though. Could you write a corresponding PR for Streams to show it in action so we have some assurance it can be adopted?

cc @MattiasBuelens

Copy link
Contributor

@MattiasBuelens MattiasBuelens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good!

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
@MattiasBuelens
Copy link
Contributor

MattiasBuelens commented Mar 26, 2024

3. As with sequence<T>, primitives like strings, even if they have @@iterator on their prototype, are not considered for async iterator<T> overloads, and are also not valid async iterator<T> values. async iterator<T> requires values are Type(V) == Object, in addition to having either @@iterator or @@asyncIterator.

This would technically be a breaking change for ReadableStream.from(), since we currently do accept strings (see this test).

That said, I wrote that test more for the sake of completeness, and I don't expect much actual usage in the wild. We can consider making ReadableStream.from() throw on strings, or we could change its signature to accept async iterable<any> or string instead.

@lucacasonato
Copy link
Member Author

lucacasonato commented Mar 26, 2024

@MattiasBuelens I was not aware of this behaviour in Streams. I think we should stick with the behaviour where async iterator<T> must be an object in this PR (mirroring sequence<T>). Overload resolution becomes tricky and rather confusing otherwise.

Let's discuss whether to accept or error on DOMString arguments in the ReadableStream.from() PR.

@lucacasonato
Copy link
Member Author

@annevk I had already opened a draft PR to the streams spec, showing the integration yesterday. Here is a link: whatwg/streams#1310.

@lucacasonato
Copy link
Member Author

Do we have Web IDL tests in WPT? I don't think so, right? I guess no tests specifically for this are relevant.

Considering this is mostly an editorial refactor, do we even need "implementer interest"? There is nothing to implement until an upstream spec uses this. I'll ask for support anyway though:

  • @annevk You already expressed support on this design, can I take that as interest from WebKit?
  • @saschanaz What are your thoughts? Does Mozilla have an opinion?
  • For Chrome, maybe @domenic could chime in?
  • Deno is interested in this.
  • Node.js likely is too, considering that this unlocks new Response(asyncIterable). @matteocollina can you comment?

Also @jasnell, any thoughts?

@lucacasonato
Copy link
Member Author

Got some feedback from @saschanaz elsewhere that the constraint that async iterable<T> is only valid in JS->IDL positions (rather than also in IDL->JS positions) should be more prominently highlighted. I'll address this tomorrow

@domenic
Copy link
Member

domenic commented Mar 27, 2024

Do we have Web IDL tests in WPT? I don't think so, right?

We do, but they're weird since they test things via APIs that use those features. https://github.com/web-platform-tests/wpt/tree/master/webidl/ecmascript-binding

I guess no tests specifically for this are relevant.

I think linking to the tests for streams using this feature would be good. Especially if you can verify that everything you introduce in this PR is covered by some aspect of the streams tests. (Which is probably the case since the streams tests are quite thorough, IIRC.)

Considering this is mostly an editorial refactor, do we even need "implementer interest"? There is nothing to implement until an upstream spec uses this. I'll ask for support anyway though:

Implementer support for Web IDL stuff like this is tricky and requires a judgement call. I think we should CC the bindings teams and give them a chance to weigh in, so @japhet and @yuki3 for Chromium. For example if they were concerned about how to implement this in the bindings layer, or if they were concerned that introducing this feature would be bad because then too many specs might use it and actually it should be rare.

However, since this doesn't require any immediate changes to any implementations, I think we shouldn't block on requiring such support, if we don't hear back.

@domenic
Copy link
Member

domenic commented Mar 27, 2024

Regarding strings, I think we should follow the behavior of async iterator helpers's AsyncIterator.from(); /cc @bakkot. I cannot tell what that behavior is, however. Iterator.from() definitely accepts strings. But AynscIterator.from() calls GetIteratorFlattenable with a second parameter whose value is async, which is neither accept-strings nor reject-strings.

@bakkot
Copy link
Contributor

bakkot commented Mar 27, 2024

AsyncIterator.from will accept strings because Iterator.from does. But do note that it's specifically a coercion method. For something like ReadableStream.from, which is also a coercion method, you may wish to follow that precedent, but I'm not sure you'd want to do so for other async-iterable-taking things.

The way AsyncIterator.from works is to look up Symbol.asyncIterator, and if that fails then fall back to Symbol.iterator with the result wrapped the same way for await does (modulo some quibbles not relevant here), and if that fails to use the .next method directly (which only iterator-taking things should do, which is to say, nothing except AsyncIterator.from and AsyncIterator.prototype.flatMap).

And since it will not specifically reject strings, and strings have a Symbol.iterator, it will accept them, the same way for await does.

(Sorry the spec for AsyncIterator.from is out of date; I'm working through other parts of the semantics and it doesn't really make sense to get the spec updated until those are sorted. But this question, at least, is settled.)

@domenic
Copy link
Member

domenic commented Mar 27, 2024

Thanks for the quick response. I'm unsure whether we should think of new Response(asyncIterable) as a "coercion method", but I lean toward yes... and those are our two use cases so far, so, I'm unsure where that leaves us for the general Web IDL type.

@bakkot
Copy link
Contributor

bakkot commented Mar 27, 2024

Personally I would specify this by making async iterable<T> reject strings, and then adding or USVString to those places which should accept strings, just to ensure that future users of the type have actually decided to take strings on purpose instead of it happening by accident.

Plus if you take a string explicitly then you can use it directly instead of having to look up and invoke String.prototype[Symbol.iterator], which is going to be... slower.

@lucacasonato
Copy link
Member Author

Thanks for the comments @domenic and @bakkot. I agree strongly with @bakkot that in the Web IDL type, we should only allow objects implementing the async iterable protocol for the async iterable<T> type. Anything else would be a significant hassle with a high chance for unexpected behaviour when plumbed into the overload / union resolution algorithm. Also there is already precedent in Web IDL that iterable taking types do not support iterating string (sequence<T> does not support iterating strings, but does iterate objects implementing the iterable protocol).

For the streams specific discussion, lets move it right here: whatwg/streams#1310 (comment)

@lucacasonato
Copy link
Member Author

Folks, could you take another look? I have clarified the allowed positions for async iterables (namely not in attributes), and have added guidance that:
a) async iterable IDL values can not be constructed in IDL, and always must come from language bindings
b) that spec authors should generally return an interface with an async iterable declaration, not an async iterable IDL value

@annevk has confirmed implementer interest for WebKit on Matrix - is anyone else in favor, or opposed?

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I think the theoretically-language-agnostic part is not quite separated nicely from the JS language part, and we can make the latter more rigorous. But the feedback is largely editorial.

index.bs Outdated
<h4 id="idl-async-iterable-type" lt="async iterable" dfn export>Async iterable types — async iterable&lt;|T|&gt;</h4>

The <dfn lt="async iterable type" export>async iterable&lt;|T|&gt;</dfn> type is a parameterized
type whose values are references to objects that can produce an [=async iterator=].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to match how other parameterizable types are defined. I'd expect just "async iterable" as the dfn text.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That dfn is on the heading. I copied this from sequence. It has two dfn:

  • sequence in the heading
  • sequence type in the body

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated
Async iterators, unlike sequences, do not have a fixed length and can be infinite. Values are
asynchronously produced as the async iterator is iterated over.

Async iterators are values, not types, and thus cannot be used as a type in IDL. They can only be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems pretty confusing. What type to the created values have, then?

I think this whole section is JS language binding specific, since it relates to algorithms like "open" and "get the next value" which are only defined for the JS language binding. So maybe we can remove this confusing not-a-type from the Web IDL "Types" section (2.13), and make it something produced and consumed by those algorithms?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it down to the JS language section, and added some other wording up here to express that IDL async iterables are themselves not being iterated over, but produce a value that is being iterated over (without referencing [=async iterator=]).

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really nice now! Just a few more editorial tidbits. Also be sure to fill out the template in the OP.

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
@lucacasonato
Copy link
Member Author

@domenic What tests should I write for this? I don't think we have specific webidl tests in WPT, right? Only tests for features that use the webidl types.

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What tests should I write for this? I don't think we have specific webidl tests in WPT, right? Only tests for features that use the webidl types.

This was discussed in #1397 (comment).

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
Copy link
Contributor

@MattiasBuelens MattiasBuelens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (aside from one last nit)

index.bs Outdated Show resolved Hide resolved
@annevk
Copy link
Member

annevk commented Aug 21, 2024

Just to make sure I understand correctly, this is blocked on tests and this is best tested through the Streams change whatwg/streams#1310 which is currently still in draft. So once the Streams change is ready and has test coverage (which should include coverage for the IDL bits) this is all good to go?

@lucacasonato
Copy link
Member Author

So during an experiment we ran for new Response(async iterable<Uint8Array>), we discovered another fun edge case to think about: when having a union of string | async iterable<Uint8Array>, what should a boxed string resolve as? In the current spec text it resolves as a async iterable<DOMString>, which would cause an error during iteration. This happens because the boxed string is both an object, and has a [Symbol.iterator] method.

We have two options here:

  1. handle this case specially in WebIDL by banning boxed string as a value for async iterable, like we do with primitives (kinda weird, but means this doesn't become a weird edge case only in fetch and is instead uniform across the platform)
  2. handle this as a weird edge case in fetch, and in the future in any other string | async iterator WebIDL unions.

The big question is whether, assuming ReadableStream.from("asd") throws, ReadableStream.from(new String("asd")) should throw, or create a stream of string chunks.

Any thoughts?

@lucacasonato
Copy link
Member Author

Copying over a thought from @annevk:

Treating string objects and values in the same way whenever possible seems preferable. Which argues for 1 as otherwise each API would have to deal with this anew.

@bakkot
Copy link
Contributor

bakkot commented Aug 26, 2024

In TC39 we went the other direction, and said that String objects are just another kind of object. Having the check conceptually be "is a primitive" is the simplest mental model.

If the user provides a primitive wrapper Object such as a String Object, however, it should be treated like any other Object.

@annevk
Copy link
Member

annevk commented Aug 26, 2024

That also seems reasonable, but the problem is that any API currently accepting a string would need to juggle a lot of complexity to safely adopt async iterable<T>.

Perhaps there's a third solution whereby string | async iterable<T> cases involving a String object are resolved by going down the string path? But if there's no string in the type union we'd treat a String object as an Object for the purposes of async iterable<T>. (Of course, this has the opposite problem for when you start out with async iterable<T> and later overload with string, but that seems far less likely.)

@lucacasonato
Copy link
Member Author

I like the option where we just handle this in the union.

an [=async iterator=]. The [=async iterator=] can be asynchronously iterated over to produce values.

<dfn lt="async iterator" export>Async iterators</dfn> are [=structs=] with the following [=struct/items=]:
* <dfn for="JS async iterator">underlying record</dfn>, an [=Iterator=] record
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ECMAScript routinely refers to these as the type "Iterator Record". The term is often linkable as "Iterator Record" too, so could we maybe follow suit and make this [=Iterator Record=] instead?


1. Let |nextResult| be
<a abstract-op>IteratorNext</a>(|iterator|'s [=JS async iterator/underlying record=]).
1. If |nextResult| is an abrupt completion, return [=a promise rejected with=]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All references of "abrupt completion" in the current standard link to the ES term. Could we do that too here?


To <dfn id="async-iterator-close" export lt="close an async iterator">close</dfn> an
<code><a lt="async iterator">async iterator&lt;<var ignore>T</var>&gt;</a></code> |iterator|,
with a reason |reason|:
Copy link
Member

@domfarolino domfarolino Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe clarify that |reason| is a JavaScript value (i.e., instead of a Web IDL value that would need to be converted), like we do in https://webidl.spec.whatwg.org/#a-promise-rejected-with?

@lucacasonato
Copy link
Member Author

We ran another experiment with changed behaviour where for unions of string | async iterable<T> the String object is resolved as string rather than as async iterable<T>. We have not gotten any regression reports so far. I am going to update this PR with that change.

bartlomieju pushed a commit to denoland/deno that referenced this pull request Nov 26, 2024
WebIDL `async iterable<T>` type rejects `string`

Ref whatwg/webidl#1397, #24623
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

9 participants