Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RDF/JS and WHATWG stream interfaces alignment #6

Open
matthieubosquet opened this issue Apr 10, 2021 · 11 comments
Open

RDF/JS and WHATWG stream interfaces alignment #6

matthieubosquet opened this issue Apr 10, 2021 · 11 comments

Comments

@matthieubosquet
Copy link
Contributor

matthieubosquet commented Apr 10, 2021

Are there plans to make the RDF/JS: Stream interface closer to the WHATWG Readable Stream interface?

I'd be quite keen for example to be able to use asynchronous iteration.

@tpluscode
Copy link
Contributor

Hey Matthieu

[...] closer to the WHATWG Readable Stream

Not an authoritative answer, but "not likely" ™️ with the stream spec based in the node streams

I'd be quite keen for example to be able to use asynchronous iteration.

That said, async iteration is totally possible with the readable-stream already. So while you will need webpack to bundle as that will not work natively in a browser you can consume a stream in an async for loop

const fetch = require('@rdfjs/fetch')
const { quadToNTriples } = require('@rdfjs/to-ntriples')

async function main() {
  const res = await fetch('http://zazuko.github.io/tbbt-ld/data/person/sheldon-cooper.ttl')
  const stream = await res.quadStream()

  for await (const quad of stream) {
    console.log(quadToNTriples(quad))
  }
}

Not perfect but it allows you to build your APIs ignoring actual streams and only coding against AsyncIterable<Quad>

@matthieubosquet
Copy link
Contributor Author

Hi Tomasz,

Thanks for your swift answer!

My problem is that I'm writing a TypeScript library and programming to the interface, therefore, the for await (const quad of stream) syntax not being reflected in the types is problematic.

If the readable quad stream is iterable, could we ammend the interface to reflect it?

types/stream.d.ts

Lines 22 to 30 in e021fe6

export interface Stream<Q extends BaseQuad = Quad> extends EventEmitter {
/**
* This method pulls a quad out of the internal buffer and returns it.
* If there is no quad available, then it will return null.
*
* @return A quad from the internal buffer, or null if none is available.
*/
read(): Q | null;
}

Maybe to something like:

export interface Stream<Q extends BaseQuad = Quad> extends EventEmitter {
    /**
     * This method pulls a quad out of the internal buffer and returns it.
     * If there is no quad available, then it will return null.
     *
     * @return A quad from the internal buffer, or null if none is available.
     */
    read(): Q | null;

    [Symbol.asyncIterator](): AsyncIterableIterator<Q>;
}

I appreciate that this should probably be reflected in the RDF/JS stream spec too, isn't it?

@tpluscode
Copy link
Contributor

You do have a point I could agree with introducing such a change.

However, I think the RDF/JS spec should be by definition compatible with v2 of the stream implementation which does not support async iteration.

Until, and if, we make this change you can do type augmentation as I did in one of my projects

import type Quad from 'rdf-js'

declare module 'rdf-js' {
  interface Stream extends AsyncIterable<Quad> {}
}

@RubenVerborgh
Copy link
Member

Note that RDF/JS are low-level interfaces, from which you can build higher-level ones.

Back in the old days when we started RDF/JS, anything to do with promises was slower than callbacks.
So for me, the first step would be to come up with performance tests.

@tpluscode
Copy link
Contributor

How is performance relevant exactly?

RDF/JS is also a spec and not an implementation, right? The v3 of streams can be consumed as async iterator. So we can reflect that in the RDF/JS.

IMO the decision would be based on whether or not we want to keep support for v2 of node stream.

@RubenVerborgh
Copy link
Member

How is performance relevant exactly?

The entire API design of RDF/JS is based on what interfaces can be implemented at high performance.

RDF/JS is also a spec and not an implementation, right?

Interfaces put an upper bound on achievable performance.
5 years ago, anything with promises was a magnitude slower.

@tpluscode
Copy link
Contributor

I still don't see your point. Node stream already implements async iteration since v11 (and experimental in v10). All RDF/JS Stream does is constrain the chunk to be Quad instances.

A parser, being a producer of quads will use the underlying stream specification (thus already supporting Symbol.asyncIterator). The difference is only to the consumer, which can choose of the two ways for consuming.

What am I missing?

@RubenVerborgh
Copy link
Member

All RDF/JS Stream does is constrain the chunk to be Quad instances.

That is not correct. RDF/JS Stream is a subset of Node.js stream; the subset that was the fastest at the point of writing.

@blake-regalia
Copy link
Contributor

Using object streams at all is a performance hit. If you profile in node you will see just how much overhead they really cost. Better performance would be to go purely with callbacks or async iterable, but then the ability to attach multiple listeners would become user's responsibility.

Other than that I don't think there are any features provided by object streams that can't be reasonably replaced with callbacks or async iterable.

@RubenVerborgh
Copy link
Member

Using object streams at all is a performance hit.

Note that the minimum interface we demand is not Node.js streams (which have flow control), but rather event emitters, which are callback-driven.

@elf-pavlik
Copy link
Member

We have considered it in rdfjs/stream-spec#12 That thread includes comments from nodejs core contributor and readable-stream maintainer.

Could we have BaseStream (low level) and Stream with AsyncIterator? Maybe Stream and HiStream not to break current definitions. Implementations provide Steram with AsyncIterator and it would be helpful to have common type for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants