Polyglot Support #45
I'm currently working on integrating Java 8's Nashorn Javascript engine with Reactor and this idea of extending Reactive Streams into other languages is very much on my mind. There's the possibility of providing Reactive Streams--the API--in Javascript for use in server-side and client-side code to get the same characteristics as defined in the spec, and there's the cross-network-boundary possibility of communicating from one Reactive Stream to another over a network. As long as the interaction is clearly defined, I don't think the transport and protocol (or combination of the two) really matters. e.g. one could conceivably use plain HTTP + Javascript on the client side (sending JSON) and the Reactive Streams application on the server side would simply invoke …
After a great meeting today at Netflix with @tmontgomery I am more convinced that we should expand this out into several projects. I propose we focus this main project … To start with I think we need at least these:
We could also (if we're ambitious) include reference implementations of the network protocol, at least in Java and Javascript. Or we leave that to external projects to implement according to the network protocol. Todd and I are ready to start working on defining the WebSocket protocol compliant with reactive-streams, and Netflix is ready to start implementing on top of it. Do we have agreement on making reactive-streams polyglot and including the network protocol? If so, can I proceed to create the new repos and migrate the JVM interfaces into …
Enthusiastic +1 from me. What can I do to help?
+1
We probably only need 1 protocol spec that can run over any reliable unicast protocol (TCP or WebSocket or something else). I really like the direction this work is heading!
@rkuhn Do you agree with and support splitting into multiple sub-projects under the "github.com/reactive-streams" umbrella?
+1!
Exactly.... communicating across an async [binary] boundary. To that end, the types of transports that the protocol would need to support are, at least: TCP, WebSocket, and IPC (most likely in the form of a shared-memory SPSC queue). Adaptation to a multicast medium (or SPMC or MPMC queue) probably should be considered, but might need to be treated differently. With …
Yes, enthusiastic +1 from me as well! I also agree with the proposed split into multiple sub-projects. The only part I am not sure I understand correctly is the “network protocols” part: there are different network protocols available today, some of which already have all desired semantics (plus some extraneous ones, like TCP) and some of which need dedicated protocol descriptions for supporting Reactive Streams (like UDP, raw Ethernet frames, etc.), and in addition people may choose to implement a stream transport on a completely new medium as well. Therefore in my opinion we can leave out the network protocol description from the reactive-streams project and add concrete ones like reactive-streams-udp as we get around to specifying them. The most important part then is to agree on the semantics—call it an abstract protocol if you will—and fix that first. Apropos: I take it that we agree on #46, especially given this angle on the scope of the project? If so, please comment on that issue as well.
TCP, and by extension WebSocket, lacks some semantics that are needed and, as @rkuhn points out, carries some additional unneeded and undesired ones. Specifically, TCP and WS have no concept of channels. So, while a single stream can be supported, multiple concurrent streams (with or without head-of-line blocking) can't be supported without an additional framing layer. However, protocols like SPDY and HTTP2 have a nice framing mechanism that could be leveraged directly. Such a framing and control layer would also be needed by any other transport (UDP, etc.) as well.
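To make the framing idea concrete, here is a purely hypothetical sketch of what such a layer could carry per frame; the frame types, field sizes, and names are invented for illustration and are not part of any spec:

```java
import java.nio.ByteBuffer;

// Hypothetical frame layout for multiplexing several logical streams over one
// TCP or WebSocket connection, loosely inspired by HTTP2-style framing.
final class Frame {
    static final byte DATA = 0;       // payload fragment for one logical stream
    static final byte REQUEST_N = 1;  // demand signal travelling back upstream
    static final byte CANCEL = 2;     // cancellation of one logical stream

    static ByteBuffer encode(int streamId, byte type, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 1 + 4 + payload.length);
        buf.putInt(streamId);        // which logical stream this frame belongs to
        buf.put(type);               // data vs. control
        buf.putInt(payload.length);  // fragment length, bounded by an agreed MTU
        buf.put(payload);
        buf.flip();                  // ready to be written to the transport
        return buf;
    }
}
```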
@tmontgomery I see, we were talking about slightly different things: what I meant was dedicating one TCP connection to the transfer of one stream, in which case no extra capabilities are needed beyond serialization and deserialization of the data items, using the event-based socket operations to convey data and demand. The difference between TCP and a Publisher/Subscriber pair is that the latter would act like a buffer:
where the buffer size is given by output + input buffers plus network components and in-flight bits. As far as I can see (please correct me if I’m wrong) this still has the behavior of dynamic push & pull, i.e. it switches between the two according to which side is currently faster (with the man in the middle capping how fast the receiver can appear to be). Are there other artifacts or bad effects that I am missing? If we want to transfer multiple streams over a shared medium (like one WebSocket or one TCP connection or one Unix pipe) then we will of course need some multiplexing and scheduling capability, including dedicated return channels for the demand signaling. I am not sure whether that should be our first goal, though, since this can be implemented on a primitive (single) stream by merging and splitting streams in the “application layer”. If OTOH the transport mechanism already comes with channel support, then we should of course try to use that. One potential issue I see with just pushing data and demand across an HTTP2 WebSocket is that multiple streams will have to be balanced in a fair fashion somehow, which implies that we must specify a scheduler or the protocol stack needs to already come with one—is that already the case? (please excuse my ignorance, I have yet to read up on this topic)
@rkuhn Even with a single stream over a TCP connection, I believe you will need to consider having a control protocol for the semantics as I have read them so far. You are correct in that TCP has flow control (all flow control must be push and pull), but those semantics in TCP have subtleties. That is perhaps the bit that is being missed. In TCP, exercising flow control only via the receiving data rate is a very coarse-grained tool, and it has some quirks.... one is the interaction of Nagle and Delayed ACKs, for example. You are absolutely correct about multiple streams and scheduling. It becomes a scheduling problem immediately, but it is actually worse than that. Here are a couple links that you might find interesting on the subject. http://sites.inka.de/~W1011/devel/tcp-tcp.html Some of the complexity has crept into HTTP2, which is unfortunate, as the protocol has no other option than to expose the controls to the application. And most applications won't have any clue how to handle the complexity. However, I see that as a tremendous opportunity for reactive streams to bring value. It's a simpler mechanism bound to the application. It's in a perfect position to make this easier, much easier, and accessible for applications. And multiple streams per connection should be considered the norm, IMO. A single stream per TCP connection is going to be very limiting, and a proliferation of TCP connections is not a good thing. In fact, one of the main techniques for preserving device battery life, for example, is keeping very few TCP connections open, so the device stays out of the high-energy state as much as possible. HTTP2 and SPDY multiplex requests and responses and use multiple streams for many reasons, one of which is to reduce TCP connection counts for browsers and servers. With that in mind, standardizing how to mux multiple streams onto any transport is a good thing, I think.
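As a toy illustration of the fairness concern raised here (not a proposal for the actual protocol), a multiplexer could drain per-stream outbound queues one frame per turn, so no single stream monopolises the shared connection:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Toy round-robin scheduler for multiplexed outbound frames. A real protocol
// stack would also need priorities, flow-control windows, and re-registration
// of streams as new frames arrive; this only sketches the basic fairness idea.
final class RoundRobinMux {
    // Each inner deque holds the pending frames of one logical stream.
    private final ArrayDeque<ArrayDeque<ByteBuffer>> turns = new ArrayDeque<>();

    void register(ArrayDeque<ByteBuffer> streamQueue) { turns.add(streamQueue); }

    // Pop one frame from the stream whose turn it is, then send that stream
    // to the back of the line if it still has frames pending.
    ByteBuffer nextFrame() {
        while (!turns.isEmpty()) {
            ArrayDeque<ByteBuffer> q = turns.poll();
            ByteBuffer frame = q.poll();
            if (!q.isEmpty()) turns.add(q);
            if (frame != null) return frame;
        }
        return null; // nothing pending on any stream
    }
}
```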
Thanks a lot for this explanation, it entirely makes sense. Great to have you on board!
Agreed, otherwise this doesn't work for us in what we're pursuing with both TCP and WebSockets.
It seems we have consensus to migrate this initiative to being polyglot, so shall we move forward with making sub-projects and moving things around? Shall we start with these?
@tmontgomery Do we need separate projects for TCP and websockets, or can they be together? If together, under what name?
I’d say we can start moving things around once we have fixed the discrepancies between code and documentation (i.e. #41 and possibly follow-up fixes are merged).
Sounds good!
@rkuhn Shall we proceed now that we've agreed in #46 and merged the contract definition? Do you want to release 0.4 first from the current project structure and then proceed with the steps I outlined above in #45 (comment)?
Splitting things up should not require any changes in interfaces or semantics, we are just moving things around (including that the generic README is stripped of JVM-specific provisions, which move into the JVM subproject), so I do not see any obstacles to doing it in parallel. The released artifacts are, as far as I can see, also not affected in any way by the split.
Okay, so shall I proceed with submitting a PR to propose the changes? Do you agree with the layout I suggested?
I don't like the idea of specifying a protocol here, for many reasons, one of them being that it feels completely out of scope.
The reactive streams semantics will allow a recipient to send a huge request if the potential set of answers fits its memory, but that does not translate to the kernel buffer size which will eventually have to deal with the incoming binary data from the wire. The semantics do not map one-to-one, so I think this is misguided. You will inevitably need a local bridge Subscriber that communicates properly with whatever underlying kernel driver it must talk to and gives proper request counts.
@drewhk Actually, what you mention makes the case for there being a control protocol. Relying on TCP semantics alone is not enough here because of the possibility of overrunning the subscriber side if the obvious solution were to be used. In the obvious solution, there must be double buffering outside the kernel. Whether a bridge subscriber is used as a solution is open for debate. It's hardly the inevitable solution, IMO. In most cases, having event units that are in the multi-GB range means a range of tradeoffs to be made. Adding another point to buffer multiple fragments into a single unit (like a bridge subscriber would) is a less than ideal solution. I would handle that by only reassembling at the end subscriber site. Where it has to be... unless the system decides that smaller units are the way to go anyway.
I am not sure what you mean here. Just because a remote Subscriber requests 1 000 000 elements, that does not mean that the underlying transport should also request 1 000 000 elements. It might even -- ad absurdum -- request elements one by one with a buffer size of exactly 1 element and eventually still serve the 1 000 000 requests. There is no unboundedness here. Another, similar example: suppose you have a chain of purely asynchronous map stages:
And the downstream subscriber for the last map requests 1 000 000 elements; that does not mean that the map stages between each other will also issue a request for 1 000 000. That would mean that each step is forced to have a buffer that can hold 1 000 000 elements in the worst case. Instead they can have buffer sizes of 1, 128, or even different sizes in between, issuing requests in even smaller batches (say, bufSize / 2 for example); see the sketch below.
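A minimal, single-threaded sketch of such a bounded stage, written against the org.reactivestreams interfaces; it deliberately ignores the concurrency, reentrancy, demand-overflow and error-handling rules a spec-compliant Processor must satisfy:

```java
import org.reactivestreams.Processor;
import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

import java.util.ArrayDeque;
import java.util.Queue;

// Sketch only: downstream may request 1 000 000 elements, but this stage
// never asks upstream for more than bufferSize / 2 at a time.
final class BoundedStage<T> implements Processor<T, T> {
    private final int bufferSize;                       // e.g. 128; independent of downstream demand
    private final Queue<T> buffer = new ArrayDeque<>();
    private Subscription upstream;
    private Subscriber<? super T> downstream;
    private long demand;                                // outstanding downstream request count
    private int inFlight;                               // requested upstream, not yet delivered
    private boolean done;

    BoundedStage(int bufferSize) { this.bufferSize = bufferSize; }

    @Override public void subscribe(Subscriber<? super T> s) {
        downstream = s;
        s.onSubscribe(new Subscription() {
            @Override public void request(long n) { demand += n; drain(); }
            @Override public void cancel() { upstream.cancel(); }
        });
    }

    @Override public void onSubscribe(Subscription s) { upstream = s; maybeRequestMore(); }

    @Override public void onNext(T t) { inFlight--; buffer.add(t); drain(); }

    @Override public void onError(Throwable t) { downstream.onError(t); }

    @Override public void onComplete() { done = true; drain(); }

    private void drain() {
        while (demand > 0 && !buffer.isEmpty()) {
            demand--;
            downstream.onNext(buffer.poll());
        }
        if (done && buffer.isEmpty()) downstream.onComplete();
        else maybeRequestMore();
    }

    // The point made above: upstream only ever sees requests of bufferSize / 2,
    // issued when at least half the buffer is free, however big downstream demand is.
    private void maybeRequestMore() {
        if (upstream == null || done) return;
        int free = bufferSize - buffer.size() - inFlight;
        if (free >= bufferSize / 2) {
            inFlight += bufferSize / 2;
            upstream.request(bufferSize / 2);
        }
    }
}
```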
Further on in my comment, I actually mention large objects (i.e. large elements). Which is what I thought you meant. Overrun in that case is because of a single large object. And a request of 1. That has overrun possibilities unless handled well. i.e. if you request 1GB in a single element. Which users will do. The underlying implementation can optimize down to a single element for a pipeline. In fact, it should. So, the 1M elements you mention can be handled a number of ways fine. However, below a single element, the only possibility is fragmentation. Which needs to be reassembled. Which works best with a framing protocol and definition of MTU. Without that, you are left with very few options how to handle it efficiently.
The discussion in #47 may be relevant here. The Reactive Streams interface allows requesting n 'elements', but if the element type is a byte array and not a single byte, there is no way to request a number of bytes. It's impossible to "request 1GB in a single element"; if you request a single element of type byte array, the publisher can give a single byte array of any size it wishes and still be within the spec. An implementation can introduce controls for the byte array size, but if the on-the-wire protocol is some standardized form of Reactive Streams, it won't be able to communicate this to the other side. In #47 I said this made 'byte streams' between different implementations over the network impractical. The response was that the implementation talking to the network on each end should document its buffer size (the size of the byte arrays it produces), and specific publishers should also document the chunk sizes they produce, and then consumers can rely on those two things. We'll see in practice if this is good enough. What is the element type of your Reactive Stream? If it's byte[], then there is no way to signal the size of the byte[] you want. If it's some object, but that object type's size can vary greatly and it can be split and merged (e.g. HTTP message chunks), the same problem exists. The type can't be … If on the other hand the element type can't be split and merged, then you have no choice but to request some whole amount of elements. If a single element can be 1GB in size, but you can only process whole elements, then you don't have a choice but to buffer it. If you don't want to buffer it, write a streaming processor and change your element type to a smaller fixed-size frame.
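As a small, hypothetical illustration of the documented-chunk-size workaround described above: if the network-facing publisher guarantees a maximum chunk size per element, a consumer can translate a byte budget into an element count before calling request. Both numbers are assumptions agreed out of band, not something the spec itself can express:

```java
final class Demand {
    // Hypothetical helper: maps the bytes a consumer is willing to buffer onto
    // a request(n) element count, given the publisher's documented max chunk size.
    static long elementsForByteBudget(long byteBudget, int maxChunkSize) {
        return Math.max(1, byteBudget / maxChunkSize); // request at least one element
    }
}
// Usage inside Subscriber.onSubscribe: a 1 MiB budget against a documented
// 64 KiB max chunk size yields subscription.request(16).
```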
Is it safe to say Reactive Streams are currently meant to work on a single machine / single JVM? My confusion is based on my understanding of Akka supporting remote actors (in a cluster). Specifically, can subscribers be distributed across a cluster?
Nope, it's not meant to work on only a single machine; it's a contract over how a resource A passes bounded demand to a resource B. Naturally the contract fits Java threading, but it is more than that. If you write a driver for a database or a message broker, or a microservice client, you will need to deal with some issues RS takes care of:
Beyond that, the contract specifies additional control over when to start and stop consuming (the Subscription protocol). E.g., a database driver will close its connection if a Subscriber invoked Subscription.cancel; a sketch of that pattern follows below. Now all these classic software components (the drivers etc.) can implement the same contract to pass these signals around; this is where the win occurs for all of us implementors and users. Because we have a single contract, database drivers, message broker drivers, reactive systems including reactive extensions, clients, servers -- they can all be bound in a single processing chain (calling subscribe in chain) to compose a fully asynchronous pipeline with flow control. Obviously adding an IO protocol to carry these signals would be a plus to the spec, as we have to implement this for every new transport, but a good chunk of the flow can end up being truly reactive right now. Hope that clarifies a bit!
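A rough sketch of that database-driver example; Connection, Cursor and Row are imagined driver types, and the single-threaded code glosses over the spec's concurrency and re-entrancy rules:

```java
import org.reactivestreams.Publisher;
import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

interface Row {}
interface Cursor { boolean hasNext(); Row next(); }
interface Connection { Cursor execute(String query); void close(); }

// Results are fetched only as the Subscriber signals demand, and cancel()
// releases the underlying connection, as described above.
final class QueryPublisher implements Publisher<Row> {
    private final Connection connection;
    private final String query;

    QueryPublisher(Connection connection, String query) {
        this.connection = connection;
        this.query = query;
    }

    @Override public void subscribe(Subscriber<? super Row> s) {
        Cursor cursor = connection.execute(query);
        s.onSubscribe(new Subscription() {
            @Override public void request(long n) {
                // Bounded demand: fetch only as many rows as were asked for.
                for (long i = 0; i < n && cursor.hasNext(); i++) s.onNext(cursor.next());
                if (!cursor.hasNext()) { connection.close(); s.onComplete(); }
            }
            @Override public void cancel() {
                // Cancellation releases the resource.
                connection.close();
            }
        });
    }
}
```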
Thank you for the quick response. So the "current" status of Reactive Streams does allow passing messages to remote actors on remote servers (I want to make sure we are talking about the current implementation vs. the discussion in this thread to create new protocols). To be specific, here is a proposed scenario using the "current" version of RS: a REST front end, which is a Play application, receives several JSON posts (lots of streaming docs or batches of docs) being sent to it from many remote sources. Realizing this could also be an Akka HTTP endpoint, but in our case we use Play everywhere upstream. A logical outline might be: … Then it calls an actor (which may or may not be on the same machine) which is the Publisher of the stream. One subscriber (actually 2 or more for redundancy) might simply drop the JSON in an S3 bucket on Amazon; another subscriber (again, actually 2 or more for redundancy) on more servers might do some ETL / data cleaning and drop some useful summary data in Cassandra, and so on, and so on. I am scoping out some ideas on when a simple load-balanced front end (a Play app that does everything, but several load balanced with some supporting app-tier micro services) might be enough, vs. building out a Play -> Akka cluster vs. building out a Play -> Akka Streaming cluster. Getting my head around a specific use case.
@jeffsteinmetz Hi Jeff, some of your questions sound like they may be more suited for the Play or Akka User MLs. The goal behind this Issue is to provide, as @smaldini says: "[…] adding IO protocol to carry these signals would be a plus to the spec as we have to implement this for every new transport, but a good chunk of the flow can end up being truly reactive right now." Defining the "how" to transport Reactive Streams signals across the network would allow for seamless interoperability between different platforms (think JS, Python, JVM, .NET) over different networked transports. I hope that clarifies it a bit :)
Makes sense. I figured the question was a bit of a broad topic to be posting within an "issue" on github. p.s. …
Up until this afternoon, I was under the impression this org/repository was solely to support standardization of reactive streams on the JVM. Consequently, we created this organization this afternoon (https://github.com/Reactive-Streams-Open-Standard/reactive-streams-spec) which is JavaScript specific. Speaking with @benjchristensen this afternoon, I found out about this polyglot effort and I was wondering if you all would like to roll the JavaScript effort in under your organization. The focus of the JavaScript effort right now is as follows:
In the future, it would also be great to have the JavaScript effort collaborate on reactive networking standards (i.e. communication over HTTP/WebSockets/SSE/etc.).
I would love to see a JavaScript effort here, myself. And I still intend to get to some network protocols around this effort.
@Blesh I definitely would like to see this happen as part of reactive-streams.org, and thanks @tmontgomery for jumping in to provide support of this. @reactive-streams/contributors I propose the following and would like your feedback and/or +1 on moving forward:
I'm happy to help! This is easily the most exciting set of developments in programming right now, IMO. reactive-streams/reactive-streams-js is preferable to me*, but other than that 👍

* It might help with searching GitHub or tabbing through directories in terminal too... (e.g. "reactive-streams-j", TAB ...)
@Blesh I'm fine with that. @viktorklang, in particular I'd like your input on this.
@Blesh Developing the JavaScript standard and TCK under this organization is definitely a good idea, we’d love to extend the family to more than just JVM languages. Your suggested repository name sounds good to me, it matches how things are called otherwise :-) @benjchristensen I support your proposal of renaming and integrating repositories, maybe with the slight alteration of calling the JVM dialect reactive-streams-jvm (since it is meant to be used from all JVM languages). Should we leave the artifact names as they are, based on the reasoning that Maven central is only used for JAR-files anyway? Concerning the website we will have to update for 1.0 anyway (ASAP, where the ‘P’ was the problem so far), so folding the polyglot aspects into that rewrite should be easy and it adds one more reason to not procrastinate ;-)
I'm good with that name.
Yes, I think it's okay to leave the name.

```xml
<dependency>
  <groupId>org.reactivestreams</groupId>
  <artifactId>reactive-streams</artifactId>
  <version>1.0.0</version>
</dependency>
```

I'm not aware of any other type of artifact outside the JVM community that ends up on Maven Central, so I don't think we need to change the artifactId to …
Sounds good :-) @reactive-streams/contributors any other comments on this before we move forward with creating the new repos? @viktorklang or @rkuhn, I don't have privileges to create new repos, so one of you will have to do it.
@benjchristensen I added you to the Owners, created the reactive-streams-io and reactive-streams-js repositories and renamed reactive-streams-jvm. @Blesh has been added to the js-admin team so that he should be able to set everything in motion on the JavaScript side of things. I hope I didn’t overlook anything, it is getting a little late here ;-)
👍 thank you, @rkuhn and @benjchristensen for helping us put this together.
@rkuhn: Just checked, I don't see any admin features on that repo... I think maybe I'm missing something.
@rkuhn: Nevermind, I just got the org invite. Took a while to show up in my email I guess.
Thanks @rkuhn for the quick action on this!
Looking forward to having a look when I get back from vacation :-)
Very interesting proposal, but also very ambitious: defining a standard messaging protocol for asynchronously moving data with back pressure, platform/language agnostic. Scores of language-specific clients will then spring up. It could even be embedded in routers. Sounds very much like what AMQP is/was supposed to be. I've never been a big fan of AMQP (too complex imho), but any attempt to solve the problem in a simpler/better way gets my attention :)
Nice, we started some work around that in reactor-net. Will look when I am back from vacation too :)
I have opened an issue at reactive-streams/reactive-streams-io#1 to start this up. Please tell me if I'm embarrassing myself and should stop, or help me move forward with it if it makes sense. I am somewhat out of my league in this area and am really seeking help and expertise, so forgive what I'm sure is a naive and elementary attempt at describing what I'm seeking.
@purplefox Thanks for getting involved. I would love to have your guidance and involvement on this as it is honestly more aspirational for me than my core skill set. @tmontgomery Todd, I rely heavily on your experience, skills and expressed interest as we've talked in the past to make this happen!
@benjchristensen I'm in! Will definitely help.
Sorry for the late reply, I am currently on vacation with limited …
@reactive-streams/contributors Closing this issue since these efforts will be continued in the new sub-projects (reactive-streams-io, reactive-streams-js and reactive-streams-jvm).
This issue might be worth reopening as I think there's one more working group that would be helpful to polyglot components. As Reactive Streams offers "seamless interoperability between different platforms (think JS, Python, JVM, .NET) over different networked transports", there ought to be an intelligent discovery system for components. This WG would define the metadata used across app subprojects such that components can be found in a unified manner. Two schemes I've seen are active registries that require being pinged with a trackback URL to a publicly accessible repo (http://bower.io/search) or passive registration such as Maven Central. Since different platforms are unlikely to ever converge on a single repository over time, it seems that an active registry is a better way to go. The ability to easily discover components could eventually segue into IDE support for graphical pipelines, so it may be good to think in that direction as well.
Hi @briantopping, the topics are very interesting, but seem to supersede the scope of this project; perhaps they deserve a repo of their own?
Hi @viktorklang, it did seem to be the case that another repo was warranted, but I could have been missing something. I'm happy to start a discussion there to gauge interest and see what could be generated in the way of requirements, if so.
I suggest expanding this initiative beyond the JVM since most of us need our data streams and systems to interact over network boundaries with other languages.
Thus, it seems it's actually more important to define the protocol and contract and then allow each language platform to define the interfaces that meet it.
Perhaps an approach to this is breaking out into multiple sub projects such as:
Even if the focus in the short-term remains on the JVM interface design, we would gain a lot by including communities such as Javascript/Node.js, Erlang, .Net, banking and financial trading (who have been doing high-performance messaging for decades). It would also make the model far more useful, as we could then consume a reactive stream from Javascript in a browser, via WebSockets to a server powered by Netty or Node.js, receiving data from Rx/Akka/Reactor/whatever, and it would "just work".