Response-Streaming GWT RPC with backward compatibility; Fixes #7987 #9775 #9779
Conversation
there are still many open TODOs, and the code is entirely untested yet
serializing tokens in forward order for versions >= SERIALIZATION_STREAM_FORWARD_STREAMING_VERSION and backward for earlier versions
new streaming format works with gwt.rpc.version >= 9
Sorry for my delay in posting something formal as a review; between the changing PR and the trickle of commits, I wasn't clear on when you were ready. I've read about 80% of it and like where it is going, and I hope this review doesn't seem too critical. I'll try to be more prompt in the future, especially if you can signal me when you are sure you're ready for a deep look.
At a minimum to merge this, we'll have to have #9782 and its corresponding tools change landed. In the meantime, you could merge those locally and run the style/api checks.
--
I've been going over the intent of this: supporting streaming of a gwt-rpc payload to help use cases where fully de/serializing a payload costs more memory than you might expect. I have two main conclusions. They might not be applicable to your own use case, but I think that if this work is worth doing, it is probably worth generalizing further.
First, given this rework, why not move strings out of the separate "string table" in the payload and into the primary payload, so they can go out as the main response is sent and don't need to be batched up for later? No need to store an array of them; just keep them like normal objects in the identity map (which already has them anyway in at least some cases). In addition to simplifying the payload (why are longs sent as strings in the main payload, while strings go in the string table at the end?), it also has the side effect of letting the recipient stream-process the response instead of waiting for the complete payload (not applicable with XHR, but non-browser clients or browsers that support Fetch could in theory read async and discard buffers as they come in).
Next, why just responses, and not requests as well? A response does tend to be the obvious first step (javax.servlet can easily stream responses but not requests, and the server tends to be many:one, so its payloads seem more important than what it receives), but async servlets are available, and as written a server with N large incoming calls would need to hold on to full payloads rather than read and discard as it goes. Unfortunately browser clients can't stream requests at this time even with Fetch, but either when that planned improvement lands or through the use of websockets, this would be a benefit.
Making more changes in one batch seems like a better approach, so that downstream projects which update the wire format intermittently don't need to change more than once. It wouldn't be necessary to actually implement the server-side stream reading at this time (and we can't, until #9727 is resolved), but if the format were already prepared for those changes, a later patch could introduce them when we are ready.
--
The care you've taken to make this backwards compatible is certainly appreciated, though I wonder if it might make sense to offer a separate set of classes and branch based on the version? Mostly an idle thought, and not a request for a change. Reading the apparent stream version earlier could make the encodeResponse API changes easier, by leaving the existing methods as is (and deprecating them) and introducing new methods that require the modern approach. Other changes tend to only affect a simple branch, but as yours is broader, it could make sense to just extract an interface or base class and build from there.
--
Finally, unit tests: would it be possible to tweak some of the tests to run on both the old versions and this new one? I'm surprised to see they don't already do this, either by running individual test classes with some getWireVersion() method overridden differently in each implementation, or with a loop that tests each expected version. At a minimum I was thinking of an updated com.google.gwt.user.server.rpc.RPCTest and com.google.gwt.user.client.rpc.impl.ClientSerializationStreamReaderTest.
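The loop idea above can be sketched as follows. This is a hypothetical illustration, not the actual GWT test API: the WIRE_VERSIONS list and the roundTrip helper are stand-ins for whatever serializer the real tests would exercise at each protocol version.

```java
import java.util.List;

public class WireVersionLoopTest {

  // Assumed set of protocol versions worth covering; the real range would
  // come from AbstractSerializationStream constants.
  static final List<Integer> WIRE_VERSIONS = List.of(5, 6, 7, 8, 9);

  // Stand-in for a real encode/decode cycle at a given protocol version.
  static String roundTrip(String payload, int version) {
    String encoded = version + "|" + payload;           // "serialize"
    return encoded.substring(encoded.indexOf('|') + 1); // "deserialize"
  }

  public static void main(String[] args) {
    // One assertion per version, so a regression names the version it broke.
    for (int version : WIRE_VERSIONS) {
      String result = roundTrip("hello", version);
      if (!"hello".equals(result)) {
        throw new AssertionError("round trip failed for version " + version);
      }
    }
  }
}
```

The same shape works inside JUnit as a parameterized test, with getWireVersion() overridden per run instead of passed as an argument.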
/**
 * The current RPC protocol version.
 */
public static final int SERIALIZATION_STREAM_VERSION = 7;
public static final int SERIALIZATION_STREAM_VERSION = 8;
Why not advance this all the way to 9? That is, if both sides can use it, is there a reason why 8 should be the default instead? (7 is for when a browser might need not-quite-JSON, 8 is for real JSON.)
With the incompatibility of the changes and no up-front deprecation warning, it seemed a little harsh to me to break adopters relying on a non-null payload return value as the default. Unless things are being deprecated, I thought that staying with a version compatible with previous APIs would be the better choice.
On the other hand, if the new format is superior, it should be the default, right? If we document it properly (note there isn't a "put release notes here" sort of place; we just need to accumulate them as we continue with a release, on the issues and PRs themselves. It would be good to put a policy in place around this), then downstream consumers can force 7 or 8 as desired to retain the old behavior.
@@ -629,23 +693,54 @@ private static int getRpcVersion() throws SerializationException {
 * @param wasThrown if true, the object being returned was an exception thrown
 *        by the service method; if false, it was the result of the service
 *        method's invocation
 * @return a string that encodes the response from a service method
 * @return a string that encodes the response from a service method, or {@code null} if a protocol version of
This is a private method - while it is helpful to update its javadoc, the public methods that now return null are even more in need of this note. It should probably also include a release-notes summary we can add, to be clear how this should be used in the future.
I agree that the public methods are even more relevant for updating the Javadoc comments. I found a number of public methods, such as

public static String invokeAndEncodeResponse(Object target, Method serviceMethod, Object[] args,
    SerializationPolicy serializationPolicy, int flags) throws SerializationException {

but they didn't have a Javadoc at all, so I didn't feel it would be appropriate to introduce one just for this change. Are you recommending doing so?
Where can I find and edit the release notes?
public static void writeResponse(ServletContext servletContext,
    HttpServletResponse response, String responseContent, boolean gzipResponse)
public static Writer createWriterForResponse(ServletContext servletContext,
    HttpServletResponse response, boolean gzipResponse)
The javadoc is now incorrect; responseContent was removed.
That said, this is likely to still register as a breaking API change, so it might make more sense to restore the old version.
Are you suggesting you'd like to preserve writeResponse although it is not called anymore, along the lines of what you wrote about splitting the code into two "branches", where the branch for protocol version 9 wouldn't keep any compatibility?
We can keep writeResponse in addition to the new createWriterForResponse, but it would be dead code in the framework and would at least require deprecation. I'm not sure, either, how any potential use of writeResponse by some framework or specialization would blend into the modified approach to how the writer is created. Any thoughts?
It appears that anything in a .server. package is considered "not checked as being part of the API", so the default config won't flag this change... Also it appears I failed to push the actual new config file, so I'll correct that, and maybe discuss widening the "public" API, at least as far as automation is concerned.
@@ -280,7 +283,9 @@ public final SerializationPolicy getSerializationPolicy(String moduleBaseURL,
 * @param payload the UTF-8 request payload
 * @return a string which encodes either the method's return, a checked
 *         exception thrown by the method, or an
 *         {@link IncompatibleRemoteServiceException}
 *         {@link IncompatibleRemoteServiceException}, or {@code null} if a protocol version of
I'm surprised that the exception is in the @return - could you bump that down to its own @throws, so we only see null vs. value?
try {
  Class<?> c = getClass();
  boolean foundOverride = false;
  while (!foundOverride && c != RemoteServiceServlet.class) {
Devil's advocate: what if this method is implemented by an interface with a default method?
(IDEA believes that the m != null check is always true, but I'm not sure it is right...)
I think the RemoteServiceServlet implementation would then "hide" the default method illegally. Eclipse reports
"The inherited method RemoteServiceServlet.onAfterResponseSerialized(String) cannot hide the public abstract method in OnAfterResponseSerialized"
when I try this, and that makes sense to me. So I cannot see a way for a (public) default interface method to successfully provide an implementation for onAfterResponseSerialized.
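The interaction can be checked with a small sketch. The class names below are illustrative stand-ins for RemoteServiceServlet and its subclasses, and the hierarchy-walk loop mirrors the patch's getDeclaredMethod scan; the interesting case is that a default method inherited from an interface is never returned by getDeclaredMethod on the implementing class, so the walk alone would not see it.

```java
public class OverrideDetection {

  interface Hook {
    default void onAfterResponseSerialized(String payload) { }
  }

  static class BaseServlet {
    public void onAfterResponseSerialized(String payload) { }
  }

  static class PlainSubclass extends BaseServlet { }

  static class OverridingSubclass extends BaseServlet {
    @Override
    public void onAfterResponseSerialized(String payload) { }
  }

  // Inherits the base method; the interface default is not selected,
  // because a concrete superclass method beats an interface default.
  static class DefaultOnly extends BaseServlet implements Hook { }

  // Walk from the concrete class up to (exclusive) the base class,
  // looking for a declared override.
  static boolean overrides(Class<?> c) {
    while (c != BaseServlet.class) {
      try {
        c.getDeclaredMethod("onAfterResponseSerialized", String.class);
        return true;
      } catch (NoSuchMethodException e) {
        c = c.getSuperclass();
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(overrides(PlainSubclass.class));      // false
    System.out.println(overrides(OverridingSubclass.class)); // true
    System.out.println(overrides(DefaultOnly.class));        // false
  }
}
```

So even in the compilable variant (both methods public), the declared-method walk reports "no override" for a class relying on an interface default, which supports treating the class hierarchy as the only source of overrides here.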
import java.util.Map;
import java.util.Set;

import com.google.gwt.user.client.rpc.CustomFieldSerializer;
I think the style check task will yell at you for moving imports like this.
}

public void addEscapedToken(String token) {
  addToken(escapeString(token, true, this));
public void addEscapedToken(String token) throws IOException {
Adding a thrown checked exception is (or at least should be treated as) an API change, since calling code that doesn't catch it will no longer compile.
Maybe UncheckedIOException? Or find another way to fail in these cases?
Good point. It was using a RuntimeException wrapper already, but in 9fef84f I've changed it to UncheckedIOException and removed the "throws Exception" from the addToken methods.
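A minimal sketch of that change, with an illustrative class and field name rather than the actual GWT writer: the token-appending method keeps an exception-free signature, and IO failures surface as java.io.UncheckedIOException instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

public class TokenSink {

  private final Writer out;

  public TokenSink(Writer out) {
    this.out = out;
  }

  // Signature stays free of checked exceptions, so existing callers
  // still compile; IO failures propagate unchecked.
  public void addToken(String token) {
    try {
      out.write(token);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Callers that do care can still catch UncheckedIOException and call getCause() to recover the original IOException.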
}
whitespace nit
 * method will be caught, logged in the servlet context, and will cause a generic
 * failure message to be sent to the GWT client -- with a 500 status code. To
 * customize this behavior, override
 * {@link com.google.gwt.user.server.rpc.RemoteServiceServlet#doUnexpectedFailure(java.lang.Throwable) RemoteServiceServlet.doUnexpectedFailure(java.lang.Throwable)}.
 * </p>
 */
package com.google.gwt.user.server.rpc;
package com.google.gwt.user.server.rpc;
removed trailing newline
}

public native boolean readBoolean() /*-{
  return !!this.@com.google.gwt.user.client.rpc.impl.ClientSerializationStreamReader::results[--this.@com.google.gwt.user.client.rpc.impl.ClientSerializationStreamReader::index];
  return !!this.@com.google.gwt.user.client.rpc.impl.ClientSerializationStreamReader::results[
      this.@com.google.gwt.user.client.rpc.impl.AbstractSerializationStream::getVersion()() >=
Before I get into "measure, then cut" territory, have you profiled for any change in perf here when a large byte/int/etc. array is deserialized, where instead of just an array access and a decrement we're also reading a field? Perhaps the compiler correctly detects the general case of "nothing on the client changed the version, so this is always true"?
No, I haven't measured. Contrary to my server-side profiling environment, I have no good setup yet for client-side profiling. Do you? What I can probably try is an approximate timing of deserialization of our 30MB test payload on the client side, in case the browsers' built-in profilers can help me with this.
A contemporary Java VM would most certainly do what you suggest and make the additional field access an inexpensive operation. I'm not sure about the typical JS runtimes, though.
I'm trying to run these tests using "ant clean test", but I'm getting loads of errors that seem very unlikely to trace back to the changes this PR proposes. Excerpt from the logs of the "ant clean test" job that has now been running for approximately six hours:
Am I missing some set-up prerequisite to run these tests successfully?
It is possible you've got some env vars that aren't right - check out the .github/workflows/full-check.yml, which runs only a subset of the possible tests and succeeds in about 2-3 hrs on GitHub Actions. Specifically, try setting TZ to America/Los_Angeles (some tests that verify time formats need to run in a specific tz to behave properly), and disable the web.htmlunit, nometa.htmlunit, and emma.htmlunit tests:
I spot-tested a few of the failures above and could not reproduce your errors - details from the JUnit XML outputs (including the path to the file, which tells you the exact target that was run) would help to clear it up. Touching very briefly on the wire format discussion, a few quick points:
One last concern to bring up before I go in for a full read/reply (probably next week due to the holiday here): one thing this patch will break is the ability of the server to send an error response if the full payload couldn't be serialized cleanly - there will be no way of knowing this until part of the payload was sent, with its "200 OK" status line and …
Thanks, this helped to narrow things down. I can now see that the current version of the PR still has a problem with the String table, for which long strings still get split and concatenated using "+". Looking further at which Rhino version is being embedded, to see if the problem still applies, I checked the …

I would love to remove the String splitting altogether...
It really seems so, and the problem with the 64k String length limit, as far as I can see, comes from the "decompiler", where the 1999 version assumes it can encode the String length in a single char.

Is it worthwhile trying to upgrade the embedded Rhino copy at least to that version? Otherwise we may end up in a mess when String objects longer than 64k characters still need special treatment, perhaps even a special version for decoding. The way this was solved before looks to me like this: if no strings >64k length are encountered, version 8 may be used; otherwise, the version is downgraded to 7, and the client will have to evaluate the payload instead of only parsing it as JSON. With version 9 indicating forward serialization, I wonder how the combination of forward serialization and "evaluation required" should then be encoded. As version 10?
I'll have to recheck as it has been a long time, but I am almost certain that Rhino only exists for legacy dev mode (and even then, only for RPC? That's my memory at least...), so updating it might stir up some things better left alone.

I re-read a bunch of the current impl the other day to be able to do a reasonable review, and while I noticed that the request payload's version isn't propagated to help decide how the response should be sent, it seems possible that an "old enough" request version could be used in deciding the response version. For example, a client that specifies version=8 might only be eligible for 7 or 8, but a client that specifies 9 is able to receive 9+. That would make it possible to let a legacy dev mode client signal that it only wants to receive 7 or 8, and side-step this issue.

Digging deeper into my brain (I'll try to corroborate with git history or code comments soon), I seem to recall that Rhino only exists because legacy dev mode was too slow for giant payloads to cross the JVM<->browser plugin<->JS barriers. In light of the fact that legacy dev mode is harder and harder to use these days (basically only IE11 on certain OSes or htmlunit are even possible, without using unsupported builds of old browsers), it might be okay to turn this off and always use the web mode implementation. This change might also be somewhat risky, but at least running dev mode tests in htmlunit will verify it. That way the server doesn't need to guess whether the client can handle long strings - opting in to version 9 is an explicit "I can actually read JSON" (whether streamed or all in one chunk).
That commit fixes the 64k String length limitation by representing lengths using one char if and only if the length is less than 32k. Otherwise, the length is encoded by two chars with the highest bit in the first char being set. See mozilla/rhino@b9df1f2
I looked at the patch provided by commit b9df1f20fc5192b6e50fca2006608ab88d68161 in the original Rhino repo. All it does is change the length representation in the decompile information such that values <32k are still written as one char, and all others as two chars with the most significant bit set to 1 to indicate the case. This is then interpreted in the new …

So in 13fd8c0 I've cherry-picked the encoding part and am running the tests again now with this. The … which leads me to removing this test now, because it fails with the cherry-picked patch. 2da2a0a
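For illustration, the one-or-two-char length scheme described above can be sketched like this. This is a hedged reconstruction from the description in this thread, not the literal Rhino code; the exact bit layout in the upstream commit may differ.

```java
public class LengthCodec {

  // Lengths below 32k fit in one char; longer lengths use two chars,
  // with the high bit of the first char set as the marker.
  static String encodeLength(int len) {
    if (len < 0x8000) {
      return String.valueOf((char) len);
    }
    // 15 bits in the first char (after the marker bit) plus 16 bits in
    // the second char cover any non-negative int length.
    char hi = (char) (0x8000 | (len >>> 16));
    char lo = (char) (len & 0xFFFF);
    return new String(new char[] { hi, lo });
  }

  static int decodeLength(String s) {
    char first = s.charAt(0);
    if ((first & 0x8000) == 0) {
      return first;            // single-char form
    }
    return ((first & 0x7FFF) << 16) | s.charAt(1); // two-char form
  }
}
```

The marker bit is what keeps old payloads readable: every length the 1999 code ever wrote is below 0x8000, so its high bit is clear and the single-char path still decodes it.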
Not sure I understand the change you desire. Currently, values of type …
The build and tests, when executed as you indicated above, now run successfully without errors/failures locally.
True, and I don't really have a good solution for this. I've tested this locally, and for small payloads it seems that along the chain of GZIPOutputStream and HttpResponse's output stream there is still some level of buffering going on, so that not a single byte has actually been sent to the client when the whole request fails and sets status 500 with the generic error message as text/plain. For larger payloads, however, the effect you foresee of course occurs. The browser will then see //OK... where the response is cut off at some random point where a server-side exception occurred, depending on how much was actually already flushed out of the buffers. The browser client then fails to parse the JSON object:

This change in behavior is one of the reasons why I still think that opting in to the new protocol version is something that applications should actively need to do, so as to accept these subtle behavioral changes. I still hope that you wouldn't see this as a show stopper.
I'm concerned about the breakage we're introducing here, and I'm not seeing a clear path forward. The wire format version is the main one, and will affect us here and in future versions. We're effectively bifurcating support: version 8 vs. version 9. If 8 remains the default and 9 supports this specific feature but does not become the new default, what happens when an update is required for 8? Opting in to a new version makes this more confusing: making the subsequent version 10 either means jumping over 9 ("min and max are both 10") or finding some other way to branch (for example, "even numbers are compatible going forward"). Between that and the behavior change of failing while streaming without some way to recover, it seems like this patch is best seen as a separate feature from existing GWT-RPC, even if it is superior to the existing implementation. Without a clear plan to let version 9 be strictly "better" than version 8 for existing applications, though, I don't see how to make that change in place.

--

Longs are indeed fine the way they are - I'm only suggesting using the same approach to handle strings, so that they too can be streamed, removing the need to accumulate them over the entire payload, and instead just sending them (and tracking them more or less like any other object) as soon as they are applicable.
If you're worried about the contiguousness of versions because you assume patches for 8 may be required in the future, I have no problem with numbering forward-serializing versions differently, e.g., starting at MAX_INT-1 and counting backwards, or at MAX_INT/2 and counting upwards; that's probably enough room for new versions in the foreseeable future.
I can always come up with situations where response serialization can fail in ways that won't recover cleanly; among them are network errors or an OutOfMemoryError on the server while sending the response out (more likely with 8 than 9, with all the redundant buffers that need to be kept around ;-) ). In those cases a client won't receive a consistent response either. The biggest change that I have observed is that now the client will fail with an exception while de-serializing a response left incomplete by a failure during serialization, whereas with 8 the client would have de-serialized an exception saying the server couldn't serialize. I find this difference rather insignificant compared to the benefits, but you seem to have a different view on this.

Anyway, this is about as much as I can invest here, and it seems we cannot get this resolved, which I regret. In our GWT-adopting project we are already working off the fork and will have to keep doing so; it has many advantages for us and stabilizes our environment a lot. We will definitely not go back to 8, as it endangers our server environment under heavy load. Based on our experience using this in production for several months now, I would recommend it to every GWT project. But without this being merged, who knows how bad the merge conflicts may become for future RPC versions.
When we need to track the strings, I cannot see how this makes things much better; the server will have to hold on to all String objects serialized so far in order to encode future occurrences as an index into the String table instead of the String itself. The key benefit we have seen with the implementation proposed here is that if the bulk of the traffic is made up of primitives (double, int, byte, short, boolean) that don't require a reference in the object table, then near zero-footprint serialization can be achieved.
Sorry, I don't see a compelling reason to simultaneously say "we should not actively encourage upgrading to this new version" as well as "this is the new and better version". By any measure I can see, this is a new and different feature rather than an incremental improvement. With your permission, I'll adapt as much of this as I can for my forward-compatible gwt-rpc fork, so that we can still ship a new, better, and very-slightly-incompatible version that still has all of these features. Cost of copying is reduced further in that implementation, by being able to send binary rather than …

--
The savings with strings isn't in the tracking of them (since of course we need to track all objects that way), but in being able to send the data up front rather than a reference to a string that won't be sent until the payload has completed (additionally, it is sometimes required to send a reference to the type of the string, rather than merely the string itself). Since all type tokens are sent as strings, the client cannot process any of the incoming buffer until the entire payload has been read; even the very first object's type, while being "known" by the client, can't be seen until the string table arrives (and technically it isn't known, since it could potentially be a subclass of the specified type). The savings won't be massive, but if the client can start reading before the server is done writing, it won't be zero either, and the memory savings from many clients sending massive payloads to the server could be decent. Done properly, once a given buffer is read, it can be discarded entirely (or returned to a pool to be written again), rather than retaining the full contents of the incoming payload and only reading once complete.
That sounds correct to me - what I'm suggesting is that we can go further and avoid the extra bookkeeping costs for traffic where some large percentage of the data is made up of small Strings. Consider an extremely large Map<String, String>, where the reference to each key and value costs ~3 bytes plus (roughly) the decimal size of its string-table index: for 100k short strings (say 10 chars each?), the strings themselves cost some 1MB, but the extra overhead of not streaming them is about 6 bytes per string (one plus the average decimal length of a number in the approx range 0-100k), a little under 600,000 bytes in total: that's ~35% avoidable overhead.
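The arithmetic behind that estimate can be checked directly. The constants below are the stated assumptions (100k strings, ~10 chars each, ~6 bytes per string-table reference), not measurements; the computed share comes out near the ~35% figure.

```java
public class StringTableOverhead {

  public static void main(String[] args) {
    int strings = 100_000; // number of short strings in the payload
    int avgChars = 10;     // assumed average string length
    int refBytes = 6;      // separator plus an average-length decimal index

    long stringBytes = (long) strings * avgChars;   // the strings themselves
    long overheadBytes = (long) strings * refBytes; // string-table references

    double share = overheadBytes / (double) (stringBytes + overheadBytes);
    System.out.printf("overhead: %d bytes, %.1f%% of the combined payload%n",
        overheadBytes, share * 100);
  }
}
```

With these round numbers the references add 600,000 bytes on top of 1,000,000 bytes of string data, i.e. roughly a third of the combined total, which is the avoidable cost of deferring strings to the table instead of streaming them inline.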
We discussed earlier how bad an incompatibility this would be for projects that rely on the wire format for some reason. I agreed with your concerns about possibly and unknowingly breaking things by making forward serialization the new default. That's why I suggested to only deprecate backward serialization in an upcoming release and then make forward serialization the new default in some release after that, so that those depending on the current wire format get enough notice to change, and of course still have an option to stick with the old wire format. Whether this is incremental or not may be splitting hairs. I think it depends on whether a project depends on the wire format, meaning there is a fairly deep entanglement with interfaces that were probably made part of the API unintentionally, for internal technical reasons only.
Sure, thanks for asking, but I think contributing here always has in mind that anyone who has an idea for how to use things can do so, so feel free to do with this whatever you like :-).
Yes, for the receiving end on the server side I agree this would be another change that would help conserve memory and avoid having to buffer the entire payload until the end of the stream. This just hasn't come up during our profiling efforts, because the payloads clients send in our cases tend to be small (around one to five orders of magnitude smaller than the payloads sent from the server to the client) and hence haven't caused any trouble we're aware of. I agree that this would be another improvement, and it could probably be added on top of the changes suggested here.
Correct; to us this was, however, a non-issue so far and, compared to the other savings, not sufficiently significant to justify reversing the client-to-server wire protocol, too.
Closing (at least for now) as this is untenable due to breaking compatibility.
This PR changes the serialization such that, instead of reversing the order of the payload tokens, they are streamed in forward order. If payload-intercepting methods are provided in RemoteServiceServlet subclasses, they will still be given the full payload. Otherwise, the payload will only be streamed to the HTTP response stream and will no longer exist as a full byte[] or char[] or StringBuffer in the server's memory.
This saves approximately 10% CPU time for response payloads in the 10s of MB range, and it eliminates a seven-times memory footprint during assembling and sending the response payload, reducing memory issues on RPC servers with many concurrent clients making requests that lead to large responses.
Care was taken to not make incompatible API changes. The default version still uses the old wire format, and the new version "9" must be selected explicitly (-Dgwt.rpc.version=9) to benefit from the new behavior. To keep existing public method signatures in classes such as RPC and RemoteServiceServlet unchanged, I introduced a ThreadLocal to hand down the Writer to the buffer that writes or stores the tokens, hoping to minimize negative impact.

Fixes #7987
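The ThreadLocal technique mentioned in the description can be sketched as below. All names here are illustrative, not the actual GWT classes: the servlet layer binds the response Writer to the current thread before encoding starts, so that deep serialization code can reach it without any public signatures changing.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class ResponseWriterContext {

  private static final ThreadLocal<Writer> CURRENT = new ThreadLocal<>();

  // Called by the servlet layer before encoding the response.
  static void enter(Writer w) {
    CURRENT.set(w);
  }

  // Called in a finally block, so pooled request threads don't leak the writer.
  static void exit() {
    CURRENT.remove();
  }

  // Deep inside the serializer, with an unchanged method signature,
  // tokens can still reach the response stream.
  static void emitToken(String token) {
    Writer w = CURRENT.get();
    if (w == null) {
      throw new IllegalStateException("no response writer bound to this thread");
    }
    try {
      w.write(token);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    StringWriter out = new StringWriter();
    enter(out);
    try {
      emitToken("3|");
      emitToken("\"hello\"|");
    } finally {
      exit();
    }
    System.out.println(out); // 3|"hello"|
  }
}
```

The enter/exit pairing in a try/finally is the part that matters on a servlet container: request threads are reused, so a writer left bound after the response completes would leak into the next request on that thread.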