Response-Streaming GWT RPC with backward compatibility; Fixes #7987 #9775 #9779
Conversation
there are still many open TODOs, and the code is entirely untested yet
serializing tokens in forward order for versions >= SERIALIZATION_STREAM_FORWARD_STREAMING_VERSION and backward for earlier versions
new streaming format works with gwt.rpc.version >= 9
Sorry for my delay in posting something formal as a review; between the changing PR and the trickle of commits, I wasn't clear on when you were ready. I've read about 80% of it and like where it is going, and I hope this review doesn't seem too critical. I'll try to be more prompt in the future, especially if you can signal me when you are sure you're ready for a deep look.
At a minimum to merge this, we'll have to have #9782 and its corresponding tools change landed. In the meantime, you could merge those locally and run the style/api checks.
--
I've been going over the intent of this: supporting streaming of a gwt-rpc payload to help use cases where fully de/serializing a payload costs more memory than you might expect. I have two main conclusions. They might not be applicable to your own use case, but I think that if this work is worth doing, it is probably worth generalizing further.
First, given this rework, why not move strings out of the separate "string table" in the payload and into the primary payload, so they can go out as the main response is sent and don't need to be batched up for later? No need to store an array of them; just keep them like normal objects in the identity map (which already has them anyway in at least some cases). In addition to simplifying the payload (why are longs sent as strings in the main payload, while strings go in the string table at the end?), it also has the side effect of letting the recipient stream-process the response instead of waiting for the complete payload (not applicable with XHR, but non-browser clients or browsers that support Fetch could in theory read async and discard buffers as they come in).
Next, why just responses, and not requests as well? A response does tend to be the obvious first step (javax.servlet can easily stream responses but not requests, and the server tends to be many:one, so its payloads seem more important than what it receives), but async servlets are available, and as written a server with N large incoming calls would need to hold on to full payloads rather than read and discard as it goes. Unfortunately browser clients can't stream requests at this time even with Fetch, but either when that planned improvement lands or through the use of websockets, this would be a benefit.
Making more changes in one batch seems like a better approach, so that downstream projects which update the wire format intermittently don't need to change more than once. It wouldn't be necessary to actually implement the server-side stream reading at this time (and we can't, until #9727 is resolved), but if the format were already prepared for those changes, a later patch could introduce them when we are ready.
--
The care you've taken to make this backwards compatible is certainly appreciated, though I wonder if it might make sense to offer a separate set of classes and branch based on the version? Mostly an idle thought, and not a request for a change. Reading the apparent stream version earlier could make the encodeResponse API changes easier, by leaving the existing methods as is (and deprecating them) and introducing new methods that require the modern approach. Other changes tend to only affect a simple branch, but as yours is broader, it could make sense to just extract an interface or base class and build from there.
--
Finally, unit tests: would it be possible to tweak some of the tests to run on both the old versions and this new one? I'm surprised to see they don't already do this, either by running individual test classes with some getWireVersion() method overridden differently in each implementation, or with a loop that tests each expected version. At a minimum I was thinking of an updated com.google.gwt.user.server.rpc.RPCTest and com.google.gwt.user.client.rpc.impl.ClientSerializationStreamReaderTest.
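The loop idea above can be sketched as follows. This is a hypothetical illustration, not the actual GWT test API: the WIRE_VERSIONS list and the roundTrip helper are stand-ins for whatever serializer the real tests would exercise at each protocol version.

```java
import java.util.List;

public class WireVersionLoopTest {

  // Assumed set of protocol versions worth covering; the real range would
  // come from AbstractSerializationStream constants.
  static final List<Integer> WIRE_VERSIONS = List.of(5, 6, 7, 8, 9);

  // Stand-in for a real encode/decode cycle at a given protocol version.
  static String roundTrip(String payload, int version) {
    String encoded = version + "|" + payload;           // "serialize"
    return encoded.substring(encoded.indexOf('|') + 1); // "deserialize"
  }

  public static void main(String[] args) {
    // One assertion per version, so a regression names the version it broke.
    for (int version : WIRE_VERSIONS) {
      String result = roundTrip("hello", version);
      if (!"hello".equals(result)) {
        throw new AssertionError("round trip failed for version " + version);
      }
    }
  }
}
```

The same shape works inside JUnit as a parameterized test, with getWireVersion() overridden per run instead of passed as an argument.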
/**
 * The current RPC protocol version.
 */
public static final int SERIALIZATION_STREAM_VERSION = 7;
public static final int SERIALIZATION_STREAM_VERSION = 8;
Why not advance this all the way to 9? That is, if both sides can use it, is there a reason why 8 should be the default instead? (7 is for when a browser might need not-quite-JSON, 8 is for real JSON.)
With the incompatibility of the changes and no up-front deprecation warning, it seemed a little harsh to me to break adopters relying on a non-null payload return value as the default. Unless things are being deprecated, I thought that staying with a version compatible with previous APIs would be the better choice.
On the other hand, if the new format is superior, it should be the default, right? If we document it properly (note there isn't a "put release notes here" sort of place; we just need to accumulate them as we continue with a release, on the issues and PRs themselves. It would be good to put a policy in place around this), then downstream consumers can force 7 or 8 as desired to retain the old behavior.
@@ -629,23 +693,54 @@ private static int getRpcVersion() throws SerializationException {
 * @param wasThrown if true, the object being returned was an exception thrown
 *        by the service method; if false, it was the result of the service
 *        method's invocation
 * @return a string that encodes the response from a service method
 * @return a string that encodes the response from a service method, or {@code null} if a protocol version of
This is a private method - while it is helpful to update its javadoc, the public methods that now return null are even more in need of this note. It should probably also include a release-notes summary we can add, to be clear how this should be used in the future.
I agree that the public methods are even more relevant for updating the Javadoc comments. I found a number of public methods, such as

public static String invokeAndEncodeResponse(Object target, Method serviceMethod, Object[] args,
    SerializationPolicy serializationPolicy, int flags) throws SerializationException {

but they didn't have a Javadoc at all, so I didn't feel it would be appropriate to introduce one just for this change. Are you recommending doing so?
Where can I find and edit the release notes?
public static void writeResponse(ServletContext servletContext,
    HttpServletResponse response, String responseContent, boolean gzipResponse)
public static Writer createWriterForResponse(ServletContext servletContext,
    HttpServletResponse response, boolean gzipResponse)
The javadoc is now incorrect; responseContent was removed.
That said, this is likely to still register as a breaking API change, so it might make more sense to restore the old version.
Are you suggesting you'd like to preserve writeResponse although it is not called anymore, along the lines of what you wrote about splitting the code into two "branches", where the branch for protocol version 9 wouldn't keep any compatibility?
We can keep writeResponse in addition to the new createWriterForResponse, but it would be dead code in the framework and would at least require deprecation. I'm not sure, either, how any potential use of writeResponse by some framework or specialization would blend into the modified approach to how the writer is created. Any thoughts?
It appears that anything in a .server. package is considered "not checked as being part of the API", so the default config won't flag this change... Also it appears I failed to push the actual new config file, so I'll correct that, and maybe discuss widening the "public" API, at least as far as automation is concerned.
@@ -280,7 +283,9 @@ public final SerializationPolicy getSerializationPolicy(String moduleBaseURL,
 * @param payload the UTF-8 request payload
 * @return a string which encodes either the method's return, a checked
 *         exception thrown by the method, or an
 *         {@link IncompatibleRemoteServiceException}
 *         {@link IncompatibleRemoteServiceException}, or {@code null} if a protocol version of
I'm surprised that the exception is in the @return - could you bump that down to its own @throws, so we only see null vs. value?
try {
  Class<?> c = getClass();
  boolean foundOverride = false;
  while (!foundOverride && c != RemoteServiceServlet.class) {
Devil's advocate: what if this method is implemented by an interface with a default method?
(IDEA believes that the m != null check is always true, but I'm not sure it is right...)
I think the RemoteServiceServlet implementation would then "hide" the default method illegally. Eclipse reports
"The inherited method RemoteServiceServlet.onAfterResponseSerialized(String) cannot hide the public abstract method in OnAfterResponseSerialized"
when I try this, and that makes sense to me. So I cannot see a way for a (public) default interface method to successfully provide an implementation for onAfterResponseSerialized.
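The interaction can be checked with a small sketch. The class names below are illustrative stand-ins for RemoteServiceServlet and its subclasses, and the hierarchy-walk loop mirrors the patch's getDeclaredMethod scan; the interesting case is that a default method inherited from an interface is never returned by getDeclaredMethod on the implementing class, so the walk alone would not see it.

```java
public class OverrideDetection {

  interface Hook {
    default void onAfterResponseSerialized(String payload) { }
  }

  static class BaseServlet {
    public void onAfterResponseSerialized(String payload) { }
  }

  static class PlainSubclass extends BaseServlet { }

  static class OverridingSubclass extends BaseServlet {
    @Override
    public void onAfterResponseSerialized(String payload) { }
  }

  // Inherits the base method; the interface default is not selected,
  // because a concrete superclass method beats an interface default.
  static class DefaultOnly extends BaseServlet implements Hook { }

  // Walk from the concrete class up to (exclusive) the base class,
  // looking for a declared override.
  static boolean overrides(Class<?> c) {
    while (c != BaseServlet.class) {
      try {
        c.getDeclaredMethod("onAfterResponseSerialized", String.class);
        return true;
      } catch (NoSuchMethodException e) {
        c = c.getSuperclass();
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(overrides(PlainSubclass.class));      // false
    System.out.println(overrides(OverridingSubclass.class)); // true
    System.out.println(overrides(DefaultOnly.class));        // false
  }
}
```

So even in the compilable variant (both methods public), the declared-method walk reports "no override" for a class relying on an interface default, which supports treating the class hierarchy as the only source of overrides here.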
import java.util.Map;
import java.util.Set;

import com.google.gwt.user.client.rpc.CustomFieldSerializer;
I think the style check task will yell at you for moving imports like this.
}

public void addEscapedToken(String token) {
  addToken(escapeString(token, true, this));
public void addEscapedToken(String token) throws IOException {
Adding a thrown checked exception is (or at least should be treated as) an API change, since calling code that doesn't catch it will no longer compile.
Maybe UncheckedIOException? Or find another way to fail in these cases?
Good point. It was using a RuntimeException wrapper already, but in 9fef84f I've changed it to UncheckedIOException and removed the "throws Exception" from the addToken methods.
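A minimal sketch of that change, with an illustrative class and field name rather than the actual GWT writer: the token-appending method keeps an exception-free signature, and IO failures surface as java.io.UncheckedIOException instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

public class TokenSink {

  private final Writer out;

  public TokenSink(Writer out) {
    this.out = out;
  }

  // Signature stays free of checked exceptions, so existing callers
  // still compile; IO failures propagate unchecked.
  public void addToken(String token) {
    try {
      out.write(token);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Callers that do care can still catch UncheckedIOException and call getCause() to recover the original IOException.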
}
whitespace nit
 * method will be caught, logged in the servlet context, and will cause a generic
 * failure message to be sent to the GWT client -- with a 500 status code. To
 * customize this behavior, override
 * {@link com.google.gwt.user.server.rpc.RemoteServiceServlet#doUnexpectedFailure(java.lang.Throwable) RemoteServiceServlet.doUnexpectedFailure(java.lang.Throwable)}.
 * </p>
 */
package com.google.gwt.user.server.rpc;
package com.google.gwt.user.server.rpc;
removed trailing newline
}

public native boolean readBoolean() /*-{
  return !!this.@com.google.gwt.user.client.rpc.impl.ClientSerializationStreamReader::results[--this.@com.google.gwt.user.client.rpc.impl.ClientSerializationStreamReader::index];
  return !!this.@com.google.gwt.user.client.rpc.impl.ClientSerializationStreamReader::results[
      this.@com.google.gwt.user.client.rpc.impl.AbstractSerializationStream::getVersion()() >=
Before I get into "measure, then cut" territory, have you profiled for any change in perf here when a large byte/int/etc. array is deserialized, where instead of just an array access and a decrement we're also reading a field? Perhaps the compiler correctly detects the general case of "nothing on the client changed the version, so this is always true"?
No, I haven't measured. Contrary to my server-side profiling environment, I have no good setup yet for client-side profiling. Do you? What I can probably try is an approximate timing of deserialization of our 30MB test payload on the client side, in case the browsers' built-in profilers can help me with this.
A contemporary Java VM would most certainly do what you suggest and make the additional field access an inexpensive operation. I'm not sure about the typical JS runtimes, though.
I'm trying to run these tests using "ant clean test", but I'm getting loads of errors that seem very unlikely to trace back to the changes this PR proposes. Excerpt from the logs of the "ant clean test" job that has now been running for approximately six hours:
Am I missing some set-up prerequisite to run these tests successfully?
It is possible you've got some env vars that aren't right - check out the .github/workflows/full-check.yml, which runs only a subset of the possible tests and succeeds in about 2-3 hrs on GitHub Actions. Specifically, try setting TZ to America/Los_Angeles (some tests that verify time formats need to run in a specific tz to behave properly), and disable the web.htmlunit, nometa.htmlunit, and emma.htmlunit tests:
I spot-tested a few of the failures above and could not reproduce your errors - details from the JUnit XML outputs (including the path to the file, which tells you the exact target that was run) would help to clear it up. Touching very briefly on the wire format discussion, a few quick points:
One last concern to bring up before I go in for a full read/reply (probably next week due to the holiday here): one thing this patch will break is the ability of the server to send an error response if the full payload couldn't be serialized cleanly - there will be no way of knowing this until part of the payload was sent, with its "200 OK" status line and …
Thanks, this helped to narrow things down. I can now see that the current version of the PR still has a problem with the String table, for which long strings still get split and concatenated using "+". Looking further at which Rhino version is being embedded, to see if the problem still applies, I checked the …

I would love to remove the String splitting altogether...
It really seems so, and the problem with the 64k String length limit, as far as I can see, comes from the "decompiler", where the 1999 version assumes it can encode the String length in a single char.

Is it worthwhile trying to upgrade the embedded Rhino copy at least to that version? Otherwise we may end up in a mess when String objects longer than 64k characters still need special treatment, perhaps even a special version for decoding. The way this was solved before looks to me like this: if no strings >64k length are encountered, version 8 may be used; otherwise, the version is downgraded to 7, and the client will have to evaluate the payload instead of only parsing it as JSON. With version 9 indicating forward serialization, I wonder how the combination of forward serialization and "evaluation required" should then be encoded. As version 10?
I'll have to recheck as it has been a long time, but I am almost certain that Rhino only exists for legacy dev mode (and even then, only for RPC? That's my memory at least...), so updating it might stir up some things better left alone.

I re-read a bunch of the current impl the other day to be able to do a reasonable review, and while I noticed that the request payload's version isn't propagated to help decide how the response should be sent, it seems possible that an "old enough" request version could be used in deciding the response version. For example, a client that specifies version=8 might only be eligible for 7 or 8, but a client that specifies 9 is able to receive 9+. That would make it possible to let a legacy dev mode client signal that it only wants to receive 7 or 8, and side-step this issue.

Digging deeper into my brain (I'll try to corroborate with git history or code comments soon), I seem to recall that Rhino only exists because legacy dev mode was too slow for giant payloads to cross the JVM<->browser plugin<->JS barriers. In light of the fact that legacy dev mode is harder and harder to use these days (basically only IE11 on certain OSes or htmlunit are even possible, without using unsupported builds of old browsers), it might be okay to turn this off and always use the web mode implementation. This change might also be somewhat risky, but at least running dev mode tests in htmlunit will verify it. That way the server doesn't need to guess whether the client can handle long strings - opting in to version 9 is an explicit "I can actually read JSON" (whether streamed or all in one chunk).
That commit fixes the 64k String length limitation by representing lengths using one char if and only if the length is less than 32k. Otherwise, the length is encoded by two chars with the highest bit in the first char being set. See mozilla/rhino@b9df1f2
I looked at the patch provided by commit b9df1f20fc5192b6e50fca2006608ab88d68161 in the original Rhino repo. All it does is change the length representation in the decompile information such that values <32k are still written as one char, and all others as two chars with the most significant bit set to 1 to indicate the case. This is then interpreted in the new …

So in 13fd8c0 I've cherry-picked the encoding part and am running the tests again now with this. The … which leads me to removing this test now, because it fails with the cherry-picked patch. 2da2a0a
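For illustration, the one-or-two-char length scheme described above can be sketched like this. This is a hedged reconstruction from the description in this thread, not the literal Rhino code; the exact bit layout in the upstream commit may differ.

```java
public class LengthCodec {

  // Lengths below 32k fit in one char; longer lengths use two chars,
  // with the high bit of the first char set as the marker.
  static String encodeLength(int len) {
    if (len < 0x8000) {
      return String.valueOf((char) len);
    }
    // 15 bits in the first char (after the marker bit) plus 16 bits in
    // the second char cover any non-negative int length.
    char hi = (char) (0x8000 | (len >>> 16));
    char lo = (char) (len & 0xFFFF);
    return new String(new char[] { hi, lo });
  }

  static int decodeLength(String s) {
    char first = s.charAt(0);
    if ((first & 0x8000) == 0) {
      return first;            // single-char form
    }
    return ((first & 0x7FFF) << 16) | s.charAt(1); // two-char form
  }
}
```

The marker bit is what keeps old payloads readable: every length the 1999 code ever wrote is below 0x8000, so its high bit is clear and the single-char path still decodes it.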
Not sure I understand the change you desire. Currently, values of type …
The build and tests, when executed as you indicated above, now run successfully without errors/failures locally.
True, and I don't really have a good solution for this. I've tested this locally, and for small payloads it seems that along the chain of GZIPOutputStream and HttpResponse's output stream there is still some level of buffering going on, so that not a single byte has actually been sent to the client when the whole request fails and sets status 500 with the generic error message as text/plain. For larger payloads, however, the effect you foresee of course occurs. The browser will then see //OK... where the response is cut off at some random point where a server-side exception occurred, depending on how much was actually already flushed out of the buffers. The browser client then fails to parse the JSON object:

This change in behavior is one of the reasons why I still think that opting in to the new protocol version is something that applications should actively need to do, so as to accept these subtle behavioral changes. I still hope that you wouldn't see this as a show stopper.
I'm concerned about the breakage we're introducing here, and I'm not seeing a clear path forward. The wire format version is the main one, and will affect us here and in future versions. We're effectively bifurcating support: version 8 vs. version 9. If 8 remains the default and 9 supports this specific feature but does not become the new default, what happens when an update is required for 8? Opting in to a new version makes this more confusing: making the subsequent version 10 either means jumping over 9 ("min and max are both 10") or finding some other way to branch (for example, "even numbers are compatible going forward"). Between that and the behavior change of failing while streaming without some way to recover, it seems like this patch is best seen as a separate feature from existing GWT-RPC, even if it is superior to the existing implementation. Without a clear plan to let version 9 be strictly "better" than version 8 for existing applications, though, I don't see how to make that change in place.

--

Longs are indeed fine the way they are - I'm only suggesting using the same approach to handle strings, so that they too can be streamed, removing the need to accumulate them over the entire payload, and instead just sending them (and tracking them more or less like any other object) as soon as they are applicable.
If you're worried about the contiguousness of versions because you assume patches for 8 may be required in the future, I have no problem with numbering forward-serializing versions differently, e.g., starting at MAX_INT-1 and counting backwards, or at MAX_INT/2 and counting upwards; that's probably enough room for new versions in the foreseeable future.
I can always come up with situations where response serialization can fail in ways that won't recover cleanly; among them are network errors or an OutOfMemoryError on the server while sending the response out (more likely with 8 than 9, with all the redundant buffers that need to be kept around ;-) ). In those cases a client won't receive a consistent response either. The biggest change that I have observed is that now the client will fail with an exception while de-serializing a response left incomplete by a failure during serialization, whereas with 8 the client would have de-serialized an exception saying the server couldn't serialize. I find this difference rather insignificant compared to the benefits, but you seem to have a different view on this.

Anyway, this is about as much as I can invest here, and it seems we cannot get this resolved, which I regret. In our GWT-adopting project we are already working off the fork and will have to keep doing so; it has many advantages for us and stabilizes our environment a lot. We will definitely not go back to 8, as it endangers our server environment under heavy load. Based on our experience using this in production for several months now, I would recommend it to every GWT project. But without this being merged, who knows how bad the merge conflicts may become for future RPC versions.
When we need to track the strings, I cannot see how this makes things much better; the server will have to hold on to all String objects serialized so far in order to encode future occurrences as an index into the String table instead of the String itself. The key benefit we have seen with the implementation proposed here is that if the bulk of the traffic is made up of primitives (double, int, byte, short, boolean) that don't require a reference in the object table, then near zero-footprint serialization can be achieved.
Sorry, I don't see a compelling reason to simultaneously say "we should not actively encourage upgrading to this new version" as well as "this is the new and better version". By any measure I can see, this is a new and different feature rather than an incremental improvement. With your permission, I'll adapt as much of this as I can for my forward-compatible gwt-rpc fork, so that we can still ship a new, better, and very-slightly-incompatible version that still has all of these features. Cost of copying is reduced further in that implementation, by being able to send binary rather than …

--
The savings with strings isn't in the tracking of them (since of course we need to track all objects that way), but in being able to send the data up front rather than a reference to a string that won't be sent until the payload has completed (additionally, it is sometimes required to send a reference to the type of the string, rather than merely the string itself). Since all type tokens are sent as strings, the client cannot process any of the incoming buffer until the entire payload has been read; even the very first object's type, while being "known" by the client, can't be seen until the string table arrives (and technically it isn't known, since it could potentially be a subclass of the specified type). The savings won't be massive, but if the client can start reading before the server is done writing, it won't be zero either, and the memory savings from many clients sending massive payloads to the server could be decent. Done properly, once a given buffer is read, it can be discarded entirely (or returned to a pool to be written again), rather than retaining the full contents of the incoming payload and only reading once complete.
That sounds correct to me - what I'm suggesting is that we can go further and avoid the extra bookkeeping costs for traffic where some large percentage of the data is made up of small Strings. Consider an extremely large Map<String, String>, where the reference to each key and value costs ~3 bytes plus (roughly) the decimal size of its string-table index: for 100k short strings (say 10 chars each?), the strings themselves cost some 1MB, but the extra overhead of not streaming them is about 6 bytes per string (one plus the average decimal length of a number in the approx range 0-100k), a little under 600,000 bytes in total: that's ~35% avoidable overhead.
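The arithmetic behind that estimate can be checked directly. The constants below are the stated assumptions (100k strings, ~10 chars each, ~6 bytes per string-table reference), not measurements; the computed share comes out near the ~35% figure.

```java
public class StringTableOverhead {

  public static void main(String[] args) {
    int strings = 100_000; // number of short strings in the payload
    int avgChars = 10;     // assumed average string length
    int refBytes = 6;      // separator plus an average-length decimal index

    long stringBytes = (long) strings * avgChars;   // the strings themselves
    long overheadBytes = (long) strings * refBytes; // string-table references

    double share = overheadBytes / (double) (stringBytes + overheadBytes);
    System.out.printf("overhead: %d bytes, %.1f%% of the combined payload%n",
        overheadBytes, share * 100);
  }
}
```

With these round numbers the references add 600,000 bytes on top of 1,000,000 bytes of string data, i.e. roughly a third of the combined total, which is the avoidable cost of deferring strings to the table instead of streaming them inline.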
We discussed earlier how bad an incompatibility this would be for projects that rely on the wire format for some reason. I agreed with your concerns about possibly and unknowingly breaking things by making forward serialization the new default. That's why I suggested to only deprecate backward serialization in an upcoming release and then make forward serialization the new default in some release after that, so that those depending on the current wire format get enough notice to change, and of course still have an option to stick with the old wire format. Whether this is incremental or not may be splitting hairs. I think it depends on whether a project depends on the wire format, meaning there is a fairly deep entanglement with interfaces that were probably made part of the API unintentionally, for internal technical reasons only.
Sure, thanks for asking, but I think contributing here always has in mind that anyone who has an idea for how to use things can do so, so feel free to do with this whatever you like :-).
Yes, for the receiving end on the server side I agree this would be another change that would help conserve memory and avoid having to buffer the entire payload until the end of the stream. This just hasn't come up during our profiling efforts, because the payloads clients send in our cases tend to be small (around one to five orders of magnitude smaller than the payloads sent from the server to the client) and hence haven't caused any trouble we're aware of. I agree that this would be another improvement, and it could probably be added on top of the changes suggested here.
Correct; to us this was, however, a non-issue so far and, compared to the other savings, not sufficiently significant to justify reversing the client-to-server wire protocol, too.
Closing (at least for now) as this is untenable due to breaking compatibility.
This PR changes the serialization such that, instead of reversing the order of the payload tokens, they are streamed in forward order. If payload-intercepting methods are provided in RemoteServiceServlet subclasses, they will still be given the full payload. Otherwise, the payload will only be streamed to the HTTP response stream and will no longer exist as a full byte[] or char[] or StringBuffer in the server's memory.
This saves approximately 10% CPU time for response payloads in the 10s of MB range, and it eliminates a seven-times memory footprint during assembling and sending the response payload, reducing memory issues on RPC servers with many concurrent clients making requests that lead to large responses.
Care was taken to not make incompatible API changes. The default version still uses the old wire format, and the new version "9" must be selected explicitly (-Dgwt.rpc.version=9) to benefit from the new behavior. To keep existing public method signatures in classes such as RPC and RemoteServiceServlet unchanged, I introduced a ThreadLocal to hand down the Writer to the buffer that writes or stores the tokens, hoping to minimize negative impact.

Fixes #7987
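The ThreadLocal technique mentioned in the description can be sketched as below. All names here are illustrative, not the actual GWT classes: the servlet layer binds the response Writer to the current thread before encoding starts, so that deep serialization code can reach it without any public signatures changing.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class ResponseWriterContext {

  private static final ThreadLocal<Writer> CURRENT = new ThreadLocal<>();

  // Called by the servlet layer before encoding the response.
  static void enter(Writer w) {
    CURRENT.set(w);
  }

  // Called in a finally block, so pooled request threads don't leak the writer.
  static void exit() {
    CURRENT.remove();
  }

  // Deep inside the serializer, with an unchanged method signature,
  // tokens can still reach the response stream.
  static void emitToken(String token) {
    Writer w = CURRENT.get();
    if (w == null) {
      throw new IllegalStateException("no response writer bound to this thread");
    }
    try {
      w.write(token);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    StringWriter out = new StringWriter();
    enter(out);
    try {
      emitToken("3|");
      emitToken("\"hello\"|");
    } finally {
      exit();
    }
    System.out.println(out); // 3|"hello"|
  }
}
```

The enter/exit pairing in a try/finally is the part that matters on a servlet container: request threads are reused, so a writer left bound after the response completes would leak into the next request on that thread.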