Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spike] Pre-requisite for fault-tolerant client-server #20418

Closed
3 tasks done
mshabarov opened this issue Nov 6, 2024 · 5 comments
Closed
3 tasks done

[Spike] Pre-requisite for fault-tolerant client-server #20418

mshabarov opened this issue Nov 6, 2024 · 5 comments

Comments

@mshabarov
Copy link
Contributor

mshabarov commented Nov 6, 2024

Describe your motivation

This is a pre-requisite investigation task before we can start #20348.
The description there is not enough to start the development and it costs too many story points from the team's opinion, thus needs a decomposition.

The outcome of this investigation should be:

  • Describing and understand what are the cases might be, when message ID is unexpected. This should give enough technical details to deduct what and where exactly we have to change in framework . This ticket can be considered as a source of real observations Unexpected message id from the client (sync ids different) #12640, but not only limited to those.
  • A list of enhancements with <= 8 story points that could address the cases, where the network is a culprit, i.e. when the messages are lost. The cases, caused by misconfiguration of session replication, should be described, but out of scope of further enhancements in framework. Though they should be properly documented.
  • A testing procedure needs to be ready, at least the idea how we can test these enhancements and guarantee that they cover real cases.
@caalador
Copy link
Contributor

caalador commented Nov 13, 2024

Cases seen in reported logs:

  1. Ignore old message from client
    • Server receives previous message id with the same content hash as last processed
    • Server should resend last response to sync state on client.
  2. Unexpected id from client
    • If the client id is bigger than the expected there has been a dropped client request. (client updates the id on preparePayload)
    • Client should be requested to resend previous message and server should sync the id to current request id.
  3. Server expected much bigger than the client request?
    • Resynchronization should probably happen as the gap is too big and why is the client requesting with much too old id?

@caalador
Copy link
Contributor

As the server always answers the client, the client should probably queue the sends as it does for one single pushPendingMessage when using bidirectional transport, but not for xhr. Also send(JsonObject payload) is public and doesn't do any queueing so that sendInvocationsToServer would purge them instead of pushing it out immediately where we could possibly get faulty sync id from the client.

@mshabarov mshabarov self-assigned this Nov 13, 2024
@mshabarov mshabarov moved this from 🟢Ready to Go to ⚒️ In progress in Vaadin Flow ongoing work (Vaadin 10+) Nov 13, 2024
@mshabarov
Copy link
Contributor Author

mshabarov commented Nov 18, 2024

@caalador has covered the majority of cases. Let me just extend it a bit.
Here are the cases clusters that I can extract based on recync issues we have in Flow:

  1. Ignore old message from client

    Usually logged as like com.vaadin.flow.server.communication.ServerRpcHandler | Ignoring old duplicate message from the client. Expected: 1748, got: 1747.
    An often case reported several times. There is a comment that dirty flags have been cleared on the server, thus Flow makes a resync. See


    Server should resend a response.

  2. Unexpected id from client - clientId > expected sync Id

    Seems the most often case.
    Usually logged as Unexpected message id from the client. Expected sync id: 34, got 35. more details logged on DEBUG level..
    Client should resend the message and server should accept it.

  3. Unexpected id from client - clientId < expected sync Id

    This is similar to (1), but the message from client has different hash than the latest processed already by the server.
    This can be split into two possible different cases:
    - client-to-server ID and server-to-client ID have a big gap between each other. This case looks like a misconfiguration of session replication, when the client message is routed to a wrong session/UI and thus the UI state is completely different, e.g.:
    Unexpected message id from the client. Expected sync id: 41, got 39. Message start: {"csrfToken":"7f83ca3d-e864-4362-9dda-a8af3e641780","rpc":[],"syncId":52,"clientId":39,"resynchronize":true}
    - client-to-server ID and server-to-client ID differ by just 1, see below. I have no clues so far in which circumstances this may happen.
    Unexpected message id from the client. Expected sync id: 2, got 3. {"csrfToken":"40cf2444-6f3f-42a2-bc41-b7df9aa4d1bf","rpc":[],"syncId":2,"clientId":3,"UNLOAD":true}
    - client-to-server ID and server-to-client ID differ by just 1, but client has a much bigger number of syncId that it got from server's response, e.g.: Unexpected message id from the client. Expected sync id: 10, got 9. Message start: {...,"syncId":61,"clientId":9}, probably something wrong with the reverse proxy that passes message from server to client , but doesn't pass the other way.

  4. Possible resynchronisations when logging in/out

    Usually logged exactly as Unexpected message id from the client. Expected sync id: 2, got 3 - clientID=3 and syncId=2, the values can be also 1 and 2.
    This is quite often case as well.
    Likely fixed by fix: delay session expiration handling to prevent canceling ongoing navigation #19983 and by fix: remove default login listener when setting form action flow-components#6669.

  5. Possible issues in cloud environments where session replication resets the sync ID

    This is reported as Unexpected message id from the client. Expected sync id: 0, got 1. Rare case, probably caused by always resetting the lastProcessedClientToServerId to -1. See Vaadin throws exception Unexpected message id from the client when Spring Session Serialization is used #4746 and Unexpected message id from the client (sync ids different) #12640 (comment)

@mshabarov
Copy link
Contributor Author

mshabarov commented Nov 19, 2024

@mshabarov
Copy link
Contributor Author

For testing I'd propose to not use any heavy third-party tools, but instead to intercept the request/response right away in a Vaadin applications.

  1. Using a custom AtmosphereInterceptor to test cases with Vaadin Push / WebSockets:

inspect(AtmosphereResource r) — Called when a client sends a message to the server.
postInspect(AtmosphereResource r) — Called when a server sends a message to the client.

An example:

import org.atmosphere.config.managed.AtmosphereInterceptorAdapter;
import org.atmosphere.cpr.AtmosphereResource;
import org.atmosphere.cpr.AtmosphereResponse;
import org.atmosphere.cpr.AtmosphereRequest;

public class CustomWebSocketInterceptor extends AtmosphereInterceptorAdapter {

    @Override
    public Action inspect(AtmosphereResource resource) {
        AtmosphereRequest request = resource.getRequest();

        // Intercept client-to-server messages
        String clientMessage = request.getReader().lines()
                .reduce("", (accumulator, actual) -> accumulator + actual);

        // Simulate dropping a message from the client
        if (clientMessage.contains("something")) {
            return Action.CANCELLED; // Stops further processing
        }
        return Action.CONTINUE;
    }

    @Override
    public void postInspect(AtmosphereResource resource) {
        AtmosphereResponse response = resource.getResponse();

        // Intercept server-to-client messages
        String serverMessage = response.toString();

        // Simulate altering a response message
        if (serverMessage.contains("something")) {
            String modifiedMessage = serverMessage.replace("something", "MODIFIED");
            response.write(modifiedMessage);
        }
    }

    @Override
    public void destroy() {
        // Cleanup if needed
    }
}

Then declare the interceptor in atmosphere.xml:

<atmosphere-handlers>
    <mapping path="/*" support-session="true" broadcaster-cache="org.atmosphere.cache.UUIDBroadcasterCache">
        <interceptor>com.example.CustomWebSocketInterceptor</interceptor>
    </mapping>
</atmosphere-handlers>

What this can give us:

  • Drop client-to-server or server-to-client messages to simulate network issues.
  • Modify the content of messages dynamically, e.g. to change the clientId/syncId.
  • Delay message processing to simulate slow networks.
  1. For communication over http, we can use Http Filter API for intercepting requests from client and JS customisations in browser to intercept the response:
@WebFilter("/*")
public class NetworkSimulatingFilter implements Filter {
    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Simulate network delay
        Thread.sleep(500);

        // Randomly drop requests
        if (Math.random() < 0.1) { // 10% chance to drop
            ((HttpServletResponse) response).sendError(HttpServletResponse.SC_GATEWAY_TIMEOUT, "Simulated timeout");
            return;
        }

        // Proceed normally
        chain.doFilter(request, response);
    }
}

To intercept the Vaadin requests in browser, the following hack for fetch may work:

(function() {
  const originalFetch = window.fetch;

  window.fetch = function(...args) {
    console.log('Intercepted request:', args);

    // Simulate a network failure
    if (args[0].includes('some-endpoint')) {
      return Promise.reject(new Error('Simulated network error'));
    }

    return originalFetch.apply(this, args).then(response => {
      console.log('Intercepted response:', response);

      // Optionally, clone and modify the response
      if (response.url.includes('another-endpoint')) {
        return response.text().then(body => {
          const modifiedBody = body.replace('original-content', 'modified-content');
          return new Response(modifiedBody, response);
        });
      }

      return response;
    });
  };
})();

@github-project-automation github-project-automation bot moved this from ⚒️ In progress to Done in Vaadin Flow ongoing work (Vaadin 10+) Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

2 participants