Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow passing gRPC services over context bridge #38573

Closed
wants to merge 12 commits into from

Conversation

gzdunek
Copy link
Contributor

@gzdunek gzdunek commented Feb 23, 2024

The problem

The UI layer of Connect (a renderer process) communicates with the tsh daemon via gRPC client. This client requires uses Node.js APIs, so we can't use it directly in the renderer. Because of that, the client is created in preload.ts and then passed to the renderer through the electron context bridge.
The only problem is that we can't send JS classes via the bridge (the class prototype is dropped making it useless). It has been solved by sending plain JS objects as requests from the renderer, converting them to classes on the "preload" side and calling the RPCs.
However, there are some problems with it:

  • it requires transforming every request, so now have 1000+ lines of boilerplate in createClient.ts,
  • exposing non-unary RPCs is inconvenient - for example, the logic of passwordless login has to be kept in createClients.ts (but it shouldn't - the aim of this file is map one request form to another).

Alternative approach

Instead of converting every RPC, we can just convert the four handlers (for unary, client-streaming, server-streaming, duplex RPCs) and the service class itself. This can be done quite easily using interceptors (20b80b4).
Thanks to that, we can expose the gRPC service to the renderer and callsites can use it as a "regular" gRPC client, just like on the preload side (and we can remove the boilerplate).
I also converted the client to protobuf-ts instead of grpc-js, so callsites can use proper async APIs.
This is the alternative version of solving "Refactoring tshd client" from https://github.com/gravitational/teleport.e/issues/853.

Caveats

The only thing that callsites have to remember is not to pass any classes to requests, like AbortSignal. Instead, we should prepare a converter that they would map a regular signal to its object version. In the PR I changed the signature of TshAbortSignal to match AbortSignal.
I also added a check that was supposed to throw if an AbortSignal class is passed, but now I realized that instanceof AbortSignal is always false after crossing the bridge 😏, so this has to be changed.

Note: this PR is a raw POC to check if the idea works, it doesn't pass TS check (good that we no longer use webpack that wouldn't allow to run the app 🙂).

@gzdunek gzdunek requested a review from ravicious February 23, 2024 15:20
Copy link
Member

@ravicious ravicious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly looked at web/packages/teleterm/src/services/tshd/grpcContextBridgeClient.ts. Are there other files that I should focus on?

];
}

export function objectifyClient<T>(client: ServiceInfo): T {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the use of the generic only for the return value. Ideally it should be something like

objectifyClient<Client>(client: ClassClient<Client>): Client
// or, depending on what protobuf-ts allows here
objectifyClient<Client>(client: Client): Client

Otherwise the signature says that we convert ServiceInfo (which doesn't implement any specific client interface) into T, which is rather unsound.

I think we'll be able to improve the signature of this function once we clear up TS errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I improved the type slightly. But I'm not sure if it is even possible to have a ClassClient type, since we expect a class instance here, not a class itself.

gzdunek added 7 commits March 4, 2024 11:13
# Conflicts:
#	gen/proto/ts/teleport/accesslist/v1/accesslist_service_pb.grpc-client.ts
#	gen/proto/ts/teleport/accesslist/v1/accesslist_service_pb.grpc-server.ts
#	gen/proto/ts/teleport/lib/teleterm/v1/service_pb.grpc-client.ts
#	gen/proto/ts/teleport/lib/teleterm/v1/service_pb.grpc-server.ts
#	web/packages/teleterm/src/services/tshdEvents/index.ts
@gzdunek
Copy link
Contributor Author

gzdunek commented Mar 5, 2024

I reverted all changes to tshd events protos. Converting them should be done separately, although it may be more difficult because I don't see an easy way to intercept all calls (maybe because adaptService is an experimental API).

But we may also leave it as is - the API surface is quite small and we already have a layer that converts the requests before passing them through the context bridge.

Comment on lines +97 to +100
status: objectifyPromiseRejection(output.status),
trailers: objectifyPromiseRejection(output.trailers),
headers: objectifyPromiseRejection(output.headers),
response: objectifyPromiseRejection(output.response),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that these four promises cause uncaughtException errors since we don't handle them.
But I don't know what I can do about it - I believe it would work the same without my interceptors 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where exactly do those errors show up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see them, for example, when you switch to the advanced search in the search bar, type 'abc' and press Enter. In the dev tools you will see four unhandled rejections (for status, trailers, headers and then). The only promise we are currently handling is response.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose in the real client those fields are all getters and maybe operate on the same underlying promise?

But also, if we exposed a leaner client with a smaller API (#38573 (comment)), perhaps we wouldn't need to worry about this case at all.

@gzdunek gzdunek requested a review from ravicious March 5, 2024 09:52
Copy link
Member

@ravicious ravicious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran out of time today. I'll continue the review tomorrow.

Comment on lines +139 to +145
// getters allow reading the fresh value of class fields
get reason() {
return a.reason;
},
get aborted() {
return a.aborted;
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this not get removed when going over the context bridge? electron/electron#25516

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, it is converted to a regular field 🤦
I think the only solution to this is to subscribe to updates in objectifyAbortSignal, like that:

  a.addEventListener(
    'abort',
    () => {
      cloned.reason = a.reason;
      cloned.aborted = a.aborted;
    },
    { capture: true, once: true }

Comment on lines +57 to +63
abortController = new AbortController();

let rootClusters: tshd.Cluster[];
try {
rootClusters = await tshdClient.listRootClusters(abortController.signal);
rootClusters = await tshdClient.listRootClusters(
objectifyAbortSignal(abortController.signal)
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we keep client.createAbortController to make the callsite a bit more tidy and without the need to import extra functions? objectifyAbortSignal can still exist for cases where e.g. some code from a shared package passes signal along to teleterm code.

[abortController, abortSignal] = tshdClient.createAbortController()

rootClusters = await tshdClient.listRootClusters(abortSignal)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we have 7 callsites that need to import objectifyAbortSignal (and 5 of them use their own abort controller). Also, eventually we want to remove the tshdClient layer, so we will still need to import it from somewhere.
Because of that, I'm leaning towards importing it from a separate file.

@ravicious ravicious self-requested a review March 5, 2024 15:10
// If onRejected callback is provided, then it will handle the rejection.
if (onRejected) {
return originalThen(onFulfilled, reason =>
onRejected(objectifyError(reason))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reason is not guaranteed to be an error, so we should probably inspecwt reason first. I suppose just reason instanceof Error should suffice. If it's not an Error, then we can just pass it to onRejected without conversion.

Comment on lines +64 to +65
} catch (e) {
throw objectifyError(e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as in objectifyThenRejection, e can be anything.

// we have to set correct "this" before passing it further.
then: objectifyThenRejection(output.then.bind(output)),
};
return { ...output, ...objectifiedUnaryCall } as UnaryCall;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand why we spread on object too, so I wrote a little helper test to verify whether the returned unary call is cloneable.

Cloneable test
diff --git a/web/packages/teleterm/src/services/tshd/grpcContextBridgeClient.test.ts b/web/packages/teleterm/src/services/tshd/grpcContextBridgeClient.test.ts
index 9dbe632f9d..f277d2060f 100644
--- a/web/packages/teleterm/src/services/tshd/grpcContextBridgeClient.test.ts
+++ b/web/packages/teleterm/src/services/tshd/grpcContextBridgeClient.test.ts
@@ -1,3 +1,6 @@
+/**
+ * @jest-environment node
+ */
 /**
  * Teleport
  * Copyright (C) 2023  Gravitational, Inc.
@@ -16,7 +19,17 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
-import { objectifyAbortSignal } from 'teleterm/services/tshd/grpcContextBridgeClient';
+import * as grpc from '@grpc/grpc-js';
+
+import * as apiService from 'gen-proto-ts/teleport/lib/teleterm/v1/service_pb.grpc-server';
+import { TerminalServiceClient } from 'gen-proto-ts/teleport/lib/teleterm/v1/service_pb.client';
+
+import { GrpcTransport } from '@protobuf-ts/grpc-transport';
+
+import {
+  getObjectifiedMethodsInterceptor,
+  objectifyAbortSignal,
+} from 'teleterm/services/tshd/grpcContextBridgeClient';
 
 test('objectified abort signal reads up-to-date signal.aborted', () => {
   const controller = new AbortController();
@@ -25,3 +38,75 @@ test('objectified abort signal reads up-to-date signal.aborted', () => {
   controller.abort();
   expect(objectifyAbortSignal(controller.signal).aborted).toBe(true);
 });
+
+test('unary response is cloneable', async () => {
+  const service: Partial<apiService.ITerminalService> = {
+    updateTshdEventsServerAddress: (call, callback) => {
+      callback(null, {});
+      // TODO: Add a separate test for when an unary response returns an error.
+      // callback(new Error("Whoops"), null);
+    },
+  };
+  const server = new grpc.Server();
+  server.addService(apiService.terminalServiceDefinition, service);
+  const port = await bindServer(server, 'localhost:0');
+  server.start();
+
+  // If port is less than zero than bind failed.
+  expect(port).toBeGreaterThan(0);
+
+  const client = new TerminalServiceClient(
+    new GrpcTransport({
+      host: `localhost:${port}`,
+      channelCredentials: grpc.credentials.createInsecure(),
+      interceptors: [getObjectifiedMethodsInterceptor()],
+    })
+  );
+
+  const unaryCall = client.updateTshdEventsServerAddress({ address: 'foo' });
+
+  for (const propertyName in unaryCall) {
+    let property = unaryCall[propertyName];
+
+    if (property['then']) {
+      property = await property;
+    }
+
+    // Functions cannot be passed to structuredClone, but they can be passed through the context
+    // bridge.
+    if (typeof property === 'function') {
+      console.log(`TODO: Skipping ${propertyName}, add a specific test for it`);
+      continue;
+    }
+
+    expect(() => {
+      try {
+        structuredClone(property);
+      } catch (err) {
+        throw new Error(
+          `Calling structuredClone on the '${propertyName}' property has thrown an error: ${err}`,
+          { cause: err }
+        );
+      }
+    }).not.toThrow();
+  }
+
+  expect(() => structuredClone(unaryCall)).not.toThrow();
+
+  await unaryCall;
+  server.forceShutdown();
+});
+
+const bindServer = (server: grpc.Server, address: string): Promise<number> =>
+  new Promise((resolve, reject) => {
+    server.bindAsync(
+      address,
+      grpc.ServerCredentials.createInsecure(),
+      (error, port) => {
+        if (error) {
+          reject(error);
+        }
+        resolve(port);
+      }
+    );
+  });

It calls structuredClone on each property of the unary call and then on the unary call itself. It ignores functions as those cannot be passed to structuredClone but can be passed through the context bridge.

In general I just wanted to determine how much of the returned unary call we have to change. The method property does not seem to be cloneable. But also it appears that spreading on output might not be necessary?

I was hoping this would give me a better understanding of what to do with those promises you mentioned in https://github.com/gravitational/teleport/pull/38573/files#r1512524830. But tbh it didn't get me much closer.

We probably need a separate test to verify that the errors get wrapped properly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that's missing from me in this PR, I'm trying to understand how things work, but I don't know how to verify if they work. Is running Connect My Computer the intended way to do that?

I think a simpler scenario would have been better for testing, even if it involves temporarily changing the code of tshd in some way, e.g. a request that takes longer to complete so that we can abort it or something in that vein.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, I'm not sure if we need to absolutely "objectify" everything and if interceptors are the way to address that.

Take a look at my comment about awaiting the response below. #38573 (comment) We could have the "objectified" client with a somewhat stricter API than the generic client provided by the library. One where maybe you cannot inspect method and requestHeaders, but one that makes the callsite use the correct way to await a response and that wraps the errors properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand why we spread on object too, so I wrote a little helper test to verify whether the returned unary call is cloneable.

So the spread of output is actually not needed, I added it "just in case". Since we assign the objectified call to PublicProperties<UnaryCall> then will get a TS error if a new property is added to public API of UnaryCall.
I will remove it, since it is just confusing.

In general I just wanted to determine how much of the returned unary call we have to change. The method property does not seem to be cloneable.

I'm not sure if checking if structuredClone works is a correct way to test it. AFAIK context bridge just drops the prototype, because I see methods object passing the bridge:
image

We probably need a separate test to verify that the errors get wrapped properly?

Yeah, we need more tests for sure.

I guess that's missing from me in this PR, I'm trying to understand how things work, but I don't know how to verify if they work. Is running Connect My Computer the intended way to do that?

So this is how this works now:

  • The client is created on the preload side, objectified and passed through the bridge.
  • createTshdClient still exists, but it works as a compatibility layer - it is now constructed on the renderer side, not preload. Eventually we want to get rid of it.
  • connectMyComputer service is an example of how things will look like without createTshdClient layer.
  • All RPCs go through the new "objectified" client.
  • If you want to test aborting calls, you can run the file transfer, I tested it this way.

Take a look at my comment about awaiting the response below. #38573 (comment) We could have the "objectified" client with a somewhat stricter API than the generic client provided by the library. One where maybe you cannot inspect method and requestHeaders, but one that makes the callsite use the correct way to await a response and that wraps the errors properly.

Valid point. My initial idea was to expose the client 1:1, if possible. But OTOH, we don't use status or trailers promises, so maybe it is not worth to pass them through the bridge.
Theoretically, we could objectify only then property. It would solve the problem with uncaught rejections, as you noticed.

Copy link
Contributor Author

@gzdunek gzdunek Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding exposing a leaner client, we could expose only the uncommented properties.
These are the ones we actually use to make calls.
The question is how to satisfy TS, probably with Partial<PublicProperties<X>>.

export function getClonedMethodsInterceptor(): RpcInterceptor {
  return {
    interceptUnary(next, method, input, options) {
      validateAbortSignalType(options.abort);

      const output = next(method, input, options);
      // UnaryCall is a class with a private field,
      // so we can't assign an object to it.
      // The workaround is satisfying only the public part of it
      // and then casting to UnaryCall.
      const clonedUnaryCall: PublicProperties<UnaryCall> = {
        // method: output.method,
        // request: output.request,
        // requestHeaders: output.requestHeaders,
        // status: clonePromiseRejection(output.status),
        // trailers: clonePromiseRejection(output.trailers),
        // headers: clonePromiseRejection(output.headers),
        // response: clonePromiseRejection(output.response),
        // then is a method on UnaryClass,
        // we have to set correct "this" before passing it further.
        then: cloneThenRejection(output.then.bind(output)),
      };
      return clonedUnaryCall as UnaryCall;
    },
    interceptClientStreaming(next, method, options) {
      validateAbortSignalType(options.abort);

      const output = next(method, options);
      const clonedClientStreamingCall: PublicProperties<ClientStreamingCall> = {
        // method: output.method,
        // requestHeaders: output.requestHeaders,
        // status: clonePromiseRejection(output.status),
        // trailers: clonePromiseRejection(output.trailers),
        // headers: clonePromiseRejection(output.headers),
        // response: clonePromiseRejection(output.response),
        requests: cloneRequests(output.requests),
        then: cloneThenRejection(output.then.bind(output)),
      };
      return clonedClientStreamingCall as ClientStreamingCall;
    },
    interceptServerStreaming(next, method, input, options) {
      validateAbortSignalType(options.abort);

      const output = next(method, input, options);
      const clonedServerStreamingCall: PublicProperties<ServerStreamingCall> = {
        // method: output.method,
        // request: output.request,
        // requestHeaders: output.requestHeaders,
        // status: clonePromiseRejection(output.status),
        // trailers: clonePromiseRejection(output.trailers),
        // headers: clonePromiseRejection(output.headers),
        responses: cloneResponses(output.responses),
        then: cloneThenRejection(output.then.bind(output)),
      };
      return clonedServerStreamingCall as ServerStreamingCall;
    },
    interceptDuplex(next, method, options) {
      validateAbortSignalType(options.abort);

      const output = next(method, options);
      const clonedDuplexStreamingCall: PublicProperties<DuplexStreamingCall> = {
        // method: output.method,
        // requestHeaders: output.requestHeaders,
        // status: clonePromiseRejection(output.status),
        // trailers: clonePromiseRejection(output.trailers),
        // headers: clonePromiseRejection(output.headers),
        requests: cloneRequests(output.requests),
        responses: cloneResponses(output.responses),
        then: cloneThenRejection(output.then.bind(output)),
      };
      return clonedDuplexStreamingCall as DuplexStreamingCall;
    },
  };
}

EDIT: added then to all types of calls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A leaner client would not have to implemented completely through interceptors alone. It could work similar to objectifyClient, where you call a function on the Node side that produces in a safe, "cloneable" client with a smaller API.

The function could take each method on the original client and define a new function on an object. Each function would then properly await the response and return it, along with any other thing that we need.

Copy link
Contributor Author

@gzdunek gzdunek Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to feel that my plan of getting rid of createTshdClient and "objectifying" everything through the interceptors started requiring too much "magic" :/
Not being able to correctly specify types for input and output values is definitely not a good thing.

The function could take each method on the original client and define a new function on an object.

So effectively I would have to bring back createTshdClient to preload :D
But probably that's better, the end user should get a leaner API, not the API that has only a few properties working.
Also, the new createTshdClient could be much more concise, since ideally it would look like this:

function createTshdClient(terminalService: ITerminalServiceClient) {
  return {
    getServers: cloneUnary(terminalService.getServers),
  }
}

also we wouldn't need interceptors anymore, as we could do the transformation directly in createTshdClient:

      function cloneUnary<I extends object, O extends object>(
        s: (input: I, options?: RpcOptions) => UnaryCall<I, O>
      ): (input: I, options?: RpcOptions) => Pick<UnaryCall<I, O>, 'then'> {
        return (input, options) => {
          const original = s(input, options);
          return { then: cloneThenRejection(original.then) };
        };
      }

another benefit is that we could provide better types for RpcOptions so it would contain ClonedAbortSignal not the regular AbortSignal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, exactly!

So effectively I would have to bring back createTshdClient to preload :D

That's not a bad thing at all. Ultimately I think the goal is to be able to retain error details and reduce the amount of boilerplate. Being able to just use abort signals without having to deal with withAbort is also a huge plus. Not having to do stuff in preload does not seem like a benefit by itself.

const { token } =
await this.terminalServiceClient.createConnectMyComputerNodeToken({
rootClusterUri: rootCluster.uri,
}).response;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responses shouldn't be awaited like this. Look for

https://github.com/timostamm/protobuf-ts/blob/main/MANUAL.md#generic-rpc-clients

So a simple way to get the response message of a unary call would be:

let call = service.myMethod(foo);
let response = await call.response;

But there is a caveat: gRPC and gRPC web use response trailers to indicate server status. This means that it is possible that the server responds with a message and then sends an error status. If you do not check the status, you may be missing an error.

Response trailers are a very useful feature, but for simple unary calls, awaiting two promises is cumbersome. For this reason, the UnaryCall itself is awaitable, and will reject if an error status is received. Instead of awaiting call.response, you can simply await the call:

let {response} = await service.myMethod(foo);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. I will change it.

@gzdunek
Copy link
Contributor Author

gzdunek commented Mar 8, 2024

I'm closing this, because we gave up on passing the gRPC client over the bridge. I will open a new PR where we will wrap each method separately.

@gzdunek gzdunek closed this Mar 8, 2024
@gzdunek gzdunek deleted the gzdunek/grpc-client-context-bridge branch July 22, 2024 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants