-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFD 0179: Web API backward compatibility guidelines #44668
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My opinion is that incrementing a version number would be a perfectly good solution for some one-off breaking changes. There is definitely a big appeal in converting to Connect RPC, I can't deny, and my inner self striving for nice, complete solutions, is all in favor. But we need to do it for valid reasons, and openly acknowledge it.
As a solution for the problem of manual bookkeeping of the client-side and server-side data structure definitions, this is definitely something that I would support. Hell, this really triggered me sometimes. However, I don't think I've seen enough evidence that this actually helps to solve the backwards compatibility problem. In my eyes, we would still need to use similar techniques to tell old from new; only now, we would get some guardrails in form of both "new" and "old" being defined as a part of the same type-safe structure. Which we could achieve anyway with a bit of manual work, without involving Connect RPC.
Perhaps I don't see it? Maybe it would be easier to understand if you elaborated exactly how leveraging Connect RPC would help in the case that you mentioned?
rfd/0179-api-versioning.md
Outdated
### Creating a new rpc service with Connect | ||
(buf's Connectrpc, not Teleport Connect) | ||
|
||
Connect is a library used in making gRPC compatible HTTP apis. It supports [multiple protocols](https://connectrpc.com/docs/introduction#seamless-multi-protocol-support) and can generate go service code and typescript client code from the proto schema. We do something similar in Teleport Connect where we generate a client for `tsh` to make requests to the server and use the generated typescript code. This would be doing the same for the web client now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Support for multiple protocols sounds great, but can we disable unused ones to narrow down attack vectors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you can select your transport.
One more thing: if we decide to use Connect RPC, can we make sure that we are prepared for debugging the network traffic? (I've seen solutions that make it really difficult to see what's happening between the browser and the server, so it's important for this to be either human-readable or have a good tooling support.) |
yeah, this is human readable still https://connectrpc.com/docs/go/getting-started#make-requests |
thanks for the feedback. at the end of the day, anyone can break anything if they try hard enough :), so hopefully the guidelines will keep us all cognizant of it when deving/reviewing. |
The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with |
Ready for review |
The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with |
@ravicious @gzdunek i know this RFD is for the web UI, but I'd still appreciate any input. Especially on the optional use of tanstack query |
rfd/0179-api-versioning.md
Outdated
|
||
|
||
### Adding/removing a field to a response of an existing endpoint | ||
Adding a new field to a response is OK as long as that field has to effect on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a new field to a response is OK as long as that field has to effect on | |
Adding a new field to a response is OK as long as that field has no effect on |
rfd/0179-api-versioning.md
Outdated
}, nil | ||
} | ||
``` | ||
If using ConnectRPC (a proposed solution later in this RFD), then we must _not_ remove fields from requests/responses, even if unused. Mark as deprecated and move on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that marking it reserved
won't work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reserved will work. I should change the language to "follow the deprecation process of proto messages and move on". i will update and link. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have a couple of more thoughts on this that I'll post later today.
rfd/0179-api-versioning.md
Outdated
|
||
## Web Client/API Backward Compatibility Guidelines | ||
|
||
In general, if the Request or Response shape does not change, logic in an API handler can change without worry of backward compatibility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly do you mean here? I think I can see it, but taken at face value this sentence doesn't seem to be true. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shape of request or response doesn't change so the API contract is still fulfilled. This means it wouldn't lead to a "breaking" change
I guess a contrived example would be changing to return value in a response like {"hello": "world"} to {"hello": ", world!"} wouldn't be breaking.
But after typing it out, perhaps some logic is done by that field in the web client so. I'm not sure now actually. I'll think about it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shape of request or response doesn't change so the API contract is still fulfilled. This means it wouldn't lead to a "breaking" change
I agree that it's not true.
For example, if an API returns status: "SUCCESS"
, and then we change it to status: "OK"
it is a breaking change.
I think we should say instead that meaning of the fields should not change? Like in https://www.mnot.net/blog/2012/12/04/api-evolution.html (paragraph Make Changes Backwards-Compatible).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I'm convinced. I will remove this because if i said "if all the fields are the same and all the values are the same, its ok!" but that seems quite obvious. anything else, can fall onto the guidelines themsleves.
rfd/0179-api-versioning.md
Outdated
type GetFoosResponse struct { | ||
Foos []string | ||
NextKey string | ||
// TODO (avatus) DELETE IN 18 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be the most popular format of a TODO note used in the project. Leaving a quick note is should be helpful in most cases as well.
// TODO (avatus) DELETE IN 18 | |
// TODO(avatus): DELETE IN 18. Use `DuplicateCount` instead. |
rfd/0179-api-versioning.md
Outdated
foos := []string{"foo1", "foo2", "foo3", "foo3"} | ||
return GetFoosResponse{ | ||
Foos: foos, | ||
// somedev: "we must keep ContainsDuplicates populated to support older clients!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// somedev: "we must keep ContainsDuplicates populated to support older clients!" | |
// ContainsDuplicates must be populated to support older clients. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i didn't see this change 😅
rfd/0179-api-versioning.md
Outdated
If a feature requires a new endpoint, it must properly handle the case of a 404. | ||
This means making the "expected" 404 error clear to the user that an endpoint is | ||
not found due to a version mismatch (and preferably which minimum version is | ||
needed) rather than an ambiguous "Not Found" error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you give an example here? I think there are legitimate cases where an HTTP endpoint can return 404 to mean "I couldn't find the resource you were looking for" rather than "I don't know what this endpoint is". In our HTTP API, those two are conflated. In gRPC, there's a separate UNIMPLEMENTED status code.
Though tbh, I can't remember if I've ever seen a web API that was returning 400 or 501 instead of 404 when it encountered an unknown route. Especially the latter I think most of the time is reserved specifically for unsupported request methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I think in general the sentiment stands but perhaps I can word it differently. I would say in most cases a List returning a 404 would probably indicate a missing endpoint rather than a single resource (I don't think there is much we can do here besides still returning some shape?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I guess this would be up to the discretion of the dev at time of creation. Definitely one to just keep an eye out for (not a strict requirement but a guideline)
A good example of this was Pinned Resources. We would check if the endpoint existed and if not, display a message informing users to upgrade to 14.1 or something like that. I could have swore I had that example in here but I will add that example for clarity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaf servers also play a big role in this one (and the previous comment about rolling upgrades too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the distinction between "list" endpoints and single resource endpoints returning 404. I believe its going to be "Case-by-case" but the guidelines should be "best effort to handle gracefully"
rfd/0179-api-versioning.md
Outdated
|
||
If a response shape must change entirely, prefer creating a new endpoint/RPC. | ||
|
||
### Updating Request body fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
### Updating Request body fields | |
### Updating request body fields |
rfd/0179-api-versioning.md
Outdated
### Updating request fields in the web client | ||
Updating what fields are sent in a request body is ok as long as the API follows the updated request fields guidelines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this section different from "Updating Request body fields"? Is it about changing what fields a client sends vs "Updating Request body fields" being about which fields the server expects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah basically just the direction of the communication. I have them separated since I went with "server changes" and "web client changes". I think that you were able to assume the correct answer that it makes enough sense but I will reword it to make it more clear regardless (or note or something). Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not as smart as @ravicious and without his comment about it, i wouldn't have understood 🥲
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the feedback. I will update! 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed this second paragraph and slightly changed the wording on the other to include both sides of the change.
rfd/0179-api-versioning.md
Outdated
fetching framework to simplify and unify the way we make api requests in the web | ||
client. This will remove boilerplate that we need to use our APIs in the web | ||
client, provide type safety out of the box for both server and client code, and | ||
provide backward compatibility guardrails when creating/updating our APIs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree with @bl-nero's comment above about bumping a version number being good enough for one-off breaking changes and the biggest advantage of Connect RPC being not forwards/backwards compatibility, but autogenerated type-checked structures in both JS and Go.
As far as compatibility is concerned, I feel like Connect RPC itself doesn't provide anything which we couldn't replicate with something like, idk, zod on the JS side. The difference is that with zod we would need to replicate the same structures in JS and then in Go.
I don't think we'll find anyone who, after reading this RFD, would be able to conclude with "This is a great idea that will absolutely work and be worth the time, let's do it". Perhaps it would be worthwhile to do a spike with an example implementation to see:
- How easy it'll be to integrate this along the current web API.
- How Connect RPC would address common pain points or past problems compared to the existing solution.
Another idea to toy with which could turn out to be an argument in favor of RPCs: how many web API endpoints do actual work and how many of them merely send off data to the auth service? Given the unusual compatibility guarantees put upon the Web UI (a client that needs to work with older servers), how feasible would it be to reuse protos from the /api
directory to streamline endpoints which merely pass data to the auth service?
By the last question, I mean that when I want to add a new field to a usage event in Connect, which passes from the Electron app through tshd to prehog, it's a matter of updating a single proto file and the change trickles down to every place that uses it. Is this something achievable in the Web UI as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference is that with zod we would need to replicate the same structures in JS and then in Go.
This is the crux of the problem. At the end of the day, if we rely on creating code on the backend and the frontend, we are bound to just end up in the same place we are. Removing the need to create code in two places and removing boilerplate seems worthwhile. A spike is probably a great idea (more than I spike i did for implementing a test feature, more like spike an actual feature instead).
how feasible would it be to reuse protos from the /api directory to streamline endpoints which merely pass data to the auth service?
Probably pretty feasible. Although, from my experience, we send some fields that arent passed to auth (and are used for creating the auth message generally) but its the return trip where things are quite different. Almost every resource-based web endpoint receives data from auth an has to convert it to a UI structure so reusing protos from proxy<->auth doesn't really seem to work.
I think the benefit that Connect gives us is the type generation is a bit more modern in that it reads closer to how types would be generated by a human, and also (if we went the tanstack query route), has built in generators for client queries as well. A lot it the type system could be solved without it but from my initial testing, this was the best DX imo.
I agree with you and @bl-nero that solving the backward compatibility problem is as simple as appending the version prefix I talked about in the RFD. So I +1 the idea of "add the version prefix to our apis and follow the guidelines" is good enough, and will be happy if thats the only result of this RFD.
If we want to discuss the type system in a greater context, we can move it to its own RFD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to discuss the type system in a greater context, we can move it to its own RFD.
+1
rfd/0179-api-versioning.md
Outdated
### Removing an endpoint | ||
An endpoint can be removed after our backward compatibility promise has been | ||
fulfilled and NOT before. This means marking for deletion with a TODO and | ||
removing it in the relevant version. For example, if v17 no longer uses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should be specific on the minor b/c if changes are introduced 17.[minor > 0], then we have to +2 version delete (maybe that's worthwhile mentioning it here? since this has been something i always get confused on)
removing it in the relevant version. For example, if v17 no longer uses | |
removing it in the relevant version. For example, if v17.0 no longer uses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think minors count? As far as I can tell, we only worry about major versions when it comes to declaring incompatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the scenario i was trying to describe: if something got introduced v17.1
, then you have to mark for deletion in v19
(not v18
), b/c if we delete in v18, then we break compatibility with v17.0 -> v18.0
am i misunderstanding? i do get confused by this all the time so maybe i am 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see what you are saying. Then yes YOU are correct and I will update the example!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh btw, i think we need to include the patch
too... probably unlikely to happen but it could happen, so:
- new feature introduce on major version, mark deletion for one major version ahead
- new feature introduce on minor/patch version, mark deletion for two major version ahead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, yeah, it didn't occur to me, but yeah, it's something we do have to account for.
Perhaps we could explain the general rule and then show examples, rather than adding one rule and then being like "oh in case a feature is added in a minor or patch version, there's a slightly different rule". I was thinking of something like this:
An endpoint can be removed in a major version n+2, where n is the last major version where the endpoint was used.
Example 1: v17 no longer uses
GET /webapi/foos
which was last used in v16. The endpoint can be removed in v18.
Example 2: v4.2.0 no longer usesGET /webapi/bars
which was last used in v4.1.3. The endpoint can be removed in v6, so that v5 still supports clients using v4.1.3 and v4.2.0.
rfd/0179-api-versioning.md
Outdated
fetching framework to simplify and unify the way we make api requests in the web | ||
client. This will remove boilerplate that we need to use our APIs in the web | ||
client, provide type safety out of the box for both server and client code, and | ||
provide backward compatibility guardrails when creating/updating our APIs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to discuss the type system in a greater context, we can move it to its own RFD.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed only the versioning part and that LGTM.
The api versioning guideline and the idea of backward compatibility itself is good, though it feels unfortunate that we have to deal with this situation of proxy running with different version in a same cluster in the first place. Alternatively, can we make changes in the proxy so it would always forward incoming request to the proxy which has the latest API version? That way we wouldn't need to worry about internal req/resp fields as long as the request hits correct proxy.
The idea of using Connect RPC sounds good to me and I see single type system as a win. Maybe worth it to move it to a different RFD as suggested in existing review comments so we have more discussion on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments.
In general, I agree with others that ConnectRPC is a really good improvement but doesn't change much when it comes to API compatibility (so we can discuss it separately).
rfd/0179-api-versioning.md
Outdated
|
||
## Web Client/API Backward Compatibility Guidelines | ||
|
||
In general, if the Request or Response shape does not change, logic in an API handler can change without worry of backward compatibility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shape of request or response doesn't change so the API contract is still fulfilled. This means it wouldn't lead to a "breaking" change
I agree that it's not true.
For example, if an API returns status: "SUCCESS"
, and then we change it to status: "OK"
it is a breaking change.
I think we should say instead that meaning of the fields should not change? Like in https://www.mnot.net/blog/2012/12/04/api-evolution.html (paragraph Make Changes Backwards-Compatible).
Thanks for all the feedback team. I've removed the talk about adding Connect (im still excited on this so I will open another RFD soon). Also, I believe I've handled all the feedback. Please take another look 🙏 🙌 |
Any lingering feedback on this? Friendly ping to reviewers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, I completely missed that you updated the RFD last week :(
rfd/0179-api-versioning.md
Outdated
### New required fields from an API response in the web client | ||
If the updated feature _cannot_ function without receiving new fields from the | ||
API (for example, receiving a response from a proxy version N-1), refer to the | ||
API guidelines about creating a new endpoint. If the feature itself is degraded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: the guideline you are referring to is about creating an endpoint that didn't exist before, not about the version increase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, i'll leave approving to @kiosion since things are new for him and can request any clarification
rfd/0179-api-versioning.md
Outdated
// keep this | ||
h.GET("/webapi/tokens", h.WithAuth(h.getTokens)) | ||
|
||
// and add this when a new version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think
// and add this when a new version | |
// and add this with a new version |
rfd/0179-api-versioning.md
Outdated
Example 1: v17 no longer uses GET /webapi/foos which was last used in v16. The | ||
endpoint can be removed in v18. | ||
Example 2: v4.2.0 no longer uses GET /webapi/bars which was last used in v4.1.3. The endpoint can be removed in v6, | ||
so that v5 still supports clients using v4.1.3 and v4.2.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it wasn't separated (run on)
Example 1: v17 no longer uses GET /webapi/foos which was last used in v16. The | |
endpoint can be removed in v18. | |
Example 2: v4.2.0 no longer uses GET /webapi/bars which was last used in v4.1.3. The endpoint can be removed in v6, | |
so that v5 still supports clients using v4.1.3 and v4.2.0. | |
Example 1: v17 no longer uses GET /webapi/foos which was last used in v16. The | |
endpoint can be removed in v18. | |
Example 2: v4.2.0 no longer uses GET /webapi/bars which was last used in v4.1.3. The endpoint can be removed in v6, so that v5 still supports clients using v4.1.3 and v4.2.0. |
rfd/0179-api-versioning.md
Outdated
foos := []string{"foo1", "foo2", "foo3", "foo3"} | ||
return GetFoosResponse{ | ||
Foos: foos, | ||
// somedev: "we must keep ContainsDuplicates populated to support older clients!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i didn't see this change 😅
rfd/0179-api-versioning.md
Outdated
API (for example, receiving a response from a proxy version N-1), refer to the | ||
API guidelines about creating a new versioned endpoint. If the feature itself is degraded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API (for example, receiving a response from a proxy version N-1), refer to the | |
API guidelines about creating a new versioned endpoint. If the feature itself is degraded | |
API (for example, receiving a response from a proxy version N-1), refer to the | |
API guidelines about [creating a new versioned endpoint](#creating-a-new-endpoint). If the feature itself is degraded |
rfd/0179-api-versioning.md
Outdated
authors: Michael Myers ([email protected]) | ||
state: draft | ||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think you need the extra hyphens on top to be formatted correctly
authors: Michael Myers ([email protected]) | |
state: draft | |
--- | |
--- | |
authors: Michael Myers ([email protected]) | |
state: draft | |
--- |
### Removing an endpoint | ||
An endpoint can be removed in a major version n+2, where n is the last major | ||
version where the endpoint was used. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think its good add a delete example:
Mark endpoints that needs to be removed with: | |
```go | |
// TODO(<your-github-handle>): DELETE IN 18.0 | |
h.GET("/webapi/tokens", h.WithAuth(h.getTokens)) | |
``` | |
rfd/0179-api-versioning.md
Outdated
type GetFoosResponse struct { | ||
Foos []string | ||
NextKey string | ||
// TODO(avatus): DELETE IN 18. Use `DuplicateCount` instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be helpful to add a // Deprecated: <Reason>
line as well as the TODO prior to its removal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rest looks like minor comments, approving
9559def
to
a8276b8
Compare
rendered
This RFD includes the API guidelines for keeping our web clients/api backward compatible