Machine ID: SPIFFE support in tbot #37772
Conversation
```go
		srvMetrics.StreamServerInterceptor(),
	),
	grpc.StatsHandler(otelgrpc.NewServerHandler()),
	grpc.MaxConcurrentStreams(defaults.GRPCMaxConcurrentStreams),
```
Flagging to double check whether we want to add any non-default KeepaliveParams here:
teleport/lib/service/service.go, lines 5965 to 5981 in fe345d4:
```go
grpc.KeepaliveParams(keepalive.ServerParameters{
	// Using an aggressive idle timeout here since this gRPC server
	// currently only hosts the join service, which has no need for
	// long-lived idle connections.
	//
	// The reason for introducing this is that teleport clients
	// before #17685 is fixed will hold connections open
	// indefinitely if they encounter an error during the joining
	// process, and this seems like the best way for the server to
	// forcibly close those connections.
	//
	// If another gRPC service is added here in the future, it
	// should be alright to increase or remove this idle timeout as
	// necessary once the client fix has been released and widely
	// available for some time.
	MaxConnectionIdle: 10 * time.Second,
}),
```
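For context, a minimal sketch of what non-default parameters might look like for this server, assuming server-side pings rather than an idle timeout (the helper name and durations here are hypothetical, not a recommendation from this PR):

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

// newWorkloadAPIServer is a hypothetical helper; values are illustrative.
func newWorkloadAPIServer() *grpc.Server {
	return grpc.NewServer(
		grpc.KeepaliveParams(keepalive.ServerParameters{
			// Unlike the join service quoted above, the Workload API
			// hosts deliberately long-lived streams, so an aggressive
			// MaxConnectionIdle would be wrong here. Periodic pings
			// instead reap dead peers that hold streams open.
			Time: 30 * time.Second,
			// Drop the connection if a ping goes unacknowledged.
			Timeout: 10 * time.Second,
		}),
	)
}
```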
```go
// Shutdown the server when the context is canceled
<-egCtx.Done()
s.log.Debug("Shutting down Workload API endpoint")
srv.Stop()
```
Consider whether we want the hard Stop vs GracefulStop here.
Good catch. I had GracefulStop originally, but the long-lived nature of the Workload API streaming RPCs means that GracefulStop will just hang, as these connections don't close.
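For reference, a common middle ground (a sketch only, not what this PR does) is to attempt a graceful drain and fall back to the hard Stop after a deadline; the helper name and timeout value are assumptions:

```go
package main

import (
	"time"

	"google.golang.org/grpc"
)

// stopWithDeadline drains gracefully, but falls back to a hard Stop when
// long-lived streams (like the Workload API's) prevent draining.
func stopWithDeadline(srv *grpc.Server, timeout time.Duration) {
	done := make(chan struct{})
	go func() {
		srv.GracefulStop() // blocks while any stream stays open
		close(done)
	}()
	select {
	case <-done:
	case <-time.After(timeout):
		srv.Stop() // forcibly closes remaining connections and streams
	}
}
```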
```go
case <-time.After(s.botCfg.RenewalInterval):
	s.log.Debug("Renewal interval reached, renewing SVIDs")
	// Time to renew the certificate
	continue
```
Elsewhere we wait on the channel associated with the rootReloadBroadcaster to decide that we're undergoing a cert renewal; should we do the same here (or is this something else / am I just confused)?
teleport/lib/tbot/service_spiffe_workload_api.go, lines 273 to 279 in fe345d4:
```go
reloadCh, unsubscribe := s.rootReloadBroadcaster.subscribe()
defer unsubscribe()
for {
	select {
	case <-ctx.Done():
		return nil
	case <-reloadCh:
```
reloadCh is fulfilling the same role here. Essentially, we have another goroutine which watches rootReloadBroadcaster, fetches the new CA bundle, and then sends a message on reloadCh. This just allows us to do a little bit of work before letting the RPCs know, and prevents every RPC from having to complete that work individually.
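For readers following along, this is roughly how such a broadcaster can be built (an illustrative sketch; the actual type in tbot may differ): each subscriber gets a buffered channel, and broadcasting does a non-blocking send so a slow subscriber never stalls the CA-watching goroutine.

```go
package main

import "sync"

// broadcaster fans a notification out to every current subscriber.
type broadcaster struct {
	mu   sync.Mutex
	subs map[chan struct{}]struct{}
}

func newBroadcaster() *broadcaster {
	return &broadcaster{subs: map[chan struct{}]struct{}{}}
}

// subscribe registers a new listener and returns its channel together
// with an unsubscribe function to be deferred by the caller.
func (b *broadcaster) subscribe() (chan struct{}, func()) {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan struct{}, 1)
	b.subs[ch] = struct{}{}
	return ch, func() {
		b.mu.Lock()
		defer b.mu.Unlock()
		delete(b.subs, ch)
	}
}

// broadcast notifies all subscribers without blocking on any of them.
func (b *broadcaster) broadcast() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for ch := range b.subs {
		select {
		case ch <- struct{}{}: // notify
		default: // subscriber already has a pending notification
		}
	}
}
```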
This is looking really good! I played around a bit with various role RBAC combinations and tried the socket out with spiffe-helper, all with no issues. (Oddly, spiffe-helper can only dump the first cert, so it's not a great test if multiple are requested, ah well.)
No real comments, just neat to see this test use the actual SPIFFE client on our socket 🙂
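For anyone wanting to reproduce that kind of test by hand, here is a minimal sketch using the go-spiffe v2 client against the socket (the socket path is an assumption; point it at wherever the spiffe-workload-api service is configured to listen):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Connect to tbot's Workload API endpoint over the unix socket.
	client, err := workloadapi.New(ctx,
		workloadapi.WithAddr("unix:///opt/machine-id/workload.sock"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Fetch the default X509-SVID issued to this workload.
	svid, err := client.FetchX509SVID(ctx)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("fetched SVID for", svid.ID)
}
```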
Easiest way :D I've found my testing life has gotten 90% easier since focusing on writing fewer unit tests and more high-level, real-world tests. We're very lucky in Machine ID that we're not constrained by trying to reproduce user input, and doing things programmatically is in line with the product.
bot
@strideynet See the table below for backport results.
* Experiment with issuing SVIDs from `tbot`
* Fix missing license headers
* Remove interceptors for now
* Add basic required grpc server interceptors
* Break out CA rotation handler
* Tidy up service structure and omit more cert attributes
* Various tidying and adding godoc comments
* Move dependency to main require block
* Fix tests and add test for spiffe-workload-api in config
* Add config tests for SPIFFESVIDOutput
* Add config tests for SPIFFEWorkloadAPIService
* Add otel spans
* Add e2e test for the spiffe workload api functionality
* Fix trust bundle fetching to use correct client
* Reuse outputs service to produce identity for spiffe workload service
* Add godocs for SPIFFESVIDOUTPUT
* Tidy up Render
* Add consts for pem types
* Improve logging

Co-authored-by: Isaiah Becker-Mayer <[email protected]>

---------

Co-authored-by: Isaiah Becker-Mayer <[email protected]>
* Machine ID: SPIFFE support in `tbot` (#37772)
  * Experiment with issuing SVIDs from `tbot`
  * Fix missing license headers
  * Remove interceptors for now
  * Add basic required grpc server interceptors
  * Break out CA rotation handler
  * Tidy up service structure and omit more cert attributes
  * Various tidying and adding godoc comments
  * Move dependency to main require block
  * Fix tests and add test for spiffe-workload-api in config
  * Add config tests for SPIFFESVIDOutput
  * Add config tests for SPIFFEWorkloadAPIService
  * Add otel spans
  * Add e2e test for the spiffe workload api functionality
  * Fix trust bundle fetching to use correct client
  * Reuse outputs service to produce identity for spiffe workload service
  * Add godocs for SPIFFESVIDOUTPUT
  * Tidy up Render
  * Add consts for pem types
  * Improve logging

  Co-authored-by: Isaiah Becker-Mayer <[email protected]>

  ---------

  Co-authored-by: Isaiah Becker-Mayer <[email protected]>

* Fix tests broken in backport
* Further fix tests broken in backport

---------

Co-authored-by: Isaiah Becker-Mayer <[email protected]>
Dependent on #38181
Part of #36205
As per RFD https://github.com/gravitational/teleport.e/blob/master/rfd/0018e-machine-id-workload-identity-mvp.md
Introduces a service type and output type for generating SPIFFE SVIDs. There are a few improvements I'd like to make in follow-up PRs before I remove the feature flag for this.
Example config:
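A sketch of what the spiffe-workload-api service config might look like — the field names here are illustrative and may differ from the final schema:

```yaml
# Illustrative only; check the tbot docs for the shipped schema.
version: v2
services:
  - type: spiffe-workload-api
    listen: unix:///opt/machine-id/workload.sock
    svids:
      - path: /svc/example
        sans:
          dns:
            - example.internal
```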
changelog: SPIFFE SVID generation introduced to `tbot` - experimental