Machine ID: SPIFFE support in tbot #37772

Merged

Conversation

strideynet
Contributor

@strideynet strideynet commented Feb 5, 2024

Dependent on #38181

Part of #36205

As per RFD https://github.com/gravitational/teleport.e/blob/master/rfd/0018e-machine-id-workload-identity-mvp.md

Introduces a service type and an output type for generating SPIFFE SVIDs. There are a few improvements I'd like to make in follow-up PRs before I remove the feature flag for this.

Example config:

version: v2
auth_server: redacted:443
certificate_ttl: 24h
onboarding:
  join_method: token
  token: redacted
storage:
  type: directory
  path: /Users/noah/code/gravitational/teleport-scratch/workload-id/storage
# outputs will be filled in during the completion of an access guide.
outputs:
- type: spiffe-svid
  destination:
    type: directory
    path: /Users/noah/code/gravitational/teleport-scratch/workload-id/aws-roles-anywhere
  svid:
    path: /ra/my-role
- type: spiffe-svid
  destination:
    type: directory
    path: /Users/noah/code/gravitational/teleport-scratch/workload-id/output
  svid:
    path: /bart
    sans:
      ip:
       - 10.0.0.0
      dns:
       - google.com
       - example.com
services:
  - type: spiffe-workload-api
    listen: unix:///Users/noah/code/gravitational/workload-identity-experiment/workload.sock
    svids:
      - path: /bar
      - path: /foo
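
For readers unfamiliar with how the `svid.path` and `sans` fields above map onto a certificate: an X.509-SVID carries its SPIFFE ID as a single URI SAN, with any DNS and IP SANs alongside it. The following is a minimal sketch of that mapping using only the Go standard library. It is not the code from this PR: `svidTemplate` is a hypothetical helper, the trust domain `example.teleport.sh` is a made-up value, and the leading-slash check is an assumption of this sketch.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"net"
	"net/url"
	"strings"
)

// svidTemplate is a hypothetical helper (not from the PR). It builds an
// x509 certificate template whose only URI SAN is the SPIFFE ID
// spiffe://<trustDomain><path>, plus optional DNS and IP SANs.
func svidTemplate(trustDomain, path string, dns, ips []string) (*x509.Certificate, error) {
	// Assumption of this sketch: paths are written with a leading slash,
	// as in the example config.
	if !strings.HasPrefix(path, "/") {
		return nil, fmt.Errorf("svid path %q must start with '/'", path)
	}
	tmpl := &x509.Certificate{
		URIs:     []*url.URL{{Scheme: "spiffe", Host: trustDomain, Path: path}},
		DNSNames: dns,
	}
	for _, raw := range ips {
		ip := net.ParseIP(raw)
		if ip == nil {
			return nil, fmt.Errorf("invalid IP SAN %q", raw)
		}
		tmpl.IPAddresses = append(tmpl.IPAddresses, ip)
	}
	return tmpl, nil
}

func main() {
	// Values mirror the second output in the example config; the trust
	// domain is invented for illustration.
	tmpl, err := svidTemplate("example.teleport.sh", "/bart",
		[]string{"google.com", "example.com"}, []string{"10.0.0.0"})
	if err != nil {
		panic(err)
	}
	fmt.Println(tmpl.URIs[0].String())
	fmt.Println(tmpl.DNSNames, tmpl.IPAddresses)
}
```

A real issuer would also set validity, key usage, and a serial number before signing; those are omitted here to keep the SAN mapping visible.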

changelog: SPIFFE SVID generation introduced to tbot - experimental

@strideynet strideynet force-pushed the strideynet/spiffe-cert-issuance-tbot branch 2 times, most recently from 3635f46 to 8f9cf0d Compare February 6, 2024 10:48
@strideynet strideynet force-pushed the strideynet/spiffe-cert-issuance-tbot-spiffe-workload-api branch from 38afd0b to 00de8ff Compare February 6, 2024 10:48
@strideynet strideynet changed the base branch from strideynet/spiffe-cert-issuance-tbot to strideynet/spiffe-cert-issuance February 13, 2024 09:11
@strideynet strideynet changed the base branch from strideynet/spiffe-cert-issuance to strideynet/workload-identity-svc February 16, 2024 09:03
@strideynet strideynet changed the base branch from strideynet/workload-identity-svc to strideynet/spiffe-cert-issuance-tbot February 16, 2024 09:04
@strideynet strideynet force-pushed the strideynet/spiffe-cert-issuance-tbot-spiffe-workload-api branch from a1d6c5f to 6f6c26c Compare February 16, 2024 09:22
@strideynet strideynet changed the base branch from strideynet/spiffe-cert-issuance-tbot to strideynet/workload-identity-svc February 16, 2024 09:24
@strideynet strideynet force-pushed the strideynet/workload-identity-svc branch from cc86c7e to 81d1ff1 Compare February 16, 2024 17:06
@strideynet strideynet force-pushed the strideynet/spiffe-cert-issuance-tbot-spiffe-workload-api branch from b77216c to f31aac7 Compare February 16, 2024 17:09
@strideynet strideynet changed the title Machine ID: SPIFFE Workload API tbot Service Machine ID: SPIFFE support in tbot Feb 16, 2024
@strideynet strideynet force-pushed the strideynet/spiffe-cert-issuance-tbot-spiffe-workload-api branch from 6b56538 to 0d41a77 Compare February 21, 2024 19:41
Base automatically changed from strideynet/workload-identity-svc to master February 26, 2024 12:01
@strideynet strideynet force-pushed the strideynet/spiffe-cert-issuance-tbot-spiffe-workload-api branch from 0d41a77 to ba6bba1 Compare February 26, 2024 15:33
lib/tbot/config/output_spiffe_svid.go
lib/tbot/config/output_spiffe_svid.go Outdated
lib/tbot/config/output_spiffe_svid.go Outdated
lib/tbot/config/output_spiffe_svid.go
lib/tbot/config/service_spiffe_workload_api.go Outdated
lib/tbot/service_spiffe_workload_api.go Outdated
lib/tbot/service_spiffe_workload_api.go Outdated
srvMetrics.StreamServerInterceptor(),
),
grpc.StatsHandler(otelgrpc.NewServerHandler()),
grpc.MaxConcurrentStreams(defaults.GRPCMaxConcurrentStreams),
Contributor

Flagging to double check whether we want to add any non-default KeepaliveParams here:

grpc.KeepaliveParams(keepalive.ServerParameters{
    // Using an aggressive idle timeout here since this gRPC server
    // currently only hosts the join service, which has no need for
    // long-lived idle connections.
    //
    // The reason for introducing this is that teleport clients
    // before #17685 is fixed will hold connections open
    // indefinitely if they encounter an error during the joining
    // process, and this seems like the best way for the server to
    // forcibly close those connections.
    //
    // If another gRPC service is added here in the future, it
    // should be alright to increase or remove this idle timeout as
    // necessary once the client fix has been released and widely
    // available for some time.
    MaxConnectionIdle: 10 * time.Second,
}),

// Shutdown the server when the context is canceled
<-egCtx.Done()
s.log.Debug("Shutting down Workload API endpoint")
srv.Stop()
Contributor

Consider whether we want the hard Stop vs GracefulStop here

Contributor Author

Good catch. I had GracefulStop originally, but the Workload API's streaming RPCs are long-lived, so GracefulStop just hangs because those connections never close.

Comment on lines +411 to +414
case <-time.After(s.botCfg.RenewalInterval):
s.log.Debug("Renewal interval reached, renewing SVIDs")
// Time to renew the certificate
continue
Contributor

Elsewhere, we wait on the channel associated with the rootReloadBroadcaster to decide that we're undergoing a cert renewal. Should we do the same here (or is this something else / I'm just confused)?

reloadCh, unsubscribe := s.rootReloadBroadcaster.subscribe()
defer unsubscribe()
for {
    select {
    case <-ctx.Done():
        return nil
    case <-reloadCh:

Contributor Author

reloadCh is fulfilling the same role here. Essentially, another goroutine watches rootReloadBroadcaster, fetches the new CA bundle, and then sends a message on reloadCh. This lets us do a little work up front before notifying the RPCs, so each RPC doesn't have to repeat that work individually.
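
The fetch-once-then-fan-out arrangement described in this reply could be sketched as follows. This is an illustrative model under stated assumptions, not the PR's implementation: `bundleBroadcaster`, `publish`, and `watch` are hypothetical names, the `subscribe` method only mirrors the shape of the call quoted above, and a plain string stands in for the CA bundle.

```go
package main

import (
	"fmt"
	"sync"
)

// bundleBroadcaster (hypothetical) fans a freshly fetched bundle out to
// every subscribed RPC handler.
type bundleBroadcaster struct {
	mu   sync.Mutex
	subs map[chan string]struct{}
}

func newBundleBroadcaster() *bundleBroadcaster {
	return &bundleBroadcaster{subs: make(map[chan string]struct{})}
}

// subscribe registers a listener and returns its channel together with an
// unsubscribe function. The channel is buffered so publish never blocks.
func (b *bundleBroadcaster) subscribe() (<-chan string, func()) {
	ch := make(chan string, 1)
	b.mu.Lock()
	b.subs[ch] = struct{}{}
	b.mu.Unlock()
	return ch, func() {
		b.mu.Lock()
		delete(b.subs, ch)
		b.mu.Unlock()
	}
}

// publish hands the bundle to every subscriber, replacing any value a
// slow subscriber has not yet consumed.
func (b *bundleBroadcaster) publish(bundle string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for ch := range b.subs {
		select {
		case <-ch: // discard a stale, undelivered bundle
		default:
		}
		ch <- bundle
	}
}

// watch performs the shared work (fetch) once per reload signal and then
// notifies all subscribers, so they do not each repeat the fetch.
func watch(reloads <-chan struct{}, fetch func() string, b *bundleBroadcaster) {
	for range reloads {
		b.publish(fetch())
	}
}

func main() {
	b := newBundleBroadcaster()
	rpcA, unsubA := b.subscribe()
	defer unsubA()
	rpcB, unsubB := b.subscribe()
	defer unsubB()

	fetches := 0
	reloads := make(chan struct{}, 1)
	reloads <- struct{}{} // simulate one root reload
	close(reloads)
	watch(reloads, func() string { fetches++; return "bundle-v2" }, b)

	fmt.Println(<-rpcA, <-rpcB, "fetches:", fetches)
}
```

Two subscribers receive the same bundle after a single fetch, which is the saving the reply describes.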

lib/tbot/service_spiffe_workload_api.go
Co-authored-by: Isaiah Becker-Mayer <[email protected]>
Contributor

@timothyb89 timothyb89 left a comment

This is looking really good! I played around a bit with various role RBAC combinations and tried the socket out with spiffe-helper, all with no issues.

(Oddly, spiffe-helper can only dump the first cert, so it's not a great test if multiple are requested. Ah well.)

Contributor

No real comments, just neat to see this test use the actual SPIFFE client on our socket 🙂

Contributor Author

Easiest way :D I've found my testing life has gotten 90% easier since focussing on writing fewer unit tests and more high-level, real-world tests. We're very lucky in Machine ID that we're not constrained by having to reproduce user input, and doing things programmatically is in line with the product.

Contributor

@ryanclark ryanclark left a comment


@public-teleport-github-review-bot public-teleport-github-review-bot bot removed the request for review from r0mant March 6, 2024 09:48
@strideynet strideynet added this pull request to the merge queue Mar 6, 2024
Merged via the queue into master with commit 5a00a67 Mar 6, 2024
37 checks passed
@strideynet strideynet deleted the strideynet/spiffe-cert-issuance-tbot-spiffe-workload-api branch March 6, 2024 10:06
@public-teleport-github-review-bot

@strideynet See the table below for backport results.

| Branch | Result |
| --- | --- |
| branch/v15 | Failed |

strideynet added a commit that referenced this pull request Mar 6, 2024
* Experiment with issuing SVIDs from `tbot`

* Fix missing license headers

* Remove interceptors for now

* Add basic required grpc server interceptors

* Break out CA rotation handler

* Tidy up service structure and omit more cert attributes

* Various tidying and adding godoc comments

* Move dependency to main require block

* Fix tests and add test for spiffe-workload-api in config

* Add config tests for SPIFFESVIDOutput

* Add config tests for SPIFFEWorkloadAPIService

* Add otel spans

* Add e2e test for the spiffe workload api functionality

* Fix trust bundle fetching to use correct client

* Reuse outputs service to produce identity for spiffe workload service

* Add godocs for SPIFFESVIDOUTPUT

* Tidy up Render

* Add consts for pem types

* Improve logging

Co-authored-by: Isaiah Becker-Mayer <[email protected]>

---------

Co-authored-by: Isaiah Becker-Mayer <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Mar 7, 2024
* Machine ID: SPIFFE support in `tbot` (#37772)

* Experiment with issuing SVIDs from `tbot`

* Fix missing license headers

* Remove interceptors for now

* Add basic required grpc server interceptors

* Break out CA rotation handler

* Tidy up service structure and omit more cert attributes

* Various tidying and adding godoc comments

* Move dependency to main require block

* Fix tests and add test for spiffe-workload-api in config

* Add config tests for SPIFFESVIDOutput

* Add config tests for SPIFFEWorkloadAPIService

* Add otel spans

* Add e2e test for the spiffe workload api functionality

* Fix trust bundle fetching to use correct client

* Reuse outputs service to produce identity for spiffe workload service

* Add godocs for SPIFFESVIDOUTPUT

* Tidy up Render

* Add consts for pem types

* Improve logging

Co-authored-by: Isaiah Becker-Mayer <[email protected]>

---------

Co-authored-by: Isaiah Becker-Mayer <[email protected]>

* Fix tests broken in backport

* Further fix tests broken in backport

---------

Co-authored-by: Isaiah Becker-Mayer <[email protected]>
4 participants