Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subdomain routing mode does not work properly with TLS certificates #244

Open
ghislainbourgeois opened this issue Sep 14, 2023 · 11 comments

Comments

@ghislainbourgeois
Copy link
Contributor

Bug Description

When using Traefik in the subdomain routing mode, all Ingress will use Traefik's default certificate.

CertHandler manages a single certificate for Traefik, and uses the external_hostname as the subject and sans. New Ingresses generated hostnames will not be represented in the certificate, and the SNI matching will not work, thus serving the default certificate.

I see 3 ways of fixing this issue:

  1. Modify CertHandler to support multiple certificates, and when set to the subdomain routing mode, request new certificates for each Ingress.
  2. Drop CertHandler and use the tls library directly to request new certificates for each Ingress when set to the subdomain routing mode.
  3. Keep CertHandler as is and add f"*.{external_hostname}" to the list of sans to request a wildcard certificate. This would have negative security implications however and I do not think it should be implemented; I am only mentioning it for completeness' sake.

To Reproduce

  1. juju add-model potato
  2. juju deploy traefik-k8s --trust --channel edge --config routing_mode=subdomain --config external_hostname=potato.com
  3. juju deploy zinc-k8s --trust --channel edge
  4. juju deploy self-signed-certificates --trust --channel edge
  5. juju relate self-signed-certificates:certificates traefik-k8s:certificate
  6. juju relate zinc-k8s:ingress traefik-k8s:ingress
  7. https://potato-zinc-k8s.potato.com # this will use Traefik's default certificate

Environment

This was tested with every charm from the edge channel, running locally on MicroK8s v1.27.5 revision 5892 with Juju 3.1.5-genericlinux-amd64.

Relevant log output

There are not relevant logs.

Additional context

No response

@sed-i
Copy link
Contributor

sed-i commented Sep 20, 2023

On traefik side, we have a matching config in place,

"tls": {
"domains": [
{
"main": self.external_host,
"sans": [f"*.{self.external_host}"],
},
],

but we do not have a matching cert for that, as we always generate a CSR with traefik's external hostname (config option) or IP address (metallb).

Yes, the CertHandler approach was designed with only path routing in mind.

request new certificates for each Ingress when set to the subdomain routing mode

Seems like the correct approach, but relation count would have to be artificially limited by the charm because the number of SANs per cert is limited.

@simskij
Copy link
Member

simskij commented Sep 26, 2023

Seems like the correct approach, but relation count would have to be artificially limited by the charm because the number of SANs per cert is limited.

This would only be an issue if we keep appending SANs to the same certificate. If we instead create one CSR per known subdomain so they all have individual certs, it won't be a concern. That approach, however, potentially opens up for rate-limiting issues with some public CAs like Let's Encrypt (50 certs per registered domain per week)

@sed-i
Copy link
Contributor

sed-i commented Oct 12, 2023

potentially opens up for rate-limiting issues with some public CAs like Let's Encrypt (50 certs per registered domain per week)

The rate limit is a similar concern for both approaches:

  • one cert per subdomain: every relation joined we send a new CSR and every departed we send a revocation request (?)
  • one cert for all subdomains: every relation joined/departed we send an updated CSR

BTW the max number of SANs per cert depends on the implementation:

No upper bound is defined; implementations are free to choose an upper bound that suits their environment.
-- (rfc via SO)

@sed-i
Copy link
Contributor

sed-i commented Oct 13, 2023

# Count SANs with:
echo | openssl s_client -showcerts -connect google.com:443 | openssl x509 -noout -text | grep DNS | tr ',' '\n' | wc -l
  • Google has 134 subdomains in one cert (and they also use wildcard SANs)
  • Wikipedia has 41 subdomains in one cert (and they also use wildcard SANs)
  • Wordpress and blogger have wildcard SAN.
  • StackExchange projects have wildcard SAN.
  • Government and bank websites have a separate cert per subdomain (and they do not use wildcard SANs)
  • Let's Encrypt limits to 100 SANs per cert (ref)

One cert for all subdomains seems simpler but more restricting:

Different cert per subdomain One cert for all subdomains
Count limit Unlimited (subject to CA rate limit and juju relation count limit) Depends on CA impl'd limit for SANs per cert
Num CSRs New CSR per add subdomain; revocation request per removed subd. New CSR for every added/removed subdomain
Config changed from path to subdomain Delete 1 cert; iterate N relations to create N CSRs -> N certs Create new CSR; overwrite the one cert
Config changed from subdomain to path Delete N certs; created one CSR Create new CSR; overwrite the one cert
Remote unit scale up/down Create/revoke csr; add/delete cert (for IPU, could try to always place all units of the same app in the same cert but if num units > 100 we hit CA count limit) Create new CSR; overwrite the one cert
Future support for custom path/subdomain No issue No issue
Future support for wildcard certs Would simplify everything, with LOTS of code change Would simplify everything, with little code change

Thoughts?

@sed-i
Copy link
Contributor

sed-i commented Oct 13, 2023

Another though:

Currently, traefik treats ingress per app and per unit very similarly for subdomain routing:

juju run trfk/0 show-proxied-endpoints --format json | jq -r '."trfk/0".results."proxied-endpoints"' | jq
{
  "prom/0": {
    "url": "http://sbds-prom-0.cluster.local/"
  },
  "am": {
    "url": "http://sbds-am.cluster.local/"
  }
}

If we change things up a bit:

{
  "prom/0": {
    "url": "http://unit-0.prom.sbds.cluster.local/"
  },
  "am": {
    "url": "http://am.sbds.cluster.local/"
  }
}

Then:

  • ingress per app could have one cert per app
  • ingress per unit could have one cert per app, using a wildcard SAN, *.prom.sbds.cluster.local

Wdyt @simskij @ghislainbourgeois @PietroPasotti @mthaddon?

@gregory-schiano
Copy link

FYI I'm facing this issue on the IAM bundle deployment, we need a TLS certificate for the ingress publicly exposing hydra, kratos, and login-ui but the CSR only covers the hostname.
This is blocking since I've no way to provide a certificate that'll work for the relations using subdomain routing

@gregory-schiano
Copy link

gregory-schiano commented Oct 16, 2023

# Count SANs with:
echo | openssl s_client -showcerts -connect google.com:443 | openssl x509 -noout -text | grep DNS | tr ',' '\n' | wc -l
  • Google has 134 subdomains in one cert (and they also use wildcard SANs)
  • Wikipedia has 41 subdomains in one cert (and they also use wildcard SANs)
  • Wordpress and blogger have wildcard SAN.
  • StackExchange projects have wildcard SAN.
  • Government and bank websites have a separate cert per subdomain (and they do not use wildcard SANs)
  • Let's Encrypt limits to 100 SANs per cert (ref)

One cert for all subdomains seems simpler but more restricting:

Different cert per subdomain One cert for all subdomains
Count limit Unlimited (subject to CA rate limit and juju relation count limit) Depends on CA impl'd limit for SANs per cert
Num CSRs New CSR per add subdomain; revocation request per removed subd. New CSR for every added/removed subdomain
Config changed from path to subdomain Delete 1 cert; iterate N relations to create N CSRs -> N certs Create new CSR; overwrite the one cert
Config changed from subdomain to path Delete N certs; created one CSR Create new CSR; overwrite the one cert
Remote unit scale up/down Create/revoke csr; add/delete cert (for IPU, could try to always place all units of the same app in the same cert but if num units > 100 we hit CA count limit) Create new CSR; overwrite the one cert
Future support for custom path/subdomain No issue No issue
Future support for wildcard certs Would simplify everything, with LOTS of code change Would simplify everything, with little code change
Thoughts?

I would be more in favor of different cert per subdomain or wildcard certificate. One cert for all domain can be tricky if the user chooses the "tls-certificates-operator" that allow providing certificates manually. Any relation added/removed will require the re-issue of the certificate manually.

@sed-i
Copy link
Contributor

sed-i commented Oct 16, 2023

Diagram summary of what's currently under consideration:
tls

@gregory-schiano
Copy link

Small comment about using wildcard certificates. Wildcard certificates can't be validated using HTTP challenge in ACME protocol. We don't have any provider charm that uses HTTP challenge, but this could have consequences

@sed-i
Copy link
Contributor

sed-i commented Oct 18, 2023

Per discussion:

  • We only allow strict subdomain routing (no wildcards)
  • Need to implement A1.

@sed-i sed-i mentioned this issue Oct 19, 2023
5 tasks
@dstathis dstathis added Checked and removed Checked labels Nov 12, 2024
@shipperizer
Copy link

is there any update on this? @sed-i

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants