initial implementation of persistence [RFC/WIP] #63

Open: wants to merge 3 commits into master


rmoriz (Contributor) commented Apr 4, 2017

(This is currently an RFC that I just hacked together.)

Motivation:

In complex setups with more than one front-facing httpd, such as anycast/CDN setups, ACME workflows run into the following problems:

  • initial bootstrap needs certificates (self-signed, no problem)
  • only one node should start the ACME verification process
  • ACME/LE needs to be directed to the right node every time
  • bootstrapping additional nodes and renewing certificates need a way to distribute the key/cert

I've seen many suggestions like using S3, Syncthing or rsync; however, I don't like the added complexity and the additional single points of failure.

My implementation

  • uses Chef data bags, with or without encryption (encryption is strongly preferred)
  • needs a Chef server or chef-zero setup (knife-zero works really well)

Limitations

The master node's client needs permission to create and upload data bags on the Chef server. This may have security implications.

Exactly one node must be configured to act as master. This can be a simple node attribute or a DSL method like tagged?. Fail-over must be implemented manually, or by a third party (e.g. monitoring) that automatically updates attributes on the Chef server.
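As a sketch of the "exactly one master" rule, the following plain-Ruby snippet (not part of this PR) models each node as a Hash and refuses to pick a master unless exactly one node qualifies. The `acme-master` tag and the `acme_master?`/`select_master` helpers are hypothetical; inside a recipe, the equivalent per-node check would be `node['acme']['master']` or the `tagged?` DSL method mentioned above.

```ruby
# Hypothetical check: a node counts as ACME master if the attribute
# node['acme']['master'] is true or it carries the (assumed) tag "acme-master".
def acme_master?(node)
  node.dig(:attrs, 'acme', 'master') == true ||
    node.fetch(:tags, []).include?('acme-master')
end

# Return the name of the single master node, or raise if there are zero
# or several candidates (both would break the persistence workflow).
def select_master(nodes)
  masters = nodes.select { |n| acme_master?(n) }
  unless masters.size == 1
    raise ArgumentError,
          "expected exactly one ACME master, found #{masters.size}: " \
          "#{masters.map { |n| n[:name] }.join(', ')}"
  end
  masters.first[:name]
end
```

Failing loudly when two nodes claim the master role avoids two nodes racing to order certificates and overwriting each other's data bag item.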

The key/cert on workers may lag behind by one converge cycle; this can be reduced by triggering converges on the workers with tools like Chef's "push jobs".

Usage:

Chef

Example recipe code:

# action :load checks for an existing data bag item and renders the files
#
acme_persistence 'example.com' do
  master node['acme']['master']  # this should be true only on the master node
  data_bag_name 'acme-certificates'
  alt_names %w(www.example.com example.com)
  key       '/etc/ssl/private/example.com.key'
  fullchain '/etc/ssl/certs/example.com.crt'
  action :load

  notifies :reload, 'service[nginx]'
end

acme_selfsigned 'example.com' do
  key '/etc/ssl/private/example.com.key'
  crt '/etc/ssl/certs/example.com.crt'

  not_if  'test -f /etc/ssl/private/example.com.key && test -f /etc/ssl/certs/example.com.crt'
end

(configure nginx)

acme_certificate 'example.com' do
  alt_names %w(www.example.com example.com)
  key      '/etc/ssl/private/example.com.key'
  fullchain '/etc/ssl/certs/example.com.crt'
  method   'http'
  wwwroot  '/var/www/acme'

  only_if { node['acme']['master'] }
  notifies :save,   'acme_persistence[example.com]', :immediately # this will trigger the data bag item upload
  notifies :reload, 'service[nginx]', :immediately
end

On the master node (node['acme']['master'] set to true):

first run
  • no data bag item is found
  • acme_selfsigned creates a self-signed certificate
  • acme_certificate does the ACME work and triggers :save on acme_persistence
  • acme_persistence uploads the data to the Chef server
subsequent runs / renewal
  • the data bag item is found, but its data is only written if the existing certificate is self-signed or older
  • acme_certificate checks expiration and re-keying, and triggers :save on acme_persistence
  • acme_persistence uploads the data to the Chef server
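The "only write if the existing certificate is self-signed or older" rule could look roughly like the following plain-Ruby sketch using the standard `openssl` library. It is an assumption, not the PR's code: "older" is interpreted as "expires earlier", a certificate counts as self-signed when its issuer equals its subject and it verifies against its own public key, and an unreadable file on disk is simply replaced.

```ruby
require 'openssl'

# Hypothetical helper: issuer == subject and the signature verifies with
# the certificate's own public key.
def self_signed?(cert)
  cert.issuer.to_s == cert.subject.to_s && cert.verify(cert.public_key)
end

# Decide whether the certificate stored in the data bag should replace the
# one on disk. Assumption: "older" means "expires before the stored one".
def should_write?(existing_pem, stored_pem)
  stored = OpenSSL::X509::Certificate.new(stored_pem)
  return true if existing_pem.nil? # nothing on disk yet

  begin
    existing = OpenSSL::X509::Certificate.new(existing_pem)
  rescue OpenSSL::X509::CertificateError
    return true # broken file on disk: replace it
  end

  self_signed?(existing) || existing.not_after < stored.not_after
end
```

Comparing `not_after` rather than file timestamps keeps the decision tied to the certificate itself, so a freshly re-rendered but older certificate never overwrites a newer one.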

On workers (without node['acme']['master']):

first run
  • acme_persistence does not find certificates in the data bag
  • acme_selfsigned creates dummy self-signed certificates so the webserver can start
subsequent runs / renewal
  • acme_persistence finds the data bag item, renders the key and certificate files and triggers a webserver reload (same logic as on the master)
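A minimal sketch of the worker side, assuming the data bag item carries "key", "cert" and an optional "chain" field (the field names discussed later in this thread) and that the fullchain file is the certificate followed by the chain. `render_item` and `write_if_changed` are hypothetical helpers, not the PR's code; they only rewrite files whose content differs, and the returned boolean is what would drive the `:reload` notification.

```ruby
require 'fileutils'

# Write `content` to `path` only if it differs from what is already there.
# Returns true when the file was (re)written.
def write_if_changed(path, content, mode)
  return false if File.exist?(path) && File.read(path) == content

  FileUtils.mkdir_p(File.dirname(path))
  # `mode` applies on creation (reduced by umask); chmod also resets the
  # mode when an existing file is overwritten.
  File.open(path, File::WRONLY | File::CREAT | File::TRUNC, mode) { |f| f.write(content) }
  File.chmod(mode, path)
  true
end

# Hypothetical worker-side rendering of a data bag item.
# Returns true if anything changed, i.e. when the webserver needs a reload.
def render_item(item, key_path:, fullchain_path:)
  fullchain = item.fetch('cert') + item.fetch('chain', '')
  key_changed = write_if_changed(key_path, item.fetch('key'), 0o600)
  crt_changed = write_if_changed(fullchain_path, fullchain, 0o644)
  key_changed || crt_changed
end
```

Writing only on change keeps repeated converges from reloading the webserver on every run.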

Webserver

Workers must proxy ACME verification requests for the .well-known URL to the master node. This can easily be scripted inside a virtual host template and toggled by the same source as above (e.g. a node attribute).

For example:

# /etc/nginx/conf.d/upstream.conf
upstream acme_master {
  server unique-address.webserver.example.com:443;
}

# vhost-template.conf.erb
...

location ^~ /.well-known/acme-challenge/ {
  default_type "text/plain";

  <% if @acme_master %>
  root /var/www/acme;
  <% else %>
  # pass the original Host header so the master selects the matching vhost
  proxy_set_header Host $host;
  proxy_pass https://acme_master;
  <% end %>
}

location = /.well-known/acme-challenge/ {
  return 404;
}
...
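To check what the `@acme_master` switch produces, the location block can be rendered with Ruby's standard `ERB` library, roughly the way Chef's template resource exposes template variables as instance variables. `VhostContext` is a hypothetical stand-in for that template context, and the template is a reduced copy of the block above that uses `-%>` to trim the blank lines left by the ERB tags.

```ruby
require 'erb'

# Reduced copy of the vhost template's challenge location.
TEMPLATE = <<~'ERB_TEMPLATE'
  location ^~ /.well-known/acme-challenge/ {
    default_type "text/plain";
  <% if @acme_master -%>
    root /var/www/acme;
  <% else -%>
    proxy_pass https://acme_master;
  <% end -%>
  }
ERB_TEMPLATE

# Hypothetical stand-in for Chef's template context: the switch is exposed
# as the instance variable @acme_master, as in the .erb file above.
class VhostContext
  def initialize(acme_master)
    @acme_master = acme_master
  end

  def render(template)
    ERB.new(template, trim_mode: '-').result(binding)
  end
end
```

`VhostContext.new(true).render(TEMPLATE)` yields the variant that serves challenges from `/var/www/acme`, while `VhostContext.new(false)` yields the `proxy_pass` variant for workers.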

related: #15

thoutenbos (Collaborator) commented Apr 4, 2017

Nice one!

Would it make sense to be compatible with the certificates cookbook in data bag storage? Then this cookbook wouldn't even need to run on any non-master nodes.

rmoriz (Contributor, Author) commented Apr 4, 2017

Renamed the data bag item key "crt" to "cert", which should make it compatible with the certificates cookbook as long as the combination of "key", "cert" and "chain" is used.
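Since the recipe only knows the `key` and `fullchain` paths while the item stores "cert" and "chain" separately, the fullchain has to be split somewhere. A hedged sketch of that split, assuming the first PEM block in the fullchain is the leaf certificate and that the item id is simply passed in (`build_item` and `PEM_CERT` are hypothetical, and the PR's actual serialization may differ):

```ruby
# Matches one PEM-encoded certificate, including a trailing newline if present.
PEM_CERT = /-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----\n?/m

# Hypothetical: build a data bag item in the "key"/"cert"/"chain" layout
# from the key and fullchain written by acme_certificate.
# Assumes the first certificate in the fullchain is the leaf certificate.
def build_item(id, key_pem, fullchain_pem)
  certs = fullchain_pem.scan(PEM_CERT)
  raise ArgumentError, 'no certificate found in fullchain' if certs.empty?

  {
    'id'    => id,
    'key'   => key_pem,
    'cert'  => certs.first,
    'chain' => certs.drop(1).join
  }
end
```

Concatenating "cert" and "chain" again on the worker restores the original fullchain byte for byte.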

thoutenbos (Collaborator)

This is definitely a great start; I'm just thinking about what we would need before it can be merged. Off the top of my head:

  • Update the README
  • Add Kitchen integration tests
  • Add ChefSpec matchers

rmoriz force-pushed the rmoriz/persistence branch from ca0ec40 to 7422d76 on April 14, 2017 11:16
rmoriz (Contributor, Author) commented Apr 14, 2017

@thoutenbos can you add a basic ChefSpec setup to the master branch? Kitchen integration tests are IMHO worthless because they fail all the time and have a big overhead.

rmoriz (Contributor, Author) commented Apr 14, 2017

… and we need Chef 13 compatibility.

rmoriz force-pushed the rmoriz/persistence branch from 7422d76 to fbb6023 on April 14, 2017 12:40
rmoriz (Contributor, Author) commented Apr 15, 2017

Changes to the SAN list need to be covered, too. This is also broken in the main part of the cookbook (see #35).

Regarding SAN changes: IMHO this should be part of the core acme_certificate resource, not of the persistence. Persistence should restore whatever is possible, maybe even without using this cookbook. It's the job of the acme_certificate resource to renew and rerun the process when the SAN list changes (I guess only when additional/new SANs are requested).
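A sketch of the SAN check argued for above, written as plain Ruby with the standard `openssl` library rather than inside `acme_certificate`: it reads the DNS entries of the certificate's subjectAltName extension and asks for a new certificate only when a requested name is missing, ignoring names that were dropped from the request. `san_dns_names` and `needs_reissue?` are hypothetical helpers.

```ruby
require 'openssl'

# DNS names from the certificate's subjectAltName extension, e.g.
# "DNS:example.com, DNS:www.example.com" -> ["example.com", "www.example.com"].
def san_dns_names(cert)
  ext = cert.extensions.find { |e| e.oid == 'subjectAltName' }
  return [] unless ext

  ext.value.split(',').map(&:strip)
     .select { |v| v.start_with?('DNS:') }
     .map { |v| v.delete_prefix('DNS:') }
end

# Hypothetical rule: re-run ACME only when a requested name is not yet
# covered; names removed from the request do not trigger a new certificate.
def needs_reissue?(cert, requested_names)
  missing = requested_names.map(&:downcase) - san_dns_names(cert).map(&:downcase)
  !missing.empty?
end
```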

thoutenbos (Collaborator)

Now that we have Chef 13 support and the build fixes, this should be able to go to master. Would you be able to add the README and integration tests to make it ready?

Chef::Log.error("certificate file #{crt || fullchain} exists but is broken: #{e}")
end

item = load_data_bag_item(data_bag_name, 'id:' + cn, secret)
Contributor

I think you don’t need this custom method, and can instead use the built-in data_bag_item() method, since you’re in a scope that has it.

rmoriz (Contributor, Author) commented Mar 8, 2018

My solution was supposed to support three cases:

  • unencrypted data bag -> empty secret
  • custom data bag secret -> manually set secret
  • default data bag secret -> manually set secret

Using Chef's data_bag_item without a secret automatically falls back to the secret defined in Chef::Config[:encrypted_data_bag_secret] for encrypted data bags (there is no way to customize this, and IIRC passing nil as the secret raises an exception, too).
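The three cases can be summarized as a small secret-resolution step whose result is only passed on when it is not nil, which sidesteps the nil-secret exception mentioned above. This is a hypothetical sketch, not the PR's `load_data_bag_item`: `config` stands in for `Chef::Config`, the "unencrypted" case is represented as `nil` (no secret argument at all), and a custom secret file takes precedence over the default one.

```ruby
# Hypothetical sketch of the three cases:
#   1. unencrypted data bag -> nil (no secret is passed at all)
#   2. custom secret file   -> read from custom_secret_file
#   3. default secret file  -> read from config[:encrypted_data_bag_secret]
# `config` stands in for Chef::Config; a custom file wins over the default.
def resolve_secret(encrypted:, custom_secret_file: nil, config: {})
  return nil unless encrypted

  path = custom_secret_file || config[:encrypted_data_bag_secret]
  if path.nil?
    raise ArgumentError, 'encrypted data bag requested but no secret file configured'
  end

  File.read(path).strip # secret files usually end with a newline
end
```

A caller would then add the secret argument only when `resolve_secret` returns a value, so nil never reaches the data bag loader.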

Contributor

@rmoriz ah! I forgot about the nil throwing an exception. In that case, this is great.
