AWS Docker Swarm deployment over Spot Fleet made easy! Everything is deployed via Ansible.
Services include:
- Traefik Reverse Proxy + LetsEncrypt
- Swarm Monitoring via Prometheus and Grafana
- Example Frontend With Login Page
- Example Backend API with FastAPI
- Jenkins Deployment + Jenkins file example that automatically deploys successfully tested services
- Internal Auth service for API tracking and frontend login/registration
Exposed Service | URL |
---|---|
Webservice | example.com |
FastAPI | api.example.com |
Monitoring Dashboard | monitoring.example.com |
Jenkins | jenkins.example.com |
Traefik Dashboard | traefik.example.com |
This project is the remnants of a financial services API I was working on some time ago (theoperator.io). The idea was very similar to that of alphavantage.co. theoperator.io has been abandoned for some time but I've recently started working on a new project and decided to not let the work here go to waste.
- Create an AWS access policy (See below for example)
- Create an AWS EC2 Security Group (See below for example)
- Create 3 Elastic IPs in AWS (1 for leader and 2 for managers)
- Add the Leader Elastic IP as an A DNS entry in Route 53 hosted zone
- Spin up 3 instances
- Associate the IPs with the instances
- Ensure you can ssh into each machine with your .pem
- Add your .pem to ansible/
- Add IPs to ansible/hosts
- Add relevant info to ansible/group_vars/all
- Edit .env as required
- Run
ansible-playbook ansible/deploy_cluster.yml
- Configure Jenkins as required ( + add docker registry credentials)
- See
ansible/
for other scripts to run
- Fix Frontend Login Example
- Fix Jenkins Deployment (Slave connection issues)
- Configure services deployed via .env
- Hide API test urls behind a .htpasswd
- Add influxDB for API access tracking
- Replace security details in auth/app/auth_functions.py (TODO: use .env)
- webservice deployment on Jenkins
- Move all config to .env
It's recommended to run 3 nodes (1 leader and 2 managers), especially for a Jenkins deployment but it's not entirely necessary
Rebalance a service across all nodes using:
docker service update --force <servicename>
Port | Protocol | Description |
---|---|---|
443 | TCP | HTTPS Access |
8080 | TCP | Traefik Load Balancer Port |
9090 | TCP | Prometheus Port |
2377 | TCP | Docker Swarm Join Port |
50000 | TCP | Jenkins Agent Listener |
7946 | TCP | Docker Coms |
7946 | UDP | Docker Coms |
4789 | UDP | Docker Swarm Overlay |
22 | TCP | SSH |
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Action": [
"route53:GetChange",
"route53:ChangeResourceRecordSets",
"route53:ListResourceRecordSets"
],
"Resource": [
"arn:aws:route53:::hostedzone/*",
"arn:aws:route53:::change/*"
]
},
{
"Sid": "",
"Effect": "Allow",
"Action": "route53:ListHostedZonesByName",
"Resource": "*"
}
]
}
The main deployment happens over an AWS Spot Fleet. Our 'swarm-leader' in Ansible has an AWS Elastic IP attached and is a Free Tier AWS instance. This swarm leader is running a private redis container that exposes the redis port publicly (password protected). This allows spot instances that come up to ping the swarm leader and receive either a worker token or a manager token.
Due to using a Spot Fleet in this configuration, an AMI need to be created in order to run the correct swarm join script at startup:
- Worker Node AMI
NOTE: Jenkins nodes (as managers) had previously been considered for spot fleet deployment, but after thinking about it. All managers should be on-demand instances. I don't want AWS coming along and pulling 2 of my 3 managers at the same time (very possible). This would bring down the whole cluster.
- Manually SSH into a fresh machine
sudo yum install -y python3
- On Ansible host
ansible-playbook ami_worker.yml
- Create image from machine using the AWS console
For the testing, it makes sense run unit tests on each individual service you run. This will then also allow code coverage to be estimated. The challenge here will be to merge all the tests at the end.
Jenkins will:
- See a git push and clone the repo.
- Build necessary images.
- Push newly built images to the docker registry
- Deploy a 'test' stack on the swarm.
- Send a GET request to each microservice /test endpoint (this means that Jenkins needs access to each service)
- The /test endpoint will return the output of the tests including a coverage estimation
- Jenkins merges all these results together and delivers the result.
- Jenkins removes the 'test' stack from the swarm
- If all tests passed, run:
ansible-playbook deploy-stack.yml
ansible-playbook deploy-monitoring-stack.yml
NOTE: In order for builds to be independant, we need to deploy separate TEST stacks for each build.
The docker registry exposes port 5000 on the swarm and is therefore available via localhost:5000
outside of containers
Service | Ports | Network | Traefik |
---|---|---|---|
traefik | 80:80, 8080:8080, 433:433 | proxy | - |
webservice | 5000:5000 | proxy, auth, services | localhost |
public_api | 5001:80 | proxy, auth, services | api.localhost |
auth | 5005:80 | proxy, auth | - |
redis | 6379:6379 | auth | - |
newswatcher | 5001:80 | services | - |
couchdb | 5984:5984 | services | - |
Tried running Couchbase but seemed very heavy on CPU usage even when idle