Archiva Deployment on AWS
We have an Archiva server running on Amazon Web Services. We intend to use this with the Remote Data Toolbox and isetbio.
Here are some notes about the deployment and how it works. I ([email protected]) hope this will be useful for:
- isetbio team members who want to know what's up
- other isetbio admins who might need to work on the server
- myself a year from now, so I can remember how I figured all this out
- anyone else who wants to set up a similar Archiva server
Visit the server with your browser: http://52.32.77.154. As a guest, you can browse any of the repositories. If you have an admin account, you can do more.
The server is running on AWS. We chose AWS because we are already using AWS for other work at the Brainard Lab, and we already had the account set up.
Please contact Ben ([email protected]) or David ([email protected]) if you are a team member and you need credentials to access the account.
Google Cloud Platform would have been a good alternative to AWS, but we happened not to pursue it.
We should expect significant costs from our running container instance and from our EBS storage.
We are running an m4.large container instance. In the us-west-2 region, this instance type costs $0.126 / hour, or about $92 / month. We could scale this up or down as needed.
We are using a general-purpose EBS volume which costs $0.10 / GB-month. Currently our volume is 100GB, or $10 / month. We may eventually scale up to 1TB, or $100 / month.
So we should expect to pay up to $200 / month. This may be less, depending on our instance utilization rate and actual storage usage.
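As a back-of-the-envelope check of those numbers (using the prices quoted above, which may change):
# Rough monthly estimate: ~732 hours in an average month.
# m4.large:       0.126 * 732 ≈ 92 USD / month
# 100 GB gp2 EBS: 0.10 * 100  = 10 USD / month
awk 'BEGIN { printf "instance: $%.0f/month, storage: $%.0f/month\n", 0.126 * 732, 0.10 * 100 }'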
I attempted to use standard AWS components wherever possible, and to avoid idiosyncratic choices, shell scripts, secret facts that only I remember, etc. I hope that as a result, reading the AWS documentation is sufficient to understand the deployment.
But there's still a lot of AWS jargon involved, like "Elastic this" and "Elastic that". Here is an overview diagram which may help the reader deal with the jargon below.
We are deploying Archiva via Docker. Here is the Dockerfile on GitHub.
Here is the DockerHub project, which automatically builds Docker images based on the GitHub repository and makes them available to the public.
Our Dockerfile configuration separates the Archiva binaries from our project-specific Archiva config and data. This makes it possible to mount in a volume of config and data at container launch time. This should make the deployment easier to maintain, back up, and scale.
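For example, here is a minimal sketch of launching the image by hand, assuming (as in the ECS task definition below) that the container serves Archiva on port 8080 and expects its config and data under /var/archiva:
# Hypothetical manual launch; ECS does the equivalent of this for us.
docker run -d \
  --name brainard-archiva \
  -p 80:8080 \
  -v /var/archiva:/var/archiva \
  ninjaben/archiva-docker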
Since we are using a Docker container, we can deploy the server via Amazon's Elastic Container Service.
This lets us declare our configuration in files: the Docker file which defines our Archiva Docker image, and some JSON (see below) which tells Amazon how to launch a running container.
Declaring our configuration in files is a good thing. It helps us avoid writing and maintaining home brewed documentation of the form "Click this, then you should see this, then type this, etc."
We set up our Archiva Docker image to separate the Archiva binaries from our project-specific configuration and data. We put this configuration and data on an EBS volume. This is a volume that lives on Amazon, which can be attached and detached from running containers.
To make backups of our server, we can take snapshots of our EBS volume. Amazon stores these backups on S3.
We can use these snapshots when we spin up new Archiva instances. We just have to attach the volume to the container at launch time. See the JSON below for some details, and the CLI sketch after this list. There are several cases when we may want to do this:
- routine maintenance
- if we want to scale the service with bigger or more instances
- if we want to replicate the data across regions
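As a sketch of the backup step with the AWS CLI (the volume ID here is a placeholder):
# Snapshot the Archiva config/data volume; Amazon stores the snapshot on S3.
aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "Archiva config and data backup"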
Amazon takes care of launching our Docker image into a running container. We have to define the type and configuration of EC2 instance which runs the container.
We define these things in a Launch Configuration. This is a template from which Amazon can launch one or more instances.
Most of our launch configuration is based on the official documentation for launching ECS container instances. Some highlights of our launch configuration:
- uses an m4.large instance type
- attaches an EBS volume to the instance at /dev/sdf, based on a snapshot of our Archiva config and data volume
- uses a custom AMI which extends the official ECS-optimized AMI to also mount our /dev/sdf device at /var/archiva
- uses our ecsInstanceRole IAM role, which allows the instance to work with the Elastic Container Service
- defines some "user data" which allows the instance to connect itself to our brainard-archiva ECS cluster
The user data is just a short bash script:
#!/bin/bash
# Clear any stale ECS agent state carried over from the AMI snapshot.
rm -f /var/lib/ecs/data/*
# Tell the ECS agent which cluster this instance should join.
echo ECS_CLUSTER=brainard-archiva >> /etc/ecs/ecs.config
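Something like the following AWS CLI call could produce an equivalent launch configuration (the AMI ID, snapshot ID, and security group here are placeholders, and the user data file is the script above):
aws autoscaling create-launch-configuration \
  --launch-configuration-name brainard-archiva-launch \
  --image-id ami-00000000 \
  --instance-type m4.large \
  --security-groups sg-00000000 \
  --iam-instance-profile ecsInstanceRole \
  --user-data file://user-data.sh \
  --block-device-mappings '[{"DeviceName": "/dev/sdf", "Ebs": {"SnapshotId": "snap-00000000", "VolumeSize": 100, "VolumeType": "gp2"}}]'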
We use our launch configuration to create an Auto Scaling Group. This allows Amazon to start and stop instances automatically for us. Our group is simple: it just makes sure that one instance is running, and starts a new instance in case the first one fails.
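A hedged sketch of the corresponding Auto Scaling Group (the group name and availability zone are assumptions):
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name brainard-archiva-group \
  --launch-configuration-name brainard-archiva-launch \
  --min-size 1 --max-size 1 --desired-capacity 1 \
  --availability-zones us-west-2a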
The server IP address http://52.32.77.154 is an Amazon Elastic IP which we can assign to any running container.
This is a good thing because it will allow us to do maintenance and shuffle things around behind the scenes, yet users will be able to keep the same address.
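Re-pointing the address at a new instance is one CLI call (the instance ID is a placeholder; for a VPC address, --allocation-id would be used instead of --public-ip):
aws ec2 associate-address \
  --public-ip 52.32.77.154 \
  --instance-id i-0123456789abcdef0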
We could take this a step further by giving this IP address a memorable DNS name. Amazon would make this easy with a service called Route53.
If we replicate our server across regions, we can also use Route53 geographic routing to automatically route users to the server closest to them.
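As a rough sketch only (the hosted zone ID and domain name are hypothetical), a geolocation record pointing North American users at the us-west-2 server could look like this:
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0000000000000 \
  --change-batch '{
    "Changes": [{
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "archiva.example.org",
        "Type": "A",
        "TTL": 300,
        "SetIdentifier": "us-west-2",
        "GeoLocation": {"ContinentCode": "NA"},
        "ResourceRecords": [{"Value": "52.32.77.154"}]
      }
    }]
  }'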
Currently our EBS volume and container instance are located in Amazon's us-west-2 region. This means the web access and data transfers are available to the world, but optimized for users located on the US West Coast.
This may be sufficient. Or, we might wish to optimize access in additional geographic regions by replicating the data across regions. There is no single "right way" to do this, and Amazon does not do this automatically.
We have at least two replication options to choose from: low-level replication using AWS, or high-level replication using Archiva.
Amazon allows us to copy our Archiva EBS volume across regions. We could use cross-region EBS copies to set up running containers in different regions (see the CLI sketch after the pros and cons below).
Pros:
- keeps our Archiva config simple and uniform across regions
- users never have to wait for data to move across regions
Cons:
- we have to periodically perform the cross-region replication
- probably need one region to be the official source of truth, where users may send uploads
- probably need to treat servers in other regions as read-only
- replication will squash uploads in other regions
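As a sketch of the low-level option (the snapshot ID is a placeholder; copy-snapshot is issued from the destination region):
# Copy a snapshot of the Archiva volume into another region, then create a
# volume and launch configuration from the copy in that region.
aws ec2 copy-snapshot \
  --region us-east-1 \
  --source-region us-west-2 \
  --source-snapshot-id snap-0123456789abcdef0 \
  --description "Archiva data replicated from us-west-2"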
Archiva servers can act as proxies for other Archiva servers. If a requested artifact is missing from a given server, it can ask another server for the same artifact. We could set up Archiva servers in multiple regions and configure them as proxies of each other.
Pros:
- artifacts copied across regions on demand, only as needed
- users could upload to whichever server is closest to them
- we don't have to do anything periodically
Cons:
- our Archiva configuration could get complex because each server would need to know about the others
- each Archiva server would have its own "precious" configuration
- first time an artifact is copied across a region, user would have to wait for transfer
I don't know which option is better. We may have to live and learn.
For reference, here is the JSON which tells ECS how to launch our Docker image into a running container.
Key parts:
- image points at our ninjaben/archiva-docker DockerHub project.
- portMappings tells AWS to make the server available to the public at the default HTTP port 80.
- volumes and mountPoints make configuration and data contained on our attached EBS volume available to Archiva inside the container.
The rest is more or less default.
{
  "requiresAttributes": [],
  "taskDefinitionArn": "arn:aws:ecs:us-west-2:547825153113:task-definition/brainard-archiva:1",
  "status": "ACTIVE",
  "revision": 1,
  "containerDefinitions": [
    {
      "volumesFrom": [],
      "memory": 6000,
      "extraHosts": null,
      "dnsServers": null,
      "disableNetworking": null,
      "dnsSearchDomains": null,
      "portMappings": [
        {
          "hostPort": 80,
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "hostname": null,
      "essential": true,
      "entryPoint": null,
      "mountPoints": [
        {
          "containerPath": "/var/archiva",
          "sourceVolume": "brainard-archiva",
          "readOnly": null
        }
      ],
      "name": "brainard-archiva",
      "ulimits": null,
      "dockerSecurityOptions": null,
      "environment": [],
      "links": null,
      "workingDirectory": null,
      "readonlyRootFilesystem": null,
      "image": "ninjaben/archiva-docker",
      "command": null,
      "user": null,
      "dockerLabels": null,
      "logConfiguration": null,
      "cpu": 0,
      "privileged": null
    }
  ],
  "volumes": [
    {
      "host": {
        "sourcePath": "/var/archiva"
      },
      "name": "brainard-archiva"
    }
  ],
  "family": "brainard-archiva"
}
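Assuming the JSON above is saved to a local file (the file name is hypothetical, and read-only fields like taskDefinitionArn, status, and revision would need to be trimmed first), it could be registered and started with the AWS CLI:
# Register the task definition and start it on the brainard-archiva cluster.
aws ecs register-task-definition --cli-input-json file://brainard-archiva-task.json
aws ecs run-task --cluster brainard-archiva --task-definition brainard-archiva:1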
I configured email sending for the Archiva server. I used Google's free SMTP service.
This config is not part of the Archiva Docker image because it includes a Google app password from my personal Google account.
Here is the relevant section of /var/archiva/conf/jetty.xml, with credentials redacted:
<New id="validation_mail" class="org.eclipse.jetty.plus.jndi.Resource">
<Arg>mail/Session</Arg>
<Arg>
<New class="org.eclipse.jetty.jndi.factories.MailSessionReference">
<Set name="user">***</Set>
<Set name="password">***</Set>
<Set name="properties">
<New class="java.util.Properties">
<Put name="mail.user">***</Put>
<Put name="mail.password">***</Put>
<Put name="mail.smtp.host">smtp.gmail.com</Put>
<Put name="mail.transport.protocol">smtp</Put>
<Put name="mail.smtp.port">587</Put>
<Put name="mail.smtp.auth">true</Put>
<Put name="mail.smtp.starttls.enable">true</Put>
<Put name="mail.debug">true</Put>
</New>
</Set>
</New>
</Arg>
</New>
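One quick way to sanity-check these settings from the server, assuming openssl is installed, is to confirm that Gmail's STARTTLS endpoint answers on port 587:
# Should negotiate TLS and print Gmail's certificate chain if the host and port are reachable.
openssl s_client -starttls smtp -connect smtp.gmail.com:587 -crlf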