Archiva Deployment on AWS
We have an Archiva server running on Amazon Web Services. We intend to use this with the Remote Data Toolbox and isetbio.
Here are some notes about the deployment and how it works. I ([email protected]) hope this will be useful for:
- isetbio team members who want to know what's up
- other isetbio admins who might need to work on the server
- myself a year from now, so I can remember how I figured all this out
- anyone else who wants to set up a similar Archiva server
Visit the server with your browser: http://52.32.77.154. As a guest, you can browse any of the repositories. If you have an admin account, you can do more.
The server is running on AWS. We chose AWS because we are already using AWS for other work at the Brainard Lab, and we already had the account set up.
Please contact Ben ([email protected]) or David ([email protected]) if you are a team member and you need credentials to access the account.
Google Cloud Platform would have been a good alternative to AWS, but we happened not to pursue it.
We should expect significant costs from our running container instance and from our EBS storage.
We are running an m4.large container instance. In the us-west-2 region, this instance type costs $0.126 / hour, or about $92 / month. We could scale this up or down as needed.
We are using a general-purpose EBS volume which costs $0.10 / GB-month. Currently our volume is 100GB, or $10 / month. We may eventually scale up to 1TB, or $100 / month.
So we should expect to pay up to $200 / month. This may be less, depending on our instance utilization rate and actual storage usage.
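As a back-of-the-envelope check of those numbers (using the prices quoted above, which may change):
# Rough monthly estimate: ~732 hours in an average month.
# m4.large:       0.126 * 732 ≈ 92 USD / month
# 100 GB gp2 EBS: 0.10 * 100  = 10 USD / month
awk 'BEGIN { printf "instance: $%.0f/month, storage: $%.0f/month\n", 0.126 * 732, 0.10 * 100 }'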
I attempted to use standard AWS components wherever possible, and to avoid idiosyncratic choices, shell scripts, secret facts that only I remember, etc. I hope that as a result, reading the AWS documentation is sufficient to understand the deployment.
But there's still a lot of AWS jargon involved, like "Elastic this" and "Elastic that". Here is an overview diagram which may help the reader deal with the jargon below.
We are deploying Archiva via Docker. Here is the Dockerfile on GitHub.
Here is the DockerHub project, which automatically builds Docker images based on the GitHub repository and makes them available to the public.
Our Dockerfile configuration separates the Archiva binaries from our project-specific Archiva config and data. This makes it possible to mount in a volume of config and data at container launch time. This should make the deployment easier to maintain, back up, and scale.
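For example, here is a minimal sketch of launching the image by hand, assuming (as in the ECS task definition below) that the container serves Archiva on port 8080 and expects its config and data under /var/archiva:
# Hypothetical manual launch; ECS does the equivalent of this for us.
docker run -d \
  --name brainard-archiva \
  -p 80:8080 \
  -v /var/archiva:/var/archiva \
  ninjaben/archiva-docker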
Since we are using a Docker container, we can deploy the server via Amazon's Elastic Container Service.
This lets us declare our configuration in files: the Docker file which defines our Archiva Docker image, and some JSON (see below) which tells Amazon how to launch a running container.
Declaring our configuration in files is a good thing. It helps us avoid writing and maintaining home brewed documentation of the form "Click this, then you should see this, then type this, etc."
We set up our Archiva Docker image to separate the Archiva binaries from our project-specific configuration and data. We put this configuration and data on an EBS volume. This is a volume that lives on Amazon, which can be attached and detached from running containers.
To make backups of our server, we can take snapshots of our EBS volume. Amazon stores these backups on S3.
We can use these snapshots when we spin up new Archiva instances. We just have to attach the volume to the container at launch time. See the JSON below for some details, and the CLI sketch after this list. There are several cases when we may want to do this:
- routine maintenance
- if we want to scale the service with bigger or more instances
- if we want to replicate the data across regions
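As a sketch of the backup step with the AWS CLI (the volume ID here is a placeholder):
# Snapshot the Archiva config/data volume; Amazon stores the snapshot on S3.
aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "Archiva config and data backup"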
Amazon takes care of launching our Docker image into a running container. We have to define the type and configuration of EC2 instance which runs the container.
We define these things in a Launch Configuration. This is a template from which Amazon can launch one or more instances.
Most of our launch configuration is based on the official documentation for launching ECS container instances. Some highlights of our launch configuration:
- uses an m4.large instance type
- attaches an EBS volume to the instance at /dev/sdf, based on a snapshot of our Archiva config and data volume
- uses a custom AMI which extends the official ECS-optimized AMI to also mount our /dev/sdf device at /var/archiva
- uses our ecsInstanceRole IAM role, which allows the instance to work with the Elastic Container Service
- defines some "user data" which allows the instance to connect itself to our brainard-archiva ECS cluster
The user data is just a short bash script:
#!/bin/bash
# Clear any stale ECS agent state carried over from the AMI snapshot.
rm -f /var/lib/ecs/data/*
# Tell the ECS agent which cluster this instance should join.
echo ECS_CLUSTER=brainard-archiva >> /etc/ecs/ecs.config
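Something like the following AWS CLI call could produce an equivalent launch configuration (the AMI ID, snapshot ID, and security group here are placeholders, and the user data file is the script above):
aws autoscaling create-launch-configuration \
  --launch-configuration-name brainard-archiva-launch \
  --image-id ami-00000000 \
  --instance-type m4.large \
  --security-groups sg-00000000 \
  --iam-instance-profile ecsInstanceRole \
  --user-data file://user-data.sh \
  --block-device-mappings '[{"DeviceName": "/dev/sdf", "Ebs": {"SnapshotId": "snap-00000000", "VolumeSize": 100, "VolumeType": "gp2"}}]'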
We use our launch configuration to create an Auto Scaling Group. This allows Amazon to start and stop instances automatically for us. Our group is simple: it just makes sure that one instance is running, and starts a new instance in case the first one fails.
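A hedged sketch of the corresponding Auto Scaling Group (the group name and availability zone are assumptions):
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name brainard-archiva-group \
  --launch-configuration-name brainard-archiva-launch \
  --min-size 1 --max-size 1 --desired-capacity 1 \
  --availability-zones us-west-2a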
The server IP address http://52.32.77.154 is an Amazon Elastic IP which we can assign to any running container.
This is a good thing because it will allow us to do maintenance and shuffle things around behind the scenes, yet users will be able to keep the same address.
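Re-pointing the address at a new instance is one CLI call (the instance ID is a placeholder; for a VPC address, --allocation-id would be used instead of --public-ip):
aws ec2 associate-address \
  --public-ip 52.32.77.154 \
  --instance-id i-0123456789abcdef0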
We could take this a step further by giving this IP address a memorable DNS name. Amazon would make this easy with a service called Route53.
If we replicate our server across regions, we can also use Route53 geographic routing to automatically route users to the server closest to them.
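As a rough sketch only (the hosted zone ID and domain name are hypothetical), a geolocation record pointing North American users at the us-west-2 server could look like this:
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0000000000000 \
  --change-batch '{
    "Changes": [{
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "archiva.example.org",
        "Type": "A",
        "TTL": 300,
        "SetIdentifier": "us-west-2",
        "GeoLocation": {"ContinentCode": "NA"},
        "ResourceRecords": [{"Value": "52.32.77.154"}]
      }
    }]
  }'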
Currently our EBS volume and container instance are located in Amazon's us-west-2 region. This means the web access and data transfers are available to the world, but optimized for users located on the US West Coast.
This may be sufficient. Or, we might wish to optimize access in additional geographic regions by replicating the data across regions. There is no single "right way" to do this, and Amazon does not do this automatically.
We have at least two replication options to choose from: low-level replication using AWS, or high-level replication using Archiva.
Amazon allows us to copy our Archiva EBS volume across regions. We could use cross-region EBS copies to set up running containers in different regions (see the CLI sketch after the pros and cons below).
Pros:
- keeps our Archiva config simple and uniform across regions
- users never have to wait for data to move across regions
Cons:
- we have to periodically perform the cross-region replication
- probably need one region to be the official source of truth, where users may send uploads
- probably need to treat servers in other regions as read-only
- replication will squash uploads in other regions
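As a sketch of the low-level option (the snapshot ID is a placeholder; copy-snapshot is issued from the destination region):
# Copy a snapshot of the Archiva volume into another region, then create a
# volume and launch configuration from the copy in that region.
aws ec2 copy-snapshot \
  --region us-east-1 \
  --source-region us-west-2 \
  --source-snapshot-id snap-0123456789abcdef0 \
  --description "Archiva data replicated from us-west-2"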
Archiva servers can act as proxies for other Archiva servers. If a requested artifact is missing from a given server, it can ask another server for the same artifact. We could set up Archiva servers in multiple regions and configure them as proxies of each other.
Pros:
- artifacts copied across regions on demand, only as needed
- users could upload to whichever server is closest to them
- we don't have to do anything periodically
Cons:
- our Archiva configuration could get complex because each server would need to know about the others
- each Archiva server would have its own "precious" configuration
- first time an artifact is copied across a region, user would have to wait for transfer
I don't know which option is better. We may have to live and learn.
For reference, here is the JSON which tells ECS how to launch our Docker image into a running container.
Key parts:
- image points at our ninjaben/archiva-docker DockerHub project.
- portMappings tells AWS to make the server available to the public at the default HTTP port 80.
- volumes and mountPoints make configuration and data contained on our attached EBS volume available to Archiva inside the container.
The rest is more or less default.
{
  "requiresAttributes": [],
  "taskDefinitionArn": "arn:aws:ecs:us-west-2:547825153113:task-definition/brainard-archiva:1",
  "status": "ACTIVE",
  "revision": 1,
  "containerDefinitions": [
    {
      "volumesFrom": [],
      "memory": 6000,
      "extraHosts": null,
      "dnsServers": null,
      "disableNetworking": null,
      "dnsSearchDomains": null,
      "portMappings": [
        {
          "hostPort": 80,
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "hostname": null,
      "essential": true,
      "entryPoint": null,
      "mountPoints": [
        {
          "containerPath": "/var/archiva",
          "sourceVolume": "brainard-archiva",
          "readOnly": null
        }
      ],
      "name": "brainard-archiva",
      "ulimits": null,
      "dockerSecurityOptions": null,
      "environment": [],
      "links": null,
      "workingDirectory": null,
      "readonlyRootFilesystem": null,
      "image": "ninjaben/archiva-docker",
      "command": null,
      "user": null,
      "dockerLabels": null,
      "logConfiguration": null,
      "cpu": 0,
      "privileged": null
    }
  ],
  "volumes": [
    {
      "host": {
        "sourcePath": "/var/archiva"
      },
      "name": "brainard-archiva"
    }
  ],
  "family": "brainard-archiva"
}
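Assuming the JSON above is saved to a local file (the file name is hypothetical, and read-only fields like taskDefinitionArn, status, and revision would need to be trimmed first), it could be registered and started with the AWS CLI:
# Register the task definition and start it on the brainard-archiva cluster.
aws ecs register-task-definition --cli-input-json file://brainard-archiva-task.json
aws ecs run-task --cluster brainard-archiva --task-definition brainard-archiva:1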
I configured email sending for the Archiva server. I used Google's free SMTP service.
This config is not part of the Archiva Docker image because it includes a Google app password from my personal Google account.
Here is the relevant section of /var/archiva/conf/jetty.xml, with credentials redacted:
<New id="validation_mail" class="org.eclipse.jetty.plus.jndi.Resource">
<Arg>mail/Session</Arg>
<Arg>
<New class="org.eclipse.jetty.jndi.factories.MailSessionReference">
<Set name="user">***</Set>
<Set name="password">***</Set>
<Set name="properties">
<New class="java.util.Properties">
<Put name="mail.user">***</Put>
<Put name="mail.password">***</Put>
<Put name="mail.smtp.host">smtp.gmail.com</Put>
<Put name="mail.transport.protocol">smtp</Put>
<Put name="mail.smtp.port">587</Put>
<Put name="mail.smtp.auth">true</Put>
<Put name="mail.smtp.starttls.enable">true</Put>
<Put name="mail.debug">true</Put>
</New>
</Set>
</New>
</Arg>
</New>
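One quick way to sanity-check these settings from the server, assuming openssl is installed, is to confirm that Gmail's STARTTLS endpoint answers on port 587:
# Should negotiate TLS and print Gmail's certificate chain if the host and port are reachable.
openssl s_client -starttls smtp -connect smtp.gmail.com:587 -crlf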