Adding helper info about aws and git.
m8pple committed Feb 5, 2015
1 parent 15e05f7 commit 9c5f8e3
Showing 2 changed files with 624 additions and 0 deletions.
aws.md: 399 additions, 0 deletions
Things to note about AWS
------------------------

First, AWS is a commercial service, and they are in the
business of charging people money for access. This
course relies either on free instances, or on the $100
that I give you, but be aware that
there are consequences of leaving instances turned on
for long periods of time. Each type of instance
[costs money](http://aws.amazon.com/ec2/pricing/) per hour.
For example, if you accidentally leave a GPU instance running,
you will be charged $0.65 an hour until you stop it.

_AWS also offers ["spot pricing"](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances-history.html)
where the price fluctuates according to demand - this is great for batch
compute, but I don't recommend it for this course: if the price
rises your machine may be terminated, and you can lose work
unless you have good processes in place. If you look at the
pricing history it is quite interesting to try to work
out what people are doing. On the GPU instances people
occasionally spike the price to the on-demand rate, while
for c4.8xlarge the spot price sometimes spikes and
holds at 4x the on-demand rate._

Each instance can be in one of [a few states](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-lifecycle.html):

1. Pending: once launched, the instance goes through some initial
checks and is allocated to a machine.

2. Running: the machine is up and active, and you can SSH in and
use it.

3. Stopped: the machine is not currently active, but the state of
the hard disk is maintained, and can be run again.

4. Terminated: the machine is destroyed, including the hard disk.

Broadly speaking, you can think of the "running" state as
"using electricity", and the "stopped" state as "using up
a hard disk". AWS will always charge you for electricity
and disk space, but generally charges more for electricity.

You should all be eligible for the [one year free tier](http://aws.amazon.com/free/),
which gives you certain advantages:

- 750 hours a month of linux t2.micro server time. So you can leave
a micro instance on all the time if you want to.

- 30GB of EBS storage. This is where your instance disks
are stored, so you can have a few in the stopped state
and not be charged.

The t2.micro instances are great for checking compilation
and automation, as you don't need to worry about cost.

Any other instances will eat into your $100, so you need
to be a bit careful about managing them. Things to
note are:

- If you have an instance running for t hours, you will
be billed for ceil(t) hours. So a GPU instance used for
10 minutes will cost the same as an hour.

- Stopping then starting an instance will cause the time
to be rounded up to the nearest hour, then start a new
billing period. So running an instance for 10 minutes,
then stopping it, starting it, and running it for another
10 minutes will cost the same as two hours.

- [Rebooting an instance](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-reboot.html)
does not result in a different instance, and so does not
cause a new billing hour to start.
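
To make the rounding concrete, here is a worked example at the $0.65
an hour GPU rate mentioned above:

    10 minutes, then stop                     -> billed as 1 hour  = $0.65
    10 minutes, stop, start, 10 more minutes  -> billed as 2 hours = $1.30
    10 minutes, reboot, 10 more minutes       -> billed as 1 hour  = $0.65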

I have no more money to give you if yours runs out, and
you need to keep money for the later courseworks, so you
need a certain amount of planning in how you spend your money.
Don't start working with a GPU or large instance unless
you know you'll be able to spend a reasonable amount of
time with it, and use cheaper instances, lab/personal
machines and VMs to test build automation, compilation, and
testing whenever possible. Though if you consider $100 at $0.65
an hour, you're looking at 150+ hours, so there is plenty there.

Best practices for using AWS
----------------------------

1. *Always* check your instances are finished when you stop
working with non-free instances. Check they have transitioned
to the stopped or terminated state.

2. Protect your instance key-pairs.

3. Use the cheapest machine you can for the current purpose;
testing OpenCL code for correctness doesn't necessarily
need a GPU; checking that builds work can often be done
in a tiny instance.

4. Plan your work; get everything possible done on a
local machine (VM or native, linux or windows) first.

Getting an AWS Account
-------------------------

You need to have an Amazon account first (the kind
you use to buy books), then go to the [AWS site](http://aws.amazon.com/)
and create an AWS account.

It will ask you for a credit or debit card, but this
will not be charged if you only use free instances,
and/or stay within the $100 credit you'll get. As
I mentioned above, this is real money, but as long
as you manage your instances it won't cost you
anything.

Create a (tiny, free) Ubuntu 14.04 machine instance
---------------------------------------------------

### Step 1: Choose an Amazon Machine Image (AMI)

In the AWS console, choose Ubuntu Server 14.04 LTS (HVM).

### Step 2: Choose an Instance Type

Select the free tier (you could choose a more expensive
one, but then you need to spend money).

Go to "Next: Configure Instance Details"

### Step 3: Configure Instance Details

You should be able to leave them at the defaults
(though it is interesting to look at all the options
by hovering over the (i) buttons).

### Step 4: Add Storage

You can leave at the defaults, but again, it is interesting
to read. If you ever need to work with big-ish data
then these options matter a lot.

### Step 5: Tag Instance

We don't need this, but it is useful if you have 20 instances
and you need to be able to identify which is which (Err,
don't create 20 instances unless you are rich).

### Step 6: Configure Security Group

This one is quite important. Your server will be alive on
the internet, open to the world, so you need to limit
who can access it. We will use one port, allowing SSH,
though we will allow it to be accessible from anywhere.

1. Select "Create a new security group". (It should be
auto-selected, and the defaults listed below should
be correct as well).

2. Make sure the "Type" on the left is SSH (Secure Shell).

3. Protocol and Port Range will then be fixed to TCP and 22.

4. For Source, specify Anywhere. This is so you can log in from wherever
you happen to be (so could anyone else, but SSH should stop them).

5. For security group name, choose something meaningful like
"ssh-only".

Click through to the next step: a dialogue should pop up saying
"Select an existing or new key pair."

### Step 7: Select a key pair

First, do _not_ proceed without a key pair. These things are important,
as they are what allows you to SSH into your instance.

1. Read the description of the key pair that it gives you.

2. Select "Create a new key pair".

3. Choose a key pair name. I would suggest putting your name or
login in it, for example, I use "m8pple-key-pair"

4. Download the key pair. This thing is important for as long
as the instance is running, so keep it somewhere safe.
However, you can always generate more key pairs if
you lose one. If you are on a shared unix machine, change
the permissions so that only you can access it:

    chmod og-rwx m8pple-key-pair.pem

5. Finish the process, and your instance will launch.

Just re-emphasising the importance of key pairs: they
are essentially the front-door key to your server.

Connecting to the instance
--------------------------

Use the "View Instances" button, or just go back to the
AWS dashboard (it doesn't matter how you get there).

You should now be able to see an instance running in
the dashboard, with a green symbol, and probably a
status that says "Initialising". If you click on that row,
then the bottom of the dashboard will show you details
about it.

The thing we need in order to connect is the DNS name or IP address
of the instance - either will work to identify it. For example,
I have previously received:

    ec2-54-201-95-131.us-west-2.compute.amazonaws.com

as an instance, which is correspondingly at the IP address:

    54.201.95.131

To connect to the server, you need to SSH to it.

### Linux/Cygwin

You can ssh directly from the command line, using:

    ssh -i <path-to-your-key-pair> ubuntu@<dns-name-of-your-server>

That should drop you into a command line on the remote server.
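
For example, with the key pair name and instance DNS name used earlier
in this guide (yours will differ), that might look like:

    ssh -i m8pple-key-pair.pem ubuntu@ec2-54-201-95-131.us-west-2.compute.amazonaws.com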

### Windows (Putty)

There is a great SSH client for Windows called
[PuTTY](http://www.chiark.greenend.org.uk/~sgtatham/putty/),
which I highly recommend. To use it, you need to
convert the .pem file to a putty .ppk file:

1. Start PuTTYgen (one of the programs that comes with putty)

2. Conversions -> Import Key.

3. Select your .pem file and open it.

4. At this stage you can choose a passphrase using
the "Key passphrase" box, which will be used to
encrypt the private key. Personally I prefer to
have a passphrase, as otherwise anyone who
gets the key can get into any of my running instances.
However, you can leave it blank, and just protect
the key well.

5. File -> Save Private Key.

6. You may get prompted about an empty passphrase;
just ignore it if you didn't want one.

7. Choose a .ppk file to save it as.

You can now start PuTTY itself. You might want to
set this up once and save it:

1. Session: "Host Name (or IP address)": Put the DNS name of your amazon instance.

2. Connection -> SSH -> Auth: Specify the private .ppk file you just created.

3. Connection -> Data: In "Auto-login username" put "ubuntu".

4. Session -> "Saved Sessions": Choose some name for this connection, e.g. "AWS", and
hit save.

5. Hit "Open"

You should now be dropped into your remote server. If you switch
to a new instance you will need to change the host settings,
but the rest should stay the same.

Setting up the instance
-----------------------

By default, your instance has almost nothing on
it. Try running g++:

    g++ -v

And it will tell you it is not installed, but
does suggest how to install it:

    sudo apt-get install g++

This involves two commands:

- [sudo](http://en.wikipedia.org/wiki/Sudo) A program for running commands
as [root or administrator](http://en.wikipedia.org/wiki/Superuser).

- [apt-get](http://linux.die.net/man/8/apt-get) A package manager
which handles the installation or removal of tools and libraries.

Similarly, try running git, make, and so on,
and you'll find you need to install them too.
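
So a plausible first-time setup on a fresh instance might look
something like this (adjust the package list to what you actually need):

    sudo apt-get update                  # refresh the package lists first
    sudo apt-get install g++ make git    # compiler, build tool, version control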

TBB is also available as a package, but you
need to search for that:

    apt-cache search tbb

You should see four or five packages related
to tbb, but libtbb-dev should pull in everything
you need:

    sudo apt-get install libtbb-dev
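
If you want to reassure yourself the library is usable, one quick
(entirely optional) check is to compile and link a trivial program
against it; the file name here is just an example:

    echo 'int main(){ return 0; }' > tbb_check.cpp
    g++ tbb_check.cpp -ltbb -o tbb_check && echo "TBB links OK"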

Getting code over to your machine
---------------------------------

You now have a few options for getting code
over to your machine:

- Copying files over via scp (file transfer via SSH); see the example below.
- `cat`ing files down the SSH connection (not really recommended,
but occasionally very useful).
- Pulling code over via git.
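
For the scp route, the command mirrors the ssh one; for example, to
copy a single (hypothetical) file up to the instance's home directory:

    scp -i <path-to-your-key-pair> wibble.cpp ubuntu@<dns-name-of-your-server>:~/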

I am going to recommend getting the code via git,
as it is a nice way of doing things, and makes it
easier to bring any patches you make in the test
environment back out to github. The main sticking point
is authentication, as your AWS instance will be able
to communicate with github, but doesn't have
access to your keys.

You could transfer your SSH keys over and use `ssh-agent`
remotely, but it is better to keep your keys where you
control them, using a method called [SSH agent forwarding](https://developer.github.com/guides/using-ssh-agent-forwarding/).

First, make sure you are currently authenticated
with github, by doing:

    ssh git@github.com

or the equivalent in PuTTY. If you receive something
like `Permission denied (publickey).`, then you haven't
got an agent set up. Start `ssh-agent` or `pageant`, load
your github SSH keys in (these are distinct from your
AWS keypair), then try again. Hopefully eventually you
will see something like:

    Hi m8pple! You've successfully authenticated, but GitHub does not provide shell access.
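
If you found that you did need to start an agent and load a key, a
typical Linux sequence is something like the following (the key file
name is an assumption - use whichever key you registered with github):

    eval $(ssh-agent)       # start an agent for this shell
    ssh-add ~/.ssh/id_rsa   # load your github key (file name may differ)
    ssh git@github.com      # then repeat the test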

That greeting shows that you successfully have agent
authentication working on your local machine. We
can now use authentication forwarding (`-A`) to allow
the remote server to access your local authentication
agent:

    ssh -A -i <path-to-your-key-pair> ubuntu@<dns-name-of-your-server>

You should end up on the remote server again, but if you
(within the SSH session, on the remote server) do:

    ssh git@github.com

You should see that you are authenticated with github
on the other machine.

You can now issue a git clone command to get your
repository onto the remote machine, then commit/push/pull as normal.

If you make modifications on the remote server,
don't forget to "push" any changes back into github,
and then (if necessary) to "pull" the changes back down
to your normal working repository. For those who are not
used to working with git remotely, the `git commit -a`
command is useful, as it auto-stages all your changes
to tracked files. A typical round trip is sketched below.
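
This is only a sketch; the repository name and commit message are made
up, and it assumes your coursework already lives in a github repository
you can access over SSH:

    # on the AWS instance
    git clone git@github.com:<your-github-user>/<your-repo>.git
    cd <your-repo>
    # ... edit, build, test ...
    git commit -a -m "Tweaks made while testing on AWS"
    git push

    # back on your local working copy
    git pull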

Editing code on the remote instance
-----------------------------------

I would not recommend doing much editing on
the remote instances; they should be used more for
testing, tuning, and experimentation. But inevitably
you will need to change some source files, and so
need some way of editing them remotely (you
don't want to be pulling for each edit). There
is a command line editor called [nano](http://www.nano-editor.org/docs.php)
installed on pretty much all unix machines which
you can use to make small changes. For example,
to edit the file `wibble.cpp`, just do:

    nano wibble.cpp

You'll end up in an editor which behaves as you
would expect. Along the bottom are a number of
keyboard short-cuts, with `^` representing control.
The main ones you'll want are:

- `^X` (ctrl+x) : Quit the editor (it will prompt to save).
- `^O` (ctrl+o) : Write the file out (save) without exiting.
- `^G` (ctrl+g) : Built-in documentation.

Other text editors can be used or installed (emacs, vim, ...),
but I would suggest nano for most tasks you will
encounter here.