A Hands on Introduction to AWS for Cloud Computing January 26, 2022
When: January 26, 2022, from 10 am PST - 12 pm PST
Instructors: Dr. Rayna Harris
Helpers: Dr. Amanda Charbonneau, Jessica Lumian, and Jeremy Walter
Your instructors are part of the training and engagement team for the NIH Common Fund Data Ecosystem, a project supported by the NIH to increase data reuse and cloud computing for biomedical research.
This 2-hour hands-on tutorial will introduce you to creating a computer "in the cloud" and logging into it, via Amazon Web Services. We will launch a small general-purpose Linux instance, connect to it, and run a small job while discussing the concepts and technologies involved.
- Terminology and Sign-On
- Launching an EC2 Instance
- Connecting to AWS instances
- Running programs at the command line
- Copying data from AWS instance onto your local computer
- Windows Users
- MacOS
- Shutting down instances
- Summary
📝 Please fill out our pre-workshop survey if you have not already done so!
✔️ Windows users should install MobaXterm. Read our quick installation guide.
If you have questions,
- Type them in the group chat
- Direct message the moderator
- Unmute and ask them out loud
We're going to use the raised hand ✋ reaction in zoom to make sure people are on board during the hands-on activities.
Cloud computing is the on-demand use of data storage and compute power without direct active management by the user. Amazon Web Services (AWS) is one of the most broadly adopted cloud platforms.
Some advantages of using AWS include:
- Easy sign-on
- Simple billing
- Stable services
- Customizable images
- Customer support
- Online resources
Amazon's Elastic Compute Cloud (EC2) is a web service that provides secure, resizable compute capacity in the cloud. Amazon's Simple Storage Service (S3) is widely used for storing and sharing data.
An instance is a virtual machine that runs in the cloud. An image (or AMI for Amazon Machine Image) is a template that contains the software configuration (including operating system and applications) required to launch your instance. You can select an image provided by the AWS Marketplace, the AWS community, or you can select one of your own images. When you launch an instance, you specify the type of image to use.
Today, everything you do will be paid for by us. Your free login credentials will work for the next 24 hours. In the future, if you create an AWS account, you will have to add a credit card for billing. We'd be happy to answer questions about how to pay for AWS.
Log in to your account by going to this web address: https://cfde-training-workshop.signin.aws.amazon.com/console.
Find your first name in the table below and log in with that as your IAM user name and the password provided by the instructors.
✋ Raise your hand in Zoom when you've successfully logged in with the workshop user credentials.
You can launch an instance using the AWS launch instance wizard. The launch instance wizard specifies all the launch parameters required for launching an instance. Where the launch instance wizard provides a default value, you can accept the default or specify your own value. At the very least, you need to select an AMI and a key pair to launch an instance. Let's walk through the following steps.
- Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/
- AWS has servers all over the world. In the top right corner, click the drop-down menu to select a region. For this workshop, choose US West (N. California) us-west-1. In the future, you should pick a region near you or one that contains your data.
- AWS is beta testing a new version of the launch instance wizard that goes through all the steps on one page instead of many. It is awesome. Click the button at the top to get started. If you accidentally close the banner with the beta button, refresh the page to bring it up again.
- First, give your instance a name (such as your first name) so that you can distinguish your instances from your classmates'. This is optional but very useful for keeping track of multiple instances on the same account.
- The next step is to pick an image. Our preferred image is not listed in the Quick Start list, so we must find it in the Marketplace. Type Ubuntu 20.04 LTS - Focal in the search bar. Then click AWS Marketplace AMIs. Once you see Ubuntu 20.04 LTS - Focal, select it.
- Next, we must specify how much CPU and memory (RAM) we need by choosing an instance type. The t2.micro instance is "Free tier eligible" and provides 1 vCPU and 1 GiB of memory. This is perfect for our class.
- The final step is to create a new key pair. This will be used in the next section to connect to your instance via `ssh`. Give your key pair a name (without spaces). Use the default settings of RSA type and .pem format. Save this file locally (e.g. in your downloads or your desktop).
- For this workshop, we will choose the default network, security, and storage settings, so there is nothing else to change.
- Once your instance launches, click the button at the bottom of the page.
✋ Raise your hand in Zoom when you've successfully launched an instance.
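For reference, the same launch choices can also be written as a single AWS CLI call. The workshop itself only uses the web console, so treat this as a rough sketch: it assumes the AWS CLI is installed and configured, and `AMI_ID`, `KEY_NAME`, and `INSTANCE_NAME` are placeholder values that you would replace with your own. The script only prints the command so you can review it before running anything.

```shell
# Sketch: the console choices above expressed as one AWS CLI command.
# AMI_ID, KEY_NAME, and INSTANCE_NAME are placeholders (assumptions),
# not values provided by the workshop.
REGION="us-west-1"          # US West (N. California), as chosen above
INSTANCE_TYPE="t2.micro"    # the free-tier-eligible type used above
AMI_ID="ami-EXAMPLE"        # hypothetical: the Ubuntu 20.04 AMI ID for your region
KEY_NAME="my-key-pair"      # hypothetical: the name of an existing key pair
INSTANCE_NAME="my-instance" # hypothetical: the name tag for the instance

launch_cmd="aws ec2 run-instances --region $REGION --image-id $AMI_ID --instance-type $INSTANCE_TYPE --key-name $KEY_NAME --count 1 --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=$INSTANCE_NAME}]'"

# Print the command for review instead of running it.
echo "$launch_cmd"
```

Printing first is a deliberate choice here: launching instances costs money on a personal account, so it is worth checking the region and instance type before pasting the command into a terminal.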
Congratulations! You have successfully launched an instance. The next step is to connect to your instance.
There are three ways to connect to an AWS instance:
- with a web browser
- using `ssh` from the Terminal
- using an ssh client such as MobaXterm
Let's connect to our instances using a web browser.
- Find your instance in the list of running instances.
- Click the empty check box next to your name.
- Then click "Connect" in the top center of your browser.
- This will open a window that provides details about your instances. Click the button at the bottom of your screen.
After you click connect, a new tab will open in your browser with a Terminal window that looks something like this.
✋ Raise your hand in Zoom when you've successfully connected to your instance.
If at any time your instance stops responding, hit the "refresh" button and functionality should be restored, right where you left off.
Now that you have successfully launched a terminal in your browser, you can run programs at the command line. If you attended last week's Intro to UNIX for Cloud Computing workshop, we used a variety of commands to navigate the file system and work with files. Let's revisit a few.
First, print your working directory with `pwd`.
pwd
You should see:
/home/ubuntu
Now if you type `ls`, it may look like you do not have any files, but remember some files are hidden. Let's use the `-a` option to list all files, `-l` for long listing format, and the `-F` option to append a classifier.
ls -alF
From this, we can see a few directories. The `.ssh` directory contains the public half of the ssh key pair you created a moment ago.
drwxr-xr-x 4 ubuntu ubuntu 4096 Jan 25 18:59 ./
drwxr-xr-x 3 root root 4096 Jan 25 18:59 ../
-rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc
drwx------ 2 ubuntu ubuntu 4096 Jan 25 18:59 .cache/
-rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile
drwx------ 2 ubuntu ubuntu 4096 Jan 25 18:59 .ssh/
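If you want to convince yourself that plain `ls` really does skip hidden files, the following sketch may help. It works in a throwaway directory made with `mktemp`, so nothing in your home directory is touched; the file names are made up for illustration.

```shell
# Create a scratch directory with one visible and one hidden file.
demo_dir=$(mktemp -d)
touch "$demo_dir/visible.txt" "$demo_dir/.hidden.txt"

# Plain ls skips names that start with a dot.
plain_count=$(ls "$demo_dir" | wc -l | tr -d ' ')
# -A lists dot files too, but leaves out the . and .. entries that -a adds.
all_count=$(ls -A "$demo_dir" | wc -l | tr -d ' ')

echo "ls shows $plain_count file(s); ls -A shows $all_count file(s)"
```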
Your instance comes pre-configured with a number of computer programs. These are stored in your root directory (`/`) in the `bin` directory. You can list the programs installed on your instance by providing `ls` with the full path `/bin`.
ls /bin
This will print all the installed programs to your screen. Here are a few of the programs. You may recognize a few of the programs we used last week, such as `gunzip`, `gzip`, and `head`.
...
grub-render-label rvim zcat
grub-script-check savelog zcmp
grub-syslinux2cfg sbattach zdiff
gsettings sbkeysync zdump
gtbl sbsiglist zegrep
gunzip sbsign zfgrep
gzexe sbvarsign zforce
gzip sbverify zgrep
h2ph scp zipdetails
h2xs screen zless
hd screendump zmore
head script znew
helpztags scriptreplay
hexdump scsi_logging_level
...
To practice working with some of these command-line programs, we need some files to work on. Let's use the `curl` command to download the same files we used in last week's workshop, which are stored in a .zip file in an Amazon S3 bucket. The `-O` option says to use the same filename that is specified in the web address.
curl -O https://s3.us-west-1.amazonaws.com/dib-training.ucdavis.edu/shell-data2.zip
You should see the following message.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 38.4M 100 38.4M 0 0 67.8M 0 --:--:-- --:--:-- --:--:-- 67.8M
You can type `ls` to confirm the download was completed successfully.
ls
You should now have one file in your working directory.
shell-data2.zip
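To see which filename `curl -O` will choose, you can take the part of the web address after the final slash. The sketch below does that with `basename` and then checks whether a non-empty file of that name is in the current directory. It is an illustration only, and it assumes a simple URL like ours with no query string after the filename.

```shell
url="https://s3.us-west-1.amazonaws.com/dib-training.ucdavis.edu/shell-data2.zip"

# basename keeps only the text after the final slash,
# which is the name curl -O saves the download under.
filename=$(basename "$url")
echo "curl -O will save to: $filename"

# -s is true only when the file exists and is not empty.
if [ -s "$filename" ]; then
    echo "$filename exists and is not empty"
else
    echo "$filename not found here yet; run: curl -O $url"
fi
```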
Now, we need to uncompress this file with the command `unzip`.
unzip shell-data2.zip
However, instead of uncompressing the file, we get the following error and help messages.
Command 'unzip' not found, but can be installed with:
sudo apt install unzip
Unfortunately, this tells us that the command we want to use is not installed. Fortunately, it tells us how to install it. Let's try it.
sudo apt install unzip
Once the program finishes installing, we can now unzip our files. Remember, you can use the up arrow to scroll through your history to a previous command, then select enter to run it.
unzip shell-data2.zip
After the files are inflated, you can remove the .zip file if you wish with the `rm` command.
rm shell-data2.zip
Now, when we type `ls` we see 6 directories and a README.md file.
ls
MiSeq README.md binder books images seattle southpark
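Instead of waiting for a "not found" error as we did with `unzip`, you can ask the shell ahead of time whether a program is available. The sketch below uses the shell built-in `command -v` for that check. `have_cmd` is a helper name made up for this example, and the script only prints an install suggestion rather than installing anything.

```shell
# Return success if the named program can be found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

for prog in ls unzip; do
    if have_cmd "$prog"; then
        echo "$prog is already installed"
    else
        # Only print the suggestion; run it yourself if you agree.
        echo "$prog is missing; install it with: sudo apt install $prog"
    fi
done
```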
Now that we have these files on a cloud computer, we can run bioinformatic programs that are typically too computationally expensive to run on our local computer.
FastQC is a bioinformatic program that performs quality control checks on raw sequence data coming from high throughput sequencing pipelines. It runs a set of analyses to help you identify problems in the quality of your samples or sequence. The output of fastqc is an HTML document.
Here are some links if you would like to learn more about FastQC on your own time.
- Analysis Modules Documentation
- What a good data file looks like
- What bad data looks like
- Video: FastQC tool for read data quality evaluation
- Video: Using FastQC to check the quality of high throughput sequence
To install FastQC, we first need to update the `apt` package index. Then we can use `apt` to install the program. Add the `-y` option to say "yes" to all the installing prompts automatically.
sudo apt update
sudo apt install fastqc -y
To double check it was successful, type `fastqc --version`. If it returns 0.11.9, that means installation was successful. You can also type `fastqc --help` to view the manual.
A fastqc command looks like this: `fastqc -o <output directory> <file>`
The output directory must exist! Before we begin, let's navigate to the `MiSeq` directory with `cd` and create a `results` directory with a subdirectory called `fastqc`. Remember, the `-p` argument will tell `mkdir` to create any missing parent directories.
cd MiSeq
mkdir -p results/fastqc
ls
We now see the following files and directories.
F3D0_S188_L001_R1_001.fastq F3D148_S214_L001_R1_001.fastq F3D7_S195_L001_R1_001.fastq
F3D0_S188_L001_R2_001.fastq F3D148_S214_L001_R2_001.fastq F3D7_S195_L001_R2_001.fastq
F3D141_S207_L001_R1_001.fastq F3D149_S215_L001_R1_001.fastq F3D8_S196_L001_R1_001.fastq
F3D141_S207_L001_R2_001.fastq F3D149_S215_L001_R2_001.fastq F3D8_S196_L001_R2_001.fastq
F3D142_S208_L001_R1_001.fastq F3D150_S216_L001_R1_001.fastq F3D9_S197_L001_R1_001.fastq
F3D142_S208_L001_R2_001.fastq F3D150_S216_L001_R2_001.fastq F3D9_S197_L001_R2_001.fastq
F3D143_S209_L001_R1_001.fastq F3D1_S189_L001_R1_001.fastq HMP_MOCK.v35.fasta
F3D143_S209_L001_R2_001.fastq F3D1_S189_L001_R2_001.fastq Mock_S280_L001_R1_001.fastq
F3D144_S210_L001_R1_001.fastq F3D2_S190_L001_R1_001.fastq Mock_S280_L001_R2_001.fastq
F3D144_S210_L001_R2_001.fastq F3D2_S190_L001_R2_001.fastq README.md
F3D145_S211_L001_R1_001.fastq F3D3_S191_L001_R1_001.fastq mouse.dpw.metadata
F3D145_S211_L001_R2_001.fastq F3D3_S191_L001_R2_001.fastq mouse.time.design
F3D146_S212_L001_R1_001.fastq F3D5_S193_L001_R1_001.fastq results
F3D146_S212_L001_R2_001.fastq F3D5_S193_L001_R2_001.fastq stability.batch
F3D147_S213_L001_R1_001.fastq F3D6_S194_L001_R1_001.fastq stability.files
F3D147_S213_L001_R2_001.fastq F3D6_S194_L001_R2_001.fastq
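The effect of `-p` is easiest to see by trying `mkdir` with and without it. This sketch does so inside a scratch directory created with `mktemp`, so it will not interfere with the `MiSeq` directory you just set up.

```shell
scratch=$(mktemp -d)

# Without -p, mkdir refuses because the parent "results" does not exist yet.
if mkdir "$scratch/results/fastqc" 2>/dev/null; then
    echo "unexpected: mkdir worked without -p"
else
    echo "mkdir without -p failed, as expected"
fi

# With -p, the missing parent is created first, then the subdirectory.
mkdir -p "$scratch/results/fastqc"
# Running it again is harmless: -p does not complain about existing directories.
mkdir -p "$scratch/results/fastqc"

ls -R "$scratch"
```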
We can run fastqc on one file at a time or on all of them at once. Both of the following commands work.
fastqc -o results/fastqc F3D0_S188_L001_R1_001.fastq
fastqc -o results/fastqc *fastq
The standard output looks like this:
Started analysis of F3D0_S188_L001_R1_001.fastq
Approx 10% complete for F3D0_S188_L001_R1_001.fastq
Approx 25% complete for F3D0_S188_L001_R1_001.fastq
Approx 35% complete for F3D0_S188_L001_R1_001.fastq
Approx 50% complete for F3D0_S188_L001_R1_001.fastq
Approx 60% complete for F3D0_S188_L001_R1_001.fastq
Approx 75% complete for F3D0_S188_L001_R1_001.fastq
Approx 85% complete for F3D0_S188_L001_R1_001.fastq
Now, we can navigate to our results directory to view the results.
cd results/fastqc
ls
For every input, there are two outputs: an html file and a zipped folder. The html files are of interest.
F3D0_S188_L001_R1_001_fastqc.html F3D150_S216_L001_R1_001_fastqc.html
F3D0_S188_L001_R1_001_fastqc.zip F3D150_S216_L001_R1_001_fastqc.zip
F3D0_S188_L001_R2_001_fastqc.html F3D150_S216_L001_R2_001_fastqc.html
F3D0_S188_L001_R2_001_fastqc.zip F3D150_S216_L001_R2_001_fastqc.zip
F3D141_S207_L001_R1_001_fastqc.html F3D1_S189_L001_R1_001_fastqc.html
F3D141_S207_L001_R1_001_fastqc.zip F3D1_S189_L001_R1_001_fastqc.zip
F3D141_S207_L001_R2_001_fastqc.html F3D1_S189_L001_R2_001_fastqc.html
F3D141_S207_L001_R2_001_fastqc.zip F3D1_S189_L001_R2_001_fastqc.zip
F3D142_S208_L001_R1_001_fastqc.html F3D2_S190_L001_R1_001_fastqc.html
F3D142_S208_L001_R1_001_fastqc.zip F3D2_S190_L001_R1_001_fastqc.zip
F3D142_S208_L001_R2_001_fastqc.html F3D2_S190_L001_R2_001_fastqc.html
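Because each input produces exactly two outputs with predictable names, you can work out what a complete results directory should contain. The sketch below illustrates the naming pattern seen in the listing above, using two empty stand-in files in a scratch directory rather than the real data. It does not run fastqc.

```shell
# Scratch directory with two empty stand-in files named like the workshop data.
work=$(mktemp -d)
touch "$work/F3D0_S188_L001_R1_001.fastq" "$work/F3D0_S188_L001_R2_001.fastq"

expected=0
for fq in "$work"/*.fastq; do
    # Strip the directory and the .fastq extension from the input name.
    name=$(basename "$fq" .fastq)
    echo "${name}_fastqc.html"
    echo "${name}_fastqc.zip"
    expected=$((expected + 2))
done
echo "expected output files: $expected"
```

On the instance, you could compare this kind of count against `ls results/fastqc | wc -l` to spot inputs that did not finish.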
Click the raised hand ✋ if you have html files in a results directory.
Congratulations, you have successfully launched and connected to an instance, navigated the file system, downloaded data, installed programs, and executed programs at the command line. Here's an overview of some of the commands we used.
:::info
Command | Description |
---|---|
`pwd` | print name of current/working directory |
`ls [options] [path]` | list directory contents |
`cd [path]` | change the working directory |
`curl -O [URL]` | download a file from a URL and save it using the original file name |
`sudo apt install [program]` | install a program |
`unzip [filename]` | uncompress filename |
`rm [path]` | removes (deletes) a file |
`mkdir -p [path/to/files]` | creates a hierarchy of directories |
:::
✔️ Let's take a 3 min break before moving on to the next section.
After processing your data in the cloud, you most likely need to copy some of your files to your local computer for viewing and sharing. In this section, we will use our ssh keys and public Domain Name System (DNS) to securely copy files from the cloud to our local computer using either a secure shell (`ssh`) or secure copy (`scp`).
If you have a Windows machine, you will need to download a Terminal program. We recommend MobaXterm which is both a Terminal and an SSH client.
Read the following steps and/or watch this short video tutorial.
MobaXterm installation
- Go to the MobaXterm website to download
- Click on "GET MOBAXTERM NOW!"
- The Home Edition works great and is free. Click "Download now".
- Click on "MobaXterm Home Edition v20.6 (Portable edition)" and save it in your Downloads folder.
- Go to your Downloads folder, click on the zipped folder, click "Extract all", click "Extract"
- The MobaXterm application is now in the unzipped folder
- Click on the MobaXterm application to open it!
Now that you have MobaXterm installed, you need to find the name and the address of your instance. To do so, let's reconnect to our instances.
- In a new browser tab or window, navigate to the instances page.
- Check the empty box next to your instance.
- Click the "Connect" button.
- Click the SSH client tab.
- Find the "Example:" ssh command. Copy the last piece of information, which contains the public DNS for your instance and the computer name. It will look something like "ec2-54-193-121-227.us-west-1.compute.amazonaws.com"
- In MobaXterm, click on "Session"
- Click on "SSH"
- Enter the Public DNS as the "Remote host"
- Check the box next to "Specify username" and enter "ubuntu" as the username
- Click the "Advanced SSH settings" tab
- Check box by "Use private key"
- Use the document icon to navigate to where you saved the private key (e.g., "amazon.pem") from AWS on your computer. It is likely in your Desktop or Downloads folder
- Click "OK"
- A terminal session should open up with a left-side panel showing the file system of our AWS instance!
- Click on one of the FastQC html files to view it in a browser.
Click the raised hand ✋ in Zoom once you have opened an html file.
Mac users do not need to install any additional programs to transfer files. You do however need to locate the ssh key file you saved at the beginning of the workshop.
- Open a Terminal window
- Navigate to your private key file and change the permissions using `chmod 400` to ensure your key is not publicly viewable. Note: your .pem file may be in a different directory and have a different name. Modify the following commands accordingly.
cd ~/Desktop/
chmod 400 aws-jan-2022.pem
- In a new browser tab or window, navigate to the instances page
- Check the empty box next to your instance.
- Click the "Connect" button.
- Click the SSH client tab.
- Find the "Example:" ssh command. Copy the last piece of information, which contains the user name and the public DNS for your instance. It will look something like "ubuntu@ec2-54-193-121-227.us-west-1.compute.amazonaws.com"
- Use the `scp` command in your local terminal to copy the .html files. The `-i` option is used to specify the ssh key file. As with the copy (`cp`) command, you must specify both the location of the source file and the location of the copied file. When specifying the source file, you must first include the name of the user (ubuntu@) followed by the Public DNS (ec2-.....amazonaws.com). To specify the path to the file, add a `:` after the DNS and then paste the path to the file.
Your command will look something like this. Remember to use your .pem file and your DNS. You can specify the current directory on your local computer with .
scp -i keys.pem ubuntu@ec2-54-193-121-227.us-west-1.compute.amazonaws.com:~/MiSeq/results/fastqc/F3D141_S207_L001_R1_001_fastqc.html .
If this is your first time connecting to an instance, you may be prompted with the following question: "Are you sure you want to continue connecting (yes/no/[fingerprint])?". Type "yes".
If you want to copy all the html files, you will need to put the path to the files in single quotes so that your local shell does not expand the wildcard.
scp -i keys.pem 'ubuntu@ec2-54-193-121-227.us-west-1.compute.amazonaws.com:~/MiSeq/results/fastqc/*fastqc.html' .
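Typing a long `scp` command by hand is error-prone, so it can help to assemble it from a few variables and print it for review first. The sketch below does this for the wildcard command. `KEY` and `HOST` are placeholders (the host is the example address shown earlier) that you would replace with your own key file and public DNS. Nothing is copied; the script only prints the command.

```shell
# Placeholders (assumptions): substitute your own key file and public DNS.
KEY="keys.pem"
HOST="ec2-54-193-121-227.us-west-1.compute.amazonaws.com"
REMOTE_DIR="~/MiSeq/results/fastqc"

# Inside quotes the * stays a literal character, so the pattern is passed
# along unchanged and matched against files on the instance.
remote_pattern="ubuntu@${HOST}:${REMOTE_DIR}/*fastqc.html"

# Print the command for review; the single quotes are part of what you would type.
echo "scp -i $KEY '$remote_pattern' ."
```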
Click the raised hand ✋ in Zoom once you have opened an html file.
Congratulations! You have now successfully downloaded files from the cloud to your local computer.
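If `ssh` or `scp` complains that your key file is too open, the `chmod 400` step is the usual fix. The sketch below shows what that command changes, using a temporary stand-in file instead of a real key. The first ten characters of `ls -l` output are the permission string.

```shell
# Stand-in for a downloaded key; the real file would be your own .pem.
key=$(mktemp)
chmod 644 "$key"    # readable by everyone, as a downloaded file often is
before=$(ls -l "$key" | cut -c1-10)

chmod 400 "$key"    # owner may read; nobody may write or execute
after=$(ls -l "$key" | cut -c1-10)

echo "before: $before"
echo "after:  $after"
rm -f "$key"
```

After `chmod 400`, only the file's owner can read the key, which is what ssh expects of a private key file.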
:::info
Command | Description |
---|---|
`ssh -i keys.pem user@publicDNS` | connect to secure shell with ssh keys |
`scp -i keys.pem user@publicDNS:~/path/to/directory/file .` | download files with ssh keys |
`scp -i keys.pem file user@publicDNS:~/path/to/directory` | upload files with ssh keys |
:::
The AWS Free Tier of services only remains free if you stay within the usage limits. If your instance is running in the cloud, you may be charged even if you aren't using it for compute power or storage. It is therefore good practice to shut down your instances when not in use.
There are three options for shutting down instances.
- Stopping:
  - saves data to EBS root volume
  - only EBS data storage charges apply
  - no data transfer charges or instance usage charges
  - RAM contents not stored
- Hibernation:
  - charged for storage of any EBS volumes
  - stores the RAM contents
  - it's like closing the lid of your laptop
- Termination:
  - complete shutdown
  - EBS volume is detached
  - data stored in EBS root volume is lost forever
  - instance cannot be relaunched
These accounts will remain available for 24 hours before your instructor deletes them. If you wish to return to your instance within the next 24 hours, stopping it is a good idea. If you are done practicing, terminating the instance is the best idea.
To shut down an instance:
- Navigate to the instances page
- Check the empty box next to your instance
- Click the "Instance state" button
- Select "Stop instance" or "Terminate instance" as appropriate
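The same two actions are also available from the AWS CLI, which can be handy once you manage instances outside a workshop. This is a sketch under assumptions: it presumes the AWS CLI is installed and configured, `INSTANCE_ID` is a placeholder for the ID shown in your instances table, and `shutdown_cmd` is a helper name made up for this example. It prints the command rather than running it, because termination cannot be undone.

```shell
# INSTANCE_ID is a placeholder (assumption); use the ID from your instances table.
INSTANCE_ID="i-EXAMPLE"
REGION="us-west-1"

# Build the command for the chosen action without running it.
shutdown_cmd() {
    case "$1" in
        stop)
            echo "aws ec2 stop-instances --region $REGION --instance-ids $INSTANCE_ID" ;;
        terminate)
            echo "aws ec2 terminate-instances --region $REGION --instance-ids $INSTANCE_ID" ;;
        *)
            echo "usage: shutdown_cmd stop|terminate" >&2
            return 1 ;;
    esac
}

shutdown_cmd stop
shutdown_cmd terminate
```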
Launch a t2.nano, Ubuntu 20.04 LTS - Focal instance in the US East (Ohio) region. Change the root storage volume to 16 GiB and add an additional EBS volume (8 GiB).
Hint
- Go to Amazon Marketplace and search for the "Ubuntu 20.04 LTS - Focal". Should be the first result.
- Look in tab 4 called "Add Storage" to add additional storage volumes.
In today's workshop, we covered the following topics:
- AWS terminology and login
- How to launch an instance
- How to connect to the instance
- How to install and run software programs on the instance
- How to terminate your instance
We hope this workshop was helpful. Please complete the post-workshop survey to let us know if the workshop was useful or if you have any suggestions for improvement.
We will send a follow-up email with links to the resources used today.
If you have any questions, comments, or concerns, feel free to contact us at [email protected]
Check our Events page for information on upcoming workshops!