- Overview
- Background
- Before Starting
- Getting Started
- Software Requirements
- Architecture Design
- Data
- Funding
- License for Data
- Wrapping Up
This module will introduce you to (graphical) pangenomics and walk you through a pangenomics pipeline. Specifically, you will learn how to build a pangenome graph, map reads to the graph, call variants on the mapped reads, and visualize the graph. All analyses will be performed on the Google Cloud Platform. The estimated cost for the complete module is $?
A pangenome is a collection of genomes from the same species. Compared to a single reference genome, a pangenome is a less biased, more comprehensive representation of sequence preservation and variation within a population. While a pangenome may provide greater insight into questions about the genetic and genomic nature of a species, these data require bioinformatics tools that differ from those typically used with reference genomes. This module introduces you to pangenome graphs and the bioinformatics tools used to analyze them.
This module is designed to run on the Google Cloud Platform (GCP). Follow the instructions below to prepare to run the module on GCP.
Setting up GCP
See the Vertex AI Quickstart instructions for details on steps 1-5.
- Create a Google Cloud account
- Create a Google Cloud project
- Enable billing for your Google Cloud project
- Go to Vertex AI Workbench and create a new instance using "CREATE NEW" -> "ADVANCED OPTIONS" and use the following configurations:
  - Details:
    - Region: us-east4
    - Zone: us-east4-a
    - Workbench type:
      - Type: Instance
  - Environment:
    - JupyterLab Version: JupyterLab 4.x
  - Machine type:
    - Series: N2
    - Machine type: n2-standard-4
    - Idle shutdown:
      - Enable Idle Shutdown: Checked
      - Time of inactivity before shutdown (Minutes): 30
  - Disks: Use default settings
  - Networking:
    - Assign external IP address: Checked
    - Allow proxy access: Checked
  - IAM and security:
    - Security options:
      - Root access to the instance: Checked
      - Terminal access: Checked
  - System health: Use default settings
- Click "OPEN JUPYTERLAB" on your instance to open JupyterLab
Installing Software
To install the software for this module in JupyterLab, open a Terminal (File -> New Launcher -> Terminal) and run the following commands:
cd ~
git clone https://github.com/ncgr/NIGMS-Sandbox-Pangenomics-Module.git
bash -i ./NIGMS-Sandbox-Pangenomics-Module/scripts/0-setup.sh
After the last command completes, close the terminal and restart the instance in the Vertex AI Workbench.
There should now be a new kernel in the JupyterLab launcher called "nigms-pangenomics". This is the kernel you should use with every notebook in the module. The launcher should also contain two new sections: "Submodule Notebooks" and "Visualization Software". Submodule Notebooks contains an ordered list of the notebooks in this module, one for each submodule. Clicking on a submodule will open the corresponding notebook. Visualization Software contains a list of the visualization software used in this module. Clicking on a program in this list will open the program in a new window in your web browser.
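If the new kernel does not show up in the launcher, a quick check from a JupyterLab terminal can help narrow down the problem. The sketch below is not part of the module's scripts. The helper name `kernel_listed` is hypothetical, and it assumes the kernelspec is registered under the same name as the launcher label, "nigms-pangenomics", which the setup script may or may not do.

```shell
# Succeeds if the given kernel name appears in the first column of
# `jupyter kernelspec list` output supplied on stdin.
# kernel_listed is a hypothetical helper, not part of the module.
kernel_listed() {
    awk -v k="$1" '$1 == k { found = 1 } END { exit !found }'
}

# ASSUMPTION: the kernelspec name matches the launcher label "nigms-pangenomics".
if command -v jupyter >/dev/null 2>&1; then
    if jupyter kernelspec list 2>/dev/null | kernel_listed "nigms-pangenomics"; then
        echo "Kernel nigms-pangenomics is registered."
    else
        echo "Kernel nigms-pangenomics not found; review the output of 0-setup.sh."
    fi
else
    echo "jupyter is not on PATH in this shell."
fi
```

If the kernel is reported as missing even though the launcher shows it, the kernelspec is probably registered under a different name; running `jupyter kernelspec list` on its own shows the registered names.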
After following the Before Starting instructions, the JupyterLab launcher (File -> New Launcher) will contain a "Submodule Notebooks" section. This section contains an ordered list of the notebooks in this module, one for each submodule. Clicking on a submodule in this section will open the corresponding notebook. To begin, click on "Environment Setup".
Alternatively, you can use the JupyterLab file browser. Here is the location and file structure of the module notebooks:
NIGMS-Sandbox-Pangenomics-Module/
└── module_notebooks/
    ├── 00-environment-setup.ipynb
    ├── 01-intro-to-pangenomics.ipynb
    ├── 02-building-graphs-with-pggb.ipynb
    ├── 03-indexing-graphs-with-vg.ipynb
    ├── 04-read-mapping-with-vg.ipynb
    ├── 05-variant-calling-with-vg.ipynb
    ├── 06-searching-graphs-with-blast.ipynb
    └── 07-visualization.ipynb
module_notebooks/ contains the Jupyter notebooks, one for each submodule. To open a notebook, double-click on it. To begin this module, open the 00-environment-setup.ipynb notebook.
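Before opening the first notebook, it can be useful to confirm that the clone produced all eight notebooks listed above. The following is a minimal sketch, not part of the module itself. The function name `missing_notebooks` is hypothetical, and the path assumes the repository was cloned into your home directory as in the Installing Software commands.

```shell
# Print the name of each expected notebook that is absent from the directory
# given as the first argument. Prints nothing when all are present.
# missing_notebooks is a hypothetical helper, not part of the module.
missing_notebooks() {
    dir="$1"
    for nb in \
        00-environment-setup.ipynb \
        01-intro-to-pangenomics.ipynb \
        02-building-graphs-with-pggb.ipynb \
        03-indexing-graphs-with-vg.ipynb \
        04-read-mapping-with-vg.ipynb \
        05-variant-calling-with-vg.ipynb \
        06-searching-graphs-with-blast.ipynb \
        07-visualization.ipynb
    do
        [ -f "$dir/$nb" ] || echo "$nb"
    done
}

# ASSUMPTION: the repository lives in the home directory (cloned after `cd ~`).
missing_notebooks "$HOME/NIGMS-Sandbox-Pangenomics-Module/module_notebooks"
```

No output means every notebook was found. Any file names printed are missing, which usually points to an incomplete clone or a different clone location.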
The following software is required for this module:
All of these programs can be installed in JupyterLab running on the GCP Vertex AI Workbench by following the Installing Software instructions in the Before Starting section.