Deep mutational scanning of the CGGnaive scFv

Analysis of deep mutational scanning of barcoded codon variants of the CGGnaive antibody.

Study and analysis by Tyler Starr, Jesse Bloom, and co-authors.

Code refactored to integrate improved Kd modeling by Jared Galloway and Will DeWitt.

Summary of workflow and results

For a summary of the workflow and links to key results files, click here. Reading this summary is the best way to understand the analysis.

Running the analysis on a single machine (recommended)

Note that the sequencing data are not included in this repository, so the CCS-processing and variant-counting steps cannot be run from the raw data. The output files for these steps are tracked in the repository, so you can run the pipeline to completion downstream of them.

We let Snakemake's conda integration handle the environment setup, so all you need is an environment with snakemake and git-lfs installed. We recommend following these instructions first, then running the following commands:

conda activate snakemake  # an environment with snakemake installed
conda install git-lfs     # add git-lfs to that environment
git clone https://github.com/jbloomlab/Ab-CGGnaive_DMS.git
cd Ab-CGGnaive_DMS
git lfs install           # enable git-lfs for this clone

Then you can run the pipeline with the following command:

snakemake -j8 --use-conda
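
If you want to preview the jobs before launching the full run, a Snakemake dry run is a quick sanity check (the -n flag is standard Snakemake; this is a suggested extra step, not part of the original workflow):

snakemake -n --use-conda  # list the jobs that would run, without executing them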

Configure .git to not track Jupyter notebook metadata

To simplify git tracking of Jupyter notebooks, we have added the filter described here, which strips notebook metadata, to .gitattributes and .gitconfig. The first time you check out this repo, run the following command to enable this configuration (see here):

   git config --local include.path ../.gitconfig

Then don't worry about it anymore.
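
If you want to confirm that git picked up the configuration, this quick check (plain git, nothing repo-specific) should print the include path:

git config --local --get include.path  # should print ../.gitconfig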

Configuring the analysis

The configuration for the analysis is specified in config.yaml. This file defines key variables for the analysis and should be relatively self-explanatory. Modify the analysis by changing this configuration file; do not hard-code crucial experiment-specific variables within the Jupyter notebooks or Snakefile.

The input files pointed to by config.yaml are in the ./data/ subdirectory. See the ./data/README.md file for details.
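
If you want to experiment with alternative settings without editing the tracked config.yaml, Snakemake can also be pointed at a different configuration file; the filename below is only an illustration, and it must define the same keys the Snakefile expects:

snakemake -j8 --use-conda --configfile my_config.yaml  # my_config.yaml is a hypothetical copy of config.yaml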

Cluster configuration (untested since #13)

To run using the cluster configuration for the Fred Hutch server, simply run the bash script run_Hutch_cluster.bash, which executes the Snakefile in a way that takes advantage of the Hutch server resources. This bash script also automates the environment-building steps above, so all you have to do is run this script. You likely want to submit run_Hutch_cluster.bash itself to the cluster (since it takes a while to run) with:

sbatch -t 7-0 run_Hutch_cluster.bash

There is a cluster configuration file, cluster.yaml, that configures the Snakefile for the Fred Hutch cluster, as recommended by the Snakemake documentation. The run_Hutch_cluster.bash script uses this configuration to run the Snakefile. If you are using a cluster other than the Fred Hutch one, you may need to modify the cluster configuration file.
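
For orientation, a Slurm-based invocation that uses a cluster.yaml in this way typically looks something like the sketch below; the partition, time, and CPU fields are placeholders to adapt for your own cluster, not the exact contents of run_Hutch_cluster.bash:

# illustrative Snakemake + Slurm invocation using cluster.yaml (placeholders, not the repo's actual script)
snakemake -j 99 --use-conda \
    --cluster-config cluster.yaml \
    --cluster "sbatch -p {cluster.partition} -t {cluster.time} -c {cluster.cpus}"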

Notebooks that perform the analysis

The Jupyter notebooks and R Markdown scripts that perform most of the analysis are in this top-level directory with the extension *.ipynb or *.Rmd. These notebooks read the key configuration values from config.yaml.

There is also a ./scripts/ subdirectory with related scripts.

The notebooks need to be run in the order described in the workflow and results summary. This will occur automatically if you run them via Snakefile as described above.
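
If you only need to regenerate a particular output, you can also ask Snakemake for that file directly and it will run just the required upstream steps; the path below is one of the tracked results files listed later in this README, assuming it is a declared rule output:

snakemake -j8 --use-conda results/final_variant_scores/final_variant_scores.csv  # rebuild one target and its dependencies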

Results

Results are placed in the ./results/ subdirectory. Many of the files created in this subdirectory are not tracked in the git repo because they are very large. However, key results files are tracked, as is a summary that shows the code and results. Click here to see that summary.

The large results files are tracked via git-lfs. This requires git-lfs to be installed, which it is in the conda environment specified by environment.yml. The following commands were then run:

git lfs install

You may need to run this yourself if you want to fetch these files and have not yet installed git-lfs for your user account. The large results files were then added for tracking with:

git lfs track results/variants/codon_variant_table.csv
git lfs track results/counts/variant_counts.csv
git lfs track results/binding_Kd/bc_binding.csv
git lfs track results/final_variant_scores/final_variant_scores.csv
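
After cloning, if the large results files show up only as small git-lfs pointer files, fetching their actual contents is a standard git-lfs step:

git lfs pull  # download the contents of all git-lfs-tracked files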