This pipeline analyzes CRISPR data from next-generation sequencing (NGS) experiments. It can be run on either a high-performance computing cluster or a personal computer. The pipeline is written in Python and uses Snakemake to manage the workflow. [In progress]
- PI: Dr. Arnaud Augert
- PhD Student: Danny Gallant
- Postgraduate: Juan M. Martinez-Villalobos
- Snakemake 5.10.0 or higher
- Python 3.6 or higher
- R 3.6 or higher
- Clone the repository
git clone https://github.com/martinezvbs/CRISPR.git
- Construct a sample sheet with each line corresponding to a separate barcode.
- Run the pipeline
python3 count_barcodes.py -i CRISPR_library.csv -f ORF_Library_R1_001.fastq -o File.csv -no-g
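To give a rough idea of what the counting step does, here is a minimal sketch of perfect-match barcode counting over a FASTQ file. It is not the code in `count_barcodes.py`: the function names `read_sequences` and `count_perfect_matches` are hypothetical, and the sketch assumes that all barcodes have the same length and that a read is assigned to the first library barcode found anywhere in its sequence.

```python
import io


def read_sequences(handle):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip().upper()


def count_perfect_matches(handle, barcodes):
    """Count reads containing an exact copy of a library barcode.

    `barcodes` is a non-empty set of equal-length sequences (assumption).
    Returns (counts per barcode, number of reads without a perfect match).
    """
    k = len(next(iter(barcodes)))
    counts = dict.fromkeys(barcodes, 0)  # keep undetected barcodes at 0
    unmatched = 0
    for seq in read_sequences(handle):
        # Slide a window of barcode length over the read (assumed strategy).
        hit = next(
            (seq[i:i + k] for i in range(len(seq) - k + 1)
             if seq[i:i + k] in barcodes),
            None,
        )
        if hit is None:
            unmatched += 1
        else:
            counts[hit] += 1
    return counts, unmatched


# Tiny in-memory FASTQ with three reads, for illustration only.
example_fastq = io.StringIO(
    "@r1\nGGACGTGG\n+\nIIIIIIII\n"
    "@r2\nCCTTTTCC\n+\nIIIIIIII\n"
    "@r3\nGGGGGGGG\n+\nIIIIIIII\n"
)
example_counts, example_unmatched = count_perfect_matches(
    example_fastq, {"ACGT", "TTTT", "CACA"}
)
```

Keeping zero-count barcodes in the result makes it straightforward to report undetected barcodes afterwards.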
The pipeline will generate the following files:
File.csv
- A CSV file containing:
  - Unique barcode name
  - Unique barcode sequence
  - Counts of each barcode
  - Gene length of the ORF
  - RefSeq ID of the ORF
statistics_file.txt
- A TXT file containing the following statistics:
  - Total number of reads
  - Number of perfect barcode matches
  - Number of nonperfect barcode matches
  - Number of reads processed
  - Percentage of barcodes that matched perfectly
  - Percentage of undetected barcodes
  - Skew ratio of top 10% to bottom 10%
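As an illustration of how such statistics can be derived from per-barcode counts, the sketch below computes three of them. The definitions are assumptions, not taken from the pipeline's code: the skew ratio is read here as the 90th percentile of counts divided by the 10th percentile, undetected barcodes are those with zero counts, and the helper names `percentile` and `summarize` are hypothetical.

```python
def percentile(values, q):
    """Percentile (q in [0, 100]) of a non-empty list, linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def summarize(counts, perfect, nonperfect):
    """Summarize barcode counts (assumed definitions, see lead-in).

    counts: dict mapping barcode -> read count (zeros included)
    perfect / nonperfect: number of reads with / without a perfect match
    """
    processed = perfect + nonperfect
    pct_perfect = 100 * perfect / processed if processed else 0.0

    undetected = sum(1 for c in counts.values() if c == 0)
    pct_undetected = 100 * undetected / len(counts) if counts else 0.0

    skew = None  # stays None when the ratio cannot be computed
    if counts:
        top = percentile(list(counts.values()), 90)
        bottom = percentile(list(counts.values()), 10)
        if top and bottom:
            skew = top / bottom

    return {
        "reads_processed": processed,
        "pct_perfect": pct_perfect,
        "pct_undetected": pct_undetected,
        "skew_ratio": skew,
    }


# Illustrative numbers only, not real screen data.
example_stats = summarize(
    {"bc1": 0, "bc2": 10, "bc3": 20, "bc4": 30, "bc5": 40},
    perfect=80,
    nonperfect=20,
)
```

A skew ratio of `None` signals that too few barcodes were detected for the ratio to be meaningful, which avoids a division by zero.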
CRISPR-scatter.tiff
- A scatter plot of the CRISPR data
CRISPR-perfect-matches.csv
- A CSV file containing the perfect matches
CRISPR-nonperfect-matches.csv
- A CSV file containing the nonperfect matches
For questions or comments, please contact Juan M. Martinez-Villalobos.
Part of the code was adapted from Joung, J., Konermann, S., Gootenberg, J. et al. Genome-scale CRISPR-Cas9 knockout and transcriptional activation screening. Nat Protoc 12, 828–863 (2017). https://doi.org/10.1038/nprot.2017.016