
Whole Genome Sequencing Pipeline

This pipeline describes the processing of raw whole genome sequencing (WGS) data from matched normal-tumor cancer samples generated by next-generation sequencing. Processing whole genomes is similar to processing whole exomes (WES), except that variants are called from both exonic and intronic regions. All WGS steps are the same as in WES, except for steps 05.0 - Variant Calling - SNP and 06.0 - Variant Calling - INDEL.

Pipeline Workflow

The Pipeline steps

01.0 Link/Concatenate FASTQ files

The first step in the pipeline is to create either renamed links to the FASTQ files or concatenated FASTQ files.
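
The README does not show how this step is scripted. A minimal sketch, assuming gzipped FASTQ input and a hypothetical output name such as `P1_T_R1.fastq.gz`, links a single file under its new name and concatenates several lane files byte-for-byte (concatenated gzip members still form a valid gzip stream). Each read direction (R1, R2) would be handled by a separate call.

```python
import os
import shutil
from pathlib import Path


def link_or_concat(sample_name: str, lane_files: list[Path], out_dir: Path) -> Path:
    """Create <out_dir>/<sample_name>.fastq.gz from one or more lane files.

    Hypothetical helper: one input becomes a renamed symbolic link,
    several inputs are concatenated in sorted (lane) order.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{sample_name}.fastq.gz"
    # Replace any previous link or file so the step can be re-run safely.
    if target.exists() or target.is_symlink():
        target.unlink()
    if len(lane_files) == 1:
        os.symlink(lane_files[0].resolve(), target)
    else:
        with open(target, "wb") as out:
            for lane in sorted(lane_files):
                with open(lane, "rb") as src:
                    shutil.copyfileobj(src, out)
    return target
```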

02.0 Quality control

This step checks the quality of the raw data with FastQC before processing starts. The input for this step is all FASTQ files in the FASTQ folder.
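
FastQC reports many metrics; one of the simplest, the mean base quality of each read, can be illustrated as below. This is a sketch, not a replacement for FastQC, and it assumes Phred+33 quality encoding (standard for Illumina 1.8+) and unwrapped four-line FASTQ records.

```python
def mean_phred_per_read(fastq_text: str, offset: int = 33) -> list[float]:
    """Return the mean Phred quality of each read in four-line FASTQ text."""
    lines = fastq_text.strip().splitlines()
    means = []
    for i in range(0, len(lines), 4):
        qual = lines[i + 3]  # the 4th line of each record holds the qualities
        scores = [ord(ch) - offset for ch in qual]
        means.append(sum(scores) / len(scores))
    return means
```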

03.0 Alignment

This step aligns the reads to the hs37d5 reference genome using the BWA aligner and also trims adapter sequences. It is run in the main project folder and uses all FASTQ files in the FASTQ folder. The output is BAM files of aligned reads, created in the BAMS folder.
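
The README does not give the exact BWA subcommand, options, or trimming tool. The sketch below assumes `bwa mem` on paired-end reads and adds a read group (`-R`) carrying the sample name, since GATK-based tools require read groups. The file names are hypothetical, and adapter trimming is not shown. The SAM output would then typically be sorted and converted to BAM (for example with samtools).

```python
import shlex


def bwa_mem_command(ref: str, r1: str, r2: str, sample: str, threads: int = 8) -> list[str]:
    """Build a paired-end `bwa mem` command with a read group for the sample."""
    # bwa expects a literal backslash-t between read-group fields.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return ["bwa", "mem", "-t", str(threads), "-R", read_group, ref, r1, r2]


if __name__ == "__main__":
    cmd = bwa_mem_command("hs37d5.fa", "P1_T_R1.fastq.gz", "P1_T_R2.fastq.gz", "P1_T")
    print(shlex.join(cmd))  # shell-quoted command line
```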

04.0 QC Post Alignment

Post-alignment quality control is run in the BAMS folder and uses the aligned reads generated in the previous step. Three modules are used to check post-alignment quality:
- A. Conpair: checks concordance between the normal and tumor samples.
- B. Targeted Panel: checks the coverage of the targeted regions (exonic regions).
- C. Mosdepth: computes coverage and plots the proportion of bases at each coverage level.

The output of each module is written to a new folder, named after the module, inside the QC folder.
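
The "proportion of bases at coverage" plot is built from a cumulative distribution: for each depth, the fraction of bases covered at least that deeply (mosdepth writes this to a `.mosdepth.global.dist.txt` file). A minimal sketch of that calculation from a per-depth base histogram, reporting only the depths present in the histogram:

```python
def coverage_distribution(depth_counts: dict[int, int]) -> dict[int, float]:
    """Proportion of bases covered at >= each depth, from a depth histogram.

    depth_counts maps a depth to the number of bases observed at exactly that depth.
    """
    total = sum(depth_counts.values())
    result = {}
    cumulative = 0
    # Walk from the deepest coverage down, accumulating bases at or above it.
    for depth in sorted(depth_counts, reverse=True):
        cumulative += depth_counts[depth]
        result[depth] = cumulative / total
    return dict(sorted(result.items()))
```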

05.0 Variant Calling - SNP

Variant calling of SNPs uses MuTect. The input for this step is the normal and tumor BAM files generated in step 03.0 (Alignment), and the output is VCF files created in the mutect folder.
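
Somatic calling needs each tumor BAM paired with its matched normal. The README does not state a naming convention; the sketch below assumes a hypothetical `<patient>_N.bam` / `<patient>_T.bam` scheme and keeps only complete pairs.

```python
import re
from pathlib import Path


def pair_bams(bam_paths: list[str]) -> dict[str, tuple[str, str]]:
    """Group BAMs named <patient>_N.bam / <patient>_T.bam into (normal, tumor) pairs."""
    pattern = re.compile(r"^(?P<patient>.+)_(?P<kind>[NT])\.bam$")
    found: dict[str, dict[str, str]] = {}
    for path in bam_paths:
        match = pattern.match(Path(path).name)
        if match is None:
            continue  # skip files that do not follow the naming convention
        found.setdefault(match["patient"], {})[match["kind"]] = path
    return {
        patient: (kinds["N"], kinds["T"])
        for patient, kinds in found.items()
        if "N" in kinds and "T" in kinds  # only complete normal-tumor pairs
    }
```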

05.1 Filter PASSED
- Filter the VCF files to keep only variants marked PASS in the FILTER column. The input is the VCF files generated in the previous step, and the output is PASSED.vcf files in the PASS folder.
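
A minimal sketch of the PASS filter, assuming uncompressed VCF text: header lines (starting with `#`) are kept, and a record is kept only when its 7th tab-separated column (FILTER) is exactly `PASS`.

```python
def filter_pass(vcf_lines):
    """Yield header lines and only those records whose FILTER column is PASS."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6 and fields[6] == "PASS":
            yield line
```

Usage would look like `open("PASS/sample.PASSED.vcf", "w").writelines(filter_pass(open("sample.vcf")))`, with hypothetical file names.
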
05.2 FP Filter
- A false-positive filter is applied to the PASSED.vcf files from the previous step to remove false-positive variants. The output files, PASSED_filter.vcf, are created in a new folder, Filter/filterVcf.
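
The README does not say which false-positive criteria are used. Purely as an illustration, the sketch below assumes simple thresholds on the tumor sample's total read depth and variant allele fraction, computed from the VCF `AD` (allelic depths) FORMAT field; the threshold values are placeholders, not the pipeline's settings.

```python
def passes_fp_filter(format_keys: str, tumor_sample: str,
                     min_depth: int = 10, min_vaf: float = 0.05) -> bool:
    """Hypothetical false-positive check on the tumor sample's AD field.

    format_keys is the FORMAT column (e.g. "GT:AD:DP") and tumor_sample the
    tumor's sample column (e.g. "0/1:40,10:50").
    """
    values = dict(zip(format_keys.split(":"), tumor_sample.split(":")))
    allele_depths = [int(x) for x in values["AD"].split(",")]  # ref, alt1, alt2, ...
    depth = sum(allele_depths)
    if depth < min_depth:
        return False
    vaf = sum(allele_depths[1:]) / depth  # fraction of reads supporting ALT alleles
    return vaf >= min_vaf
```
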
05.3 Vcf to MAF
- Convert VCF files to MAF files using tools such as VEP, which determines the effect of variants on genes, transcripts, and protein sequence (using SIFT), as well as on regulatory regions. The input for this step is all PASSED_filter.vcf files created in the previous step, and the output is MAF files created in the MAF folder.
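
One common way to run this conversion is the `vcf2maf.pl` script, which calls VEP internally. The sketch below only builds such a command line and assumes vcf2maf is the tool used; the sample IDs, reference path, and VEP install and cache paths are hypothetical. `--ncbi-build GRCh37` matches hs37d5, which is a GRCh37-based reference.

```python
def vcf2maf_command(vcf: str, maf: str, tumor_id: str, normal_id: str,
                    vcf_tumor_id: str, vcf_normal_id: str,
                    ref_fasta: str, vep_path: str, vep_data: str) -> list[str]:
    """Build a vcf2maf.pl command line for one tumor-normal VCF."""
    return [
        "perl", "vcf2maf.pl",
        "--input-vcf", vcf,
        "--output-maf", maf,
        "--tumor-id", tumor_id,            # ID written to Tumor_Sample_Barcode
        "--normal-id", normal_id,          # ID written to Matched_Norm_Sample_Barcode
        "--vcf-tumor-id", vcf_tumor_id,    # tumor column name inside the VCF
        "--vcf-normal-id", vcf_normal_id,  # normal column name inside the VCF
        "--ref-fasta", ref_fasta,
        "--vep-path", vep_path,
        "--vep-data", vep_data,
        "--ncbi-build", "GRCh37",          # hs37d5 is a GRCh37-based reference
    ]
```
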
05.4 Merging MAF files
- Merge the per-sample MAF files into one MAF file and create a separate text file containing the MAF column names.
Download the output files (samples.maf and head.txt) to a local directory and merge them into one final MuTect MAF file using an R script.
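
The merge can be sketched as follows, in Python rather than the pipeline's own scripts. It assumes each per-sample MAF has optional `#` comment lines (such as vcf2maf's `#version` line), one column-name row, and data rows. It returns the shared column-name line (the content of a file like head.txt) and the concatenated rows (like samples.maf); joining the two gives the final MAF.

```python
def merge_mafs(maf_texts: list[str]) -> tuple[str, str]:
    """Split per-sample MAFs into one shared header line and concatenated data rows."""
    header = None
    rows = []
    for text in maf_texts:
        # Drop blank lines and '#' comment lines; the first remaining line is the header.
        lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
        if not lines:
            continue
        if header is None:
            header = lines[0]
        elif lines[0] != header:
            raise ValueError("MAF column headers differ between samples")
        rows.extend(lines[1:])
    if header is None:
        raise ValueError("no MAF content supplied")
    return header + "\n", "\n".join(rows) + "\n"
```

Writing `header + body` to one file reproduces the final merged MAF.
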

06.0 Variant Calling - INDEL

Variant calling of INDELs uses Strelka2. This step is run in the main project folder, and the input is the BAM files in the BAMS folder generated in step 03.0 (Alignment). The output is VCF files created in the work/strelka2 directory.
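
Strelka2's somatic workflow runs in two stages: a configuration script creates a run directory, and the generated `runWorkflow.py` performs the calling, writing somatic indels to `results/variants/somatic.indels.vcf.gz` inside that directory. The sketch below only builds the two commands; the install path, run directory, and job count are hypothetical, and optional inputs such as Manta candidate indels are omitted.

```python
from pathlib import Path


def strelka2_commands(strelka_dir: str, normal_bam: str, tumor_bam: str,
                      ref_fasta: str, run_dir: str, jobs: int = 8) -> list[list[str]]:
    """Return the configure and run commands for a Strelka2 somatic workflow."""
    configure = [
        str(Path(strelka_dir) / "bin" / "configureStrelkaSomaticWorkflow.py"),
        "--normalBam", normal_bam,
        "--tumorBam", tumor_bam,
        "--referenceFasta", ref_fasta,
        "--runDir", run_dir,
    ]
    # runWorkflow.py is generated inside run_dir by the configure step.
    run = [str(Path(run_dir) / "runWorkflow.py"), "-m", "local", "-j", str(jobs)]
    return [configure, run]
```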

06.1 Filter PASSED
- Filter the VCF files to keep only variants marked PASS in the FILTER column. The input is the VCF files generated in the previous step, and the output is PASSED.vcf files in the PASS folder.
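
Strelka2 writes bgzip-compressed VCFs, which Python's `gzip` module can read. A minimal sketch that streams such a file, keeps headers and PASS records, writes a plain-text VCF, and reports how many records were kept; the file names in any call are hypothetical.

```python
import gzip


def filter_pass_gz(in_path: str, out_path: str) -> int:
    """Copy headers and PASS records from a gzipped VCF into a plain-text VCF.

    Returns the number of PASS records written.
    """
    kept = 0
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)
            elif line.split("\t")[6] == "PASS":  # column 7 is FILTER
                dst.write(line)
                kept += 1
    return kept
```
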
06.2 Vcf to MAF
- Convert VCF files to MAF files (vcf2maf) using tools such as VEP, which determines the effect of variants on genes, transcripts, and protein sequence (using SIFT), as well as on regulatory regions. The input for this step is all PASSED.vcf files created in the previous step, and the output is MAF files created in the MAF folder.
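
When converting, the tool has to be told which VCF columns hold the tumor and normal samples. Strelka2 somatic VCFs name these columns `NORMAL` and `TUMOR` rather than using the sample names, so reading them from the `#CHROM` header line is a simple check before conversion. A minimal sketch:

```python
def vcf_sample_names(header_lines) -> list[str]:
    """Return the sample column names from a VCF's #CHROM header line."""
    for line in header_lines:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed (CHROM ... FORMAT); samples follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")
```
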
06.3 Merging MAF files
- Merge the per-sample MAF files into one MAF file and create a separate text file containing the MAF column names.
Download the output files (strelka2_all_samples.maf and header_strelka2_all_samples.maf) to a local directory and merge them into one final Strelka2 MAF file using an R script.
Further processing of the MuTect and Strelka2 MAF files is done in R: SNPs and low-complexity variants are filtered out of the Strelka2 MAF file, the MuTect and Strelka2 MAF files are combined into one final MAF file, and the final MAF file is filtered down to the most deleterious variants.
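
The combination logic can be sketched in Python (the pipeline itself uses R), with MAF rows as dictionaries keyed by column name. SNPs are dropped from the Strelka2 rows using the MAF `Variant_Type` column. The README does not define "most deleterious" or "low complexity"; the sketch assumes, hypothetically, that deleteriousness is judged from VEP's `IMPACT` column (present in vcf2maf output) and omits the low-complexity filter.

```python
def combine_mafs(mutect_rows: list[dict], strelka_rows: list[dict],
                 keep_impacts: frozenset[str] = frozenset({"HIGH", "MODERATE"})) -> list[dict]:
    """Combine MuTect rows with non-SNP Strelka2 rows, keeping selected IMPACT levels."""
    # Remove SNPs from Strelka2, since SNPs are taken from MuTect.
    strelka_non_snp = [r for r in strelka_rows if r["Variant_Type"] != "SNP"]
    combined = mutect_rows + strelka_non_snp
    # Hypothetical deleteriousness filter based on VEP's IMPACT annotation.
    return [r for r in combined if r.get("IMPACT") in keep_impacts]
```
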

Output Folders Structure

Labname/Project

FASTQ: Raw data (fastq files)
QC: Quality control of FASTQ files
BAMS: Aligned reads (BAM files)
mutect
- Results
  - PASSED
    - MAF
    - Filter
strelka2
- config
- final
- work
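
The folder layout above can be created up front. A minimal sketch, reading the tree as nested paths under a project root (the root path in any call, such as `Labname/Project`, is a placeholder):

```python
from pathlib import Path

# Relative folders, following the tree above.
LAYOUT = [
    "FASTQ",
    "QC",
    "BAMS",
    "mutect/Results/PASSED/MAF",
    "mutect/Results/PASSED/Filter",
    "strelka2/config",
    "strelka2/final",
    "strelka2/work",
]


def create_layout(project_root: Path) -> list[Path]:
    """Create every folder in LAYOUT under project_root (idempotent)."""
    created = []
    for rel in LAYOUT:
        path = project_root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```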
