VNtyper 2.0 - A Pipeline to genotype the MUC1-VNTR

VNtyper 2.0 is a pipeline for genotyping the coding Variable Number Tandem Repeat (VNTR) of MUC1 in Autosomal Dominant Tubulointerstitial Kidney Disease (ADTKD-MUC1) from Short-Read Sequencing (SRS) data. This version integrates enhanced variant calling algorithms, robust logging, and a streamlined installation process.

A web server also provides free access to VNtyper, running the pipeline in the background for ease of use. Access it through the following link: vntyper-online

Features

Variant Calling Algorithms:
- Kestrel: Mapping-free genotyping using k-mer frequencies.
- code-adVNTR (optional): Profile-HMM based method for VNTR genotyping.
Comprehensive Logging:
- Logs both to the console and a dedicated log file.
- Generates MD5 checksums for all downloaded and processed files.
Flexible Installation:
- Supports installation via pip using setup.py.
- Provides Conda environment setup for easy dependency management.
Subcommands:
- install-references
- pipeline
- fastq
- bam
- kestrel
- report
- cohort

Installation

VNtyper 2.0 can be installed either with pip (via setup.py) or through a Conda environment for streamlined dependency management.

Using `setup.py` and `pip`

Clone the repository and install the package with pip:

mkdir vntyper
git clone https://github.com/hassansaei/vntyper.git
cd vntyper
pip install .

Usage

VNtyper 2.0 offers multiple subcommands that can be used depending on your input data and requirements. Below are the main subcommands available:

1. Running the Full Pipeline

To run the entire pipeline on paired-end FASTQ files or BAM files:

vntyper pipeline \
    --config-path /path/to/config.json \
    --fastq1 /path/to/sample_R1.fastq.gz \
    --fastq2 /path/to/sample_R2.fastq.gz \
    --output-dir /path/to/output/dir \
    --threads 4

Alternatively, using a BAM file:

vntyper pipeline \
    --config-path /path/to/config.json \
    --bam /path/to/sample.bam \
    --output-dir /path/to/output/dir \
    --threads 4
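
When many samples are processed from a script, the two invocations above can be assembled programmatically. The sketch below is a hypothetical helper (`build_pipeline_cmd` is not part of VNtyper) that uses only the flags shown above and returns an argument list without running anything.

```python
from typing import List, Optional


def build_pipeline_cmd(
    config_path: str,
    output_dir: str,
    fastq1: Optional[str] = None,
    fastq2: Optional[str] = None,
    bam: Optional[str] = None,
    threads: int = 4,
) -> List[str]:
    """Build a `vntyper pipeline` argument list for a FASTQ pair or a BAM file."""
    # FASTQ input must come as a pair.
    if (fastq1 is None) != (fastq2 is None):
        raise ValueError("fastq1 and fastq2 must be given together")
    has_fastq = fastq1 is not None
    # Exactly one input kind is allowed: a FASTQ pair or a BAM file.
    if has_fastq == (bam is not None):
        raise ValueError("provide either a FASTQ pair or one BAM file")
    cmd = ["vntyper", "pipeline", "--config-path", config_path]
    if has_fastq:
        cmd += ["--fastq1", fastq1, "--fastq2", fastq2]
    else:
        cmd += ["--bam", bam]
    cmd += ["--output-dir", output_dir, "--threads", str(threads)]
    return cmd
```

Passing the returned list to `subprocess.run(cmd, check=True)` would launch the run; the list form avoids shell quoting problems with unusual paths.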

2. Installing References

vntyper install-references \
    --output-dir /path/to/reference/install \
    --config-path /path/to/config.json \
    --skip-indexing  # Optional: skip BWA indexing if needed

3. Generating Reports

Generate a summary report for your VNTR genotyping analysis:

vntyper report \
    --output-dir /path/to/output/dir \
    --config-path /path/to/config.json

Process raw FASTQ files to prepare them for genotyping:

vntyper fastq \
    --fastq1 /path/to/sample_R1.fastq.gz \
    --fastq2 /path/to/sample_R2.fastq.gz \
    --output-dir /path/to/output/dir

Process an existing BAM file:

vntyper bam \
    --alignment /path/to/sample.bam \
    --output-dir /path/to/output/dir \
    --threads 4

Pipeline Overview

VNtyper 2.0 integrates multiple steps into a streamlined pipeline. The following is an overview of the steps involved:

1. FASTQ Quality Control: Raw FASTQ files are checked for quality.
2. Alignment: Reads are aligned using BWA (if FASTQ files are provided).
3. Kestrel Genotyping: Mapping-free genotyping of VNTRs.
4. (Optional) adVNTR Genotyping: Profile-HMM based method for VNTR genotyping (requires additional setup).
5. Summary Report Generation: A final HTML report is generated to summarize the results.
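
To make "k-mer frequencies" concrete, here is a minimal counting sketch. It is not Kestrel's algorithm, which goes on to reconstruct haplotypes from such counts; the function name `count_kmers` and the toy reads are illustrative assumptions only.

```python
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every length-k substring (k-mer) across a set of reads."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of width k along the read.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip windows containing ambiguous bases
                counts[kmer] += 1
    return counts


# Toy reads, not real MUC1 sequence.
toy_counts = count_kmers(["ACGTAC", "cgta"], k=3)
```

No read is aligned to a reference at any point, which is what makes a k-mer approach mapping-free.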

Dependencies

VNtyper 2.0 relies on several tools and Python libraries. Ensure that the following dependencies are available in your environment:

- Python >= 3.9
- BWA
- Samtools
- Fastp
- Pandas
- Numpy
- Biopython
- Pysam
- Jinja2
- Matplotlib
- Seaborn
- IGV-Reports

You can easily set up these dependencies via the provided Conda environment file.
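
A quick way to confirm that an environment provides these dependencies is sketched below. The helper `missing_dependencies` is hypothetical, and the executable and import names (for example `Bio` for Biopython and `igv_reports` for IGV-Reports) are assumptions based on how these projects are usually packaged, so adjust them if your installation differs.

```python
import importlib.util
import shutil
import sys
from typing import List, Sequence

# Assumed names: command-line executables and Python import names.
TOOLS = ("bwa", "samtools", "fastp")
MODULES = ("pandas", "numpy", "Bio", "pysam", "jinja2",
           "matplotlib", "seaborn", "igv_reports")


def missing_dependencies(
    tools: Sequence[str] = TOOLS, modules: Sequence[str] = MODULES
) -> List[str]:
    """Return names of executables not on PATH and modules that cannot be found."""
    missing = [tool for tool in tools if shutil.which(tool) is None]
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


def python_version_ok(minimum=(3, 9)) -> bool:
    """Check that the running interpreter meets the minimum version."""
    return sys.version_info >= minimum
```

An empty list from `missing_dependencies()` together with `python_version_ok()` returning `True` suggests the environment is ready; neither check verifies tool versions.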

Pipeline Logic Diagram

Below is a logical overview of the VNtyper pipeline:

graph TD
  A[Input: FASTQ/BAM] -->|Quality Control| B[Alignment BWA]
  B -->|Genotyping| C[Kestrel]
  C --> D[Optional: adVNTR]
  D --> E[Generate Summary Report]
  E --> F[Output: VCF, Summary HTML]
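
The same flow can be written as a small planning function. This is an illustrative sketch, not VNtyper's internal logic: the name `plan_steps` and the step labels are hypothetical, and skipping quality control for BAM input is an assumption drawn from the overview, which describes that step only for raw FASTQ files.

```python
from typing import List


def plan_steps(input_kind: str, use_advntr: bool = False) -> List[str]:
    """List the pipeline stages for a given input type, in execution order."""
    if input_kind not in ("fastq", "bam"):
        raise ValueError("input_kind must be 'fastq' or 'bam'")
    steps: List[str] = []
    if input_kind == "fastq":
        # Raw reads are quality-checked and aligned before genotyping.
        steps += ["fastq_quality_control", "bwa_alignment"]
    steps.append("kestrel_genotyping")
    if use_advntr:
        steps.append("advntr_genotyping")  # optional, needs additional setup
    steps.append("summary_report")
    return steps
```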

Notes

- This tool is for research use only.
- Use high-coverage whole-exome sequencing (WES) data to genotype the MUC1 VNTR accurately.
- For questions or issues, refer to the GitHub repository for support.

If you use VNtyper 2.0 in your research, please cite the following:

Saei H, Morinière V, Heidet L, et al. VNtyper enables accurate alignment-free genotyping of MUC1 coding VNTR using short-read sequencing data. iScience. 2023.
Audano PA, Ravishankar S, et al. Mapping-free variant calling using haplotype reconstruction from k-mer frequencies. Bioinformatics. 2018.
Park J, Bakhtiari M, et al. Detecting tandem repeat variants in coding regions using code-adVNTR. iScience. 2022.

Contributing

We welcome contributions to VNtyper. Please refer to the CONTRIBUTING.md file for guidelines.

License

VNtyper is licensed under the BSD 3-Clause License. See the LICENSE file for more details.
