This repository contains a standalone Python script that extracts information from Illumina binary QC files and converts it into a YAML file. The script is a refactored replacement for the illuminate module that uses the InterOp module instead; it was subsequently integrated into the whole-exome sequencing (WES) pipeline during my tenure at UCSF.
The dependencies, interop and pandas, can be installed with conda.
- Clone the repo
git clone https://github.com/sfpacman/Read_InterOp_illumina/
- Install packages via conda
conda install bioconda::illumina-interop
conda install pandas
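To confirm that both packages are importable after installation, a quick check like the one below can be run with the same Python interpreter. It is a minimal sketch: it assumes the illumina-interop conda package provides the Python module `interop`, and the helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(names=("interop", "pandas")):
    """Return the top-level module names that cannot be found by this interpreter."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```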
You are now ready to run the script!
Execute the Python script in the terminal:
python run_qc_yaml_interop_production.py <target_dir> <out_dir>
For <target_dir>, provide a run folder that contains RunInfo.xml and an InterOp subdirectory with the Illumina binary files.
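The input requirements above can be checked before any metrics are read. The sketch below is an illustration, not the script's actual code: `parse_args` and `validate_run_folder` are hypothetical helpers, the use of argparse is an assumption about how the two positional arguments might be handled, and treating <out_dir> as the destination of the YAML report is inferred from its name.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse <target_dir> and <out_dir> from the command line (or from argv, for testing)."""
    parser = argparse.ArgumentParser(
        description="Convert Illumina InterOp QC metrics to a YAML report."
    )
    parser.add_argument("target_dir", type=Path,
                        help="run folder containing RunInfo.xml and InterOp/")
    # Assumption: out_dir is the directory the YAML report is written to
    parser.add_argument("out_dir", type=Path, help="directory for the output YAML file")
    return parser.parse_args(argv)


def validate_run_folder(target_dir):
    """Return a list of problems; an empty list means the folder has the expected layout."""
    target = Path(target_dir)
    problems = []
    if not (target / "RunInfo.xml").is_file():
        problems.append(f"missing file {target / 'RunInfo.xml'}")
    interop_dir = target / "InterOp"
    if not interop_dir.is_dir():
        problems.append(f"missing directory {interop_dir}")
    elif not any(interop_dir.glob("*.bin")):
        # InterOp metric files use the .bin extension (e.g. TileMetricsOut.bin)
        problems.append(f"no .bin files in {interop_dir}")
    return problems
```

A caller could run `validate_run_folder(parse_args().target_dir)` and exit with the returned messages when the list is non-empty; failing early on a malformed folder is usually clearer than an error raised deep inside the metric reader.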
The output YAML file contains the following QC metrics:
- lane_level_metrics
- xread_level_metrics
- read_level_metrics
- read_yield_metrics
- sample_level_metrics
- run_level_metrics
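A downstream pipeline step may want to confirm that a report contains every metric group listed above. This is a minimal sketch that assumes the six groups appear as top-level keys of the YAML document; the report would first be parsed with a YAML library (for example PyYAML's `yaml.safe_load`, an added dependency), and `missing_sections` is a hypothetical helper.

```python
# Assumption: each metric group is a top-level key in the YAML report
EXPECTED_SECTIONS = (
    "lane_level_metrics",
    "xread_level_metrics",
    "read_level_metrics",
    "read_yield_metrics",
    "sample_level_metrics",
    "run_level_metrics",
)


def missing_sections(report):
    """Return the expected top-level sections absent from a parsed QC report."""
    if not isinstance(report, dict):
        # An empty YAML file parses to None, which is not a usable report
        raise TypeError("expected the parsed YAML report to be a mapping")
    return [name for name in EXPECTED_SECTIONS if name not in report]
```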
The script does not accept arguments for modifying Illumina QC column names or metric conversions in the final report, because the report format is strictly defined. To customize either, change the implementation of the following functions:
get_columns_name()
get_metrics()
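The signatures of get_columns_name() and get_metrics() are not documented here, so the following is only an illustration of the kind of change involved: a table that renames source columns and applies a per-column conversion. The column names, the report keys, the `COLUMN_MAP` table, and the `rename_and_convert` helper are all hypothetical placeholders, and the density conversion assumes raw values in clusters per mm².

```python
# Hypothetical mapping: source column name -> (report key, conversion function).
# Column names are illustrative placeholders; the real InterOp names may differ.
COLUMN_MAP = {
    "Density": ("cluster_density_k_per_mm2", lambda v: v / 1000.0),  # assumes clusters/mm^2
    "Yield": ("yield_gb", lambda v: v),
    "%>=Q30": ("percent_q30", lambda v: round(v, 2)),
}


def rename_and_convert(row, column_map=COLUMN_MAP):
    """Keep only mapped columns present in row, renaming each and applying its conversion."""
    return {
        new_name: convert(row[old_name])
        for old_name, (new_name, convert) in column_map.items()
        if old_name in row
    }
```

Keeping names and conversions in one table means a report change touches a single place rather than several code paths.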
A YAML configuration parsing function could be added in the future so that these settings can be changed without editing the code.
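One way such a configuration could work is to merge user overrides onto built-in defaults. The sketch below assumes a hypothetical config layout with a `columns` mapping (source column name to report name); `DEFAULT_COLUMNS` and `load_column_names` are illustrative names, not part of the script.

```python
# Hypothetical built-in defaults: source column name -> report name
DEFAULT_COLUMNS = {"Density": "cluster_density", "Yield": "yield_gb", "%>=Q30": "percent_q30"}


def load_column_names(config, defaults=DEFAULT_COLUMNS):
    """Merge a parsed config dict (or None) over the default column names."""
    merged = dict(defaults)  # copy so the defaults are never mutated
    overrides = (config or {}).get("columns") or {}
    if not isinstance(overrides, dict):
        raise ValueError("'columns' must map source column names to report names")
    for source_name, report_name in overrides.items():
        # Reject unknown names so typos in the config fail loudly
        if source_name not in merged:
            raise KeyError(f"unknown column in config: {source_name!r}")
        merged[source_name] = str(report_name)
    return merged
```

In the script, `config` would come from something like `yaml.safe_load(path.read_text())` (PyYAML, an added dependency); keeping file parsing separate from the merge lets the merge be tested without a file on disk.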