Releases: PacificBiosciences/paraphase
Version 3.2.0
Summary of changes:
- Updates to better handle targeted data
- Filter reads on rq (>=0.99), if rq is present in input bam
- Add a --targeted option for targeted data to drop the assumption of uniform coverage across the genome
- Add two optional parameters for targeted data:
- --min-read-variant: Partially controls the number of supporting reads required for a variant to be used for phasing. The cutoff for variant-supporting reads is min(this number, max(5, depth*0.11)). Default is 20. At standard WGS depth, the default value is overridden by max(5, depth*0.11).
- Use cases: 1) Set this number low for low-coverage data or to increase sensitivity. 2) For targeted data with high coverage, set this number relatively high to avoid picking up sequencing errors and to reduce run time.
- --min-read-haplotype: Minimum number of unique supporting reads for a haplotype. Default is 4. For targeted data with high coverage, this cutoff can be increased to reduce errors and to reduce run time.
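The cutoff formula quoted for --min-read-variant can be worked through with a small sketch. This is only an illustration of the arithmetic stated above, not Paraphase's actual code: the function name variant_read_cutoff is hypothetical, and the sketch assumes no rounding is applied, since the notes do not mention any.

```python
def variant_read_cutoff(depth, min_read_variant=20):
    """Illustrative cutoff for variant-supporting reads.

    Implements min(min_read_variant, max(5, depth * 0.11)) as stated in the
    release notes. The function name and the absence of rounding are
    assumptions made for this sketch.
    """
    # Depth-scaled term: never below 5 reads, otherwise 11% of the depth.
    depth_based = max(5, depth * 0.11)
    # The --min-read-variant value acts as an upper cap on the cutoff.
    return min(min_read_variant, depth_based)


# Standard WGS depth (~30x): the depth-based term replaces the default of 20.
print(variant_read_cutoff(30))   # prints 5
# High-coverage targeted data (500x): the default of 20 caps the cutoff.
print(variant_read_cutoff(500))  # prints 20
# Raising the parameter lets the cutoff follow depth at high coverage.
print(variant_read_cutoff(500, min_read_variant=100))
```

At 30x the depth-based term (5) is smaller than the default, which is why the default is described as being overridden at standard WGS depth. At high targeted coverage the parameter itself becomes the limiting value, so raising it tightens the filter.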
- Updates to target regions:
- Update coordinates of some target regions to include full genes whenever possible:
pms2,ikbkg,hba,DDT,MBD3L2,DEFA1,PRY,CHRNA7,DHX40,GOLGA8A,IQCK,NXF2,OTOA,PDPK1,POTEI,RGPD1,RGPD3,RSPH10B,SIK1,TMLHE,CBS,KCNE1,CASTOR2,NBPF4,RGPD5,GOLGA8N,POTEB,ANKRD20A1,NSF
- Add TNXB as a region on its own so that the full gene can be genotyped (the RCCX region only includes part of TNXB)
- Algorithmic changes
- Improve fusion calling in cases of homozygous deletion
- Add some homozygous sites to cover target regions evenly during phasing to improve read assignment to haplotypes and variant calling
- Update a few gene-specific callers
- hba: Add calling of 4.2 deletion/duplication
- smn1: If homozygous throughout region, default to CN = 2 instead of 1; drop carrier call if only one SMN1 haplotype is found but the total CN of SERF1A/B (neighboring locus) is larger than the total CN of SMN1/2
- ikbkg: Improve calling of the 11.7kb deletion; update the config to genotype the entire gene
- ncf1: Drop carrier call if only one NCF1 haplotype is found but the total CN of GTF2I (neighboring locus) is larger than the total CN of NCF1 family
- rccx: Better handle homozygous deletion cases
- pms2: Update the config to genotype the entire gene
- Other changes:
- Support cram as input
- Standardize haplotype naming across regions: {gene name}_{haplotype name}
Version 3.1.2
Summary of changes:
- Add --write-nocalls-in-vcf option to write no-call sites in the VCF
Version 3.1.1
Summary of changes:
Minor update. Fix program error in low-depth or no-data regions. Completes analysis even when the input is a small bamlet (result is still a no-call).
Version 3.1.0
Summary of changes:
- Improve PMS2/PMS2CL differentiation
- Output protein changes at five potentially pathogenic sites in OPN1LW/OPN1MW
- Update region definitions for some families
- Add a few regions for fusion calling: CYP2D6, GBA, CYP11B1, the CFH gene cluster
- Improve VCFs (see the project documentation for details):
- For each region, all gene copies are now in a single VCF file per sample, reported as sample columns in the VCF.
- Report boundary coordinates and the truncated status of a haplotype in the VCF.
- Report groups of haplotypes on the same chromosome when this information is available.
Version 3.0.0
Summary of changes:
- Added HBA1/HBA2 and OPN1LW/OPN1MW callers
- Added ~150 segmental duplication regions for GRCh38
- Improved gene callers
- F8: Improved calling of Intron22 inversion and Exon1-22 deletion
- NCF1: Improved assignment of genes to NCF1 vs. pseudogenes
- PMS2: Improved assignment of genes to PMS2 vs. pseudogene. Updated the coordinates of the region to phase
- IKBKG: Improved assignment of genes to IKBKG vs. pseudogene. Updated the coordinates of the region to phase
- RCCX: Better calling of a multi-allelic site IVS2-13A/C>G
- CFC1: Updated the coordinates of the region to phase
- For SMN1/STRC/PMS2/IKBKG/NCF1, variants are now called against the gene for gene haplotypes and against the paralog/pseudogene for paralog/pseudogene haplotypes
- Report F8 Intron 22 inversion and Exon1-22 deletion, and IKBKG 11.7kb deletion in VCFs
- Improved homopolymer/simple repeat masking before phasing
- Included filtered calls in VCFs
- Added GRCh37/hg19 support for 11 medically relevant gene families
Version 2.2.3
- Speeds up single sample analysis through multiprocessing by genomic region (-t is enabled)
- Adds program version and command to BAM and VCF headers
- Fixed a bug that may lead to failed analysis in low coverage samples
- Fixed a bug in F8 analysis
Version 2.2.2
- Fix low depth error so that even if one region fails depth check, the other regions will still produce results.
- Show version
Version 2.2.1
This version includes some light updates to Paraphase
- Simplify config files
- Prevent non-zero exit code
- Minor algorithm improvements
Please note that a new input file is required: the reference genome FASTA file (specify with -r).
Version 2.1.0
This release includes some improvements to phasing and variant calling.
- Filter out spurious variants before phasing
- Filter out spurious haplotypes after phasing
- Better handle homozygous cases
Version 2.0.0
This release extends Paraphase to resolve the highly homologous genes listed below:
- SMN1/SMN2 (spinal muscular atrophy)
- RCCX module
- CYP21A2 (21-Hydroxylase-Deficient Congenital Adrenal Hyperplasia)
- TNXB (Ehlers-Danlos syndrome)
- C4A/C4B (relevant in autoimmune diseases)
- PMS2 (Lynch Syndrome)
- STRC (hereditary hearing loss and deafness)
- IKBKG (Incontinentia Pigmenti)
- NCF1 (chronic granulomatous disease; Williams syndrome)
- NEB (Nemaline myopathy)
- F8 (intron 22 inversion, Hemophilia A)
- CFC1 (heterotaxy syndrome)