Summary of changes:
- Updates to better handle targeted data
- Filter reads on rq (>=0.99), if rq is present in input bam
- Add a `--targeted` option for targeted data to drop the assumption of uniform coverage across the genome
- Add two optional parameters for targeted data:
  - `--min-read-variant`: Partially controls the number of supporting reads required for a variant to be used for phasing. The cutoff for variant-supporting reads is determined by min(this number, max(5, depth*0.11)). Default is 20. At standard WGS depth, the default value is overridden by max(5, depth*0.11).
    - Use cases: 1) Set this number low for low-coverage data or to increase sensitivity. 2) For targeted data with high coverage, set this number relatively high to avoid picking up sequencing errors and to reduce run time.
  - `--min-read-haplotype`: Minimum number of unique supporting reads for a haplotype. Default is 4. For targeted data with high coverage, this cutoff can be increased to reduce errors and to reduce run time.
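
The cutoff rule for variant-supporting reads can be illustrated with a small calculation. This is only a sketch of the formula as stated above, min(`--min-read-variant`, max(5, depth*0.11)); the function name `variant_read_cutoff` is hypothetical, and whether the tool rounds the result is not stated in these notes, so the value is left unrounded here.

```python
def variant_read_cutoff(min_read_variant: float, depth: float) -> float:
    """Illustrative only: cutoff = min(min_read_variant, max(5, depth * 0.11)).

    `min_read_variant` stands for the value passed to --min-read-variant
    (default 20). The function name and the lack of rounding are assumptions.
    """
    depth_based = max(5, depth * 0.11)  # depth-scaled term with a floor of 5
    return min(min_read_variant, depth_based)


if __name__ == "__main__":
    # Compare the default setting across a few coverage depths.
    for depth in (30, 100, 500):
        print(f"depth={depth}: cutoff={variant_read_cutoff(20, depth)}")
```

With the default of 20, the depth-based term decides the cutoff until depth*0.11 exceeds 20 (a depth of roughly 182x). Only above that coverage, which is typical of targeted data, does the `--min-read-variant` value itself become the limit.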
- Updates to target regions:
- Update coordinates of some target regions to include full genes whenever possible: pms2, ikbkg, hba, DDT, MBD3L2, DEFA1, PRY, CHRNA7, DHX40, GOLGA8A, IQCK, NXF2, OTOA, PDPK1, POTEI, RGPD1, RGPD3, RSPH10B, SIK1, TMLHE, CBS, KCNE1, CASTOR2, NBPF4, RGPD5, GOLGA8N, POTEB, ANKRD20A1, NSF
- Add TNXB as a region on its own so that the full gene can be genotyped (the RCCX region only includes part of TNXB)
- Algorithmic changes
- Improve fusion calling in cases of homozygous deletion
- Add some homozygous sites to cover target regions evenly during phasing to improve read assignment to haplotypes and variant calling
- Update a few gene-specific callers:
  - hba: Add calling of 4.2 deletion/duplication
  - smn1: If homozygous throughout region, default to CN=2 instead of 1; drop carrier call if only one SMN1 haplotype is found but the total CN of SERF1A/B (neighboring locus) is larger than the total CN of SMN1/2
  - ikbkg: Improve calling of the 11.7kb deletion; update the config to genotype the entire gene
  - ncf1: Drop carrier call if only one NCF1 haplotype is found but the total CN of GTF2I (neighboring locus) is larger than the total CN of NCF1 family
  - rccx: Better handle homozygous deletion cases
  - pms2: Update the config to genotype the entire gene
- Other changes:
- Support cram as input
- Standardize haplotype naming across regions: `{gene name}_{haplotype name}`