- Bumping ENSEMBL versions for GRCh37 and GRCh38.
- Fixing sources information for updated ENSEMBL downloads.
- For ENSEMBL, use ENSEMBL-provided mapping from ENSG to HGNC ID for Entrez ID assignment. This is necessary because Ensembl gene IDs turn out not to be stable between GRCh37 and GRCh38 after all. Case in point: `ENSG00000276141` vs. `ENSG00000187667`.
- Adding `--gene-ids` argument to downloader for creating smaller databases (mostly for test purposes).
- Adding SV support to jannovar-cli, includes tests.
- Using ENSEMBL-provided mapping from ENSG to Entrez ID in the case HGNC mapping does not work.
- Adding SV support to jannovar-htsjdk
- Bumping HTSJDK dependency to v2.18.2
- Changing upstream/downstream size to 5kbp.
- Support for prioritizing RefSeq transcripts on the PAR of chrX over those of chrY
- Refactoring to improve performance using `EnumSet`.
- Extended `VariantEffect` for the effects of structural variants. Removing documentation that the effect is not used in Jannovar for some now interpreted ones. Also variant effect for non-coding variants is added using the current VEP predictions as a template.
- Prohibiting creating `GenomeVariant` with symbolic alleles. Throwing new checked exception `InvalidGenomeVariant` in case of error.
- Fixing SO term ID for `VariantEffect.DISRUPTIVE_INFRAME_DELETION`
- Correctly parsing transcript version for ENSEMBL when available (not available for b75/GRCh37).
- Making transcript model building (for `download`) more memory efficient.
- Integrating support for thousand genomes VCF
- Integrating thousand genomes/ExAc count limits into inheritance filter
- Adding support for thousand genomes VCF
- Adding support for limiting genomes/ExAc counts into inheritance filter
- Making `OneParentGtFiltered` filter optional. The default setting is `false` (specify `--one-parent-gt-filtered-filters-affected` to enable).
- Moving variants in non-coding transcripts after UTR variants.
- Fixing parser issue for nucleotide indels (#408).
- Obey the `options.escapeAnnField` parameter for escaping the variant effect in the `ANN` field.
- Changing HTSJDK version to 2.14.3
- Using the one letter amino acid code in HGVS representation as default (changes in core, hgvs, htsjdk and cli). Now the cli option `--3-letter-amino-acids` works as expected.
- Support for RefSeq GRCh37.p13 interim release
- Support of new RefSeq headers
- Using RefSeq GRCh38.p12 annotation instead of GRCh38.p7
- Replacing whitespace with string when annotating from TSV file.
- Fixing bug in `GenomeRegionSequenceExtraction`. The bug always reported sequences from the first contig in the reference file rather than the requested contig. Affects only the cli command `hgvs-to-vcf`.
- Fixing annotation with Polyphen prediction (data type)
- Changing HTSJDK version to 2.14.0
- Codestyle improvements
- Fixing mendelian "bug" #393 (has no effect because the check was not necessary)
- New inheritance mode: mitochondrial
- Bugfix ProgressBar (doPrint was always true)
- Fixed problem with interpretation of Clinvar annotation origin.
- ClinVar `BEST_AC` and `BEST_AF` are now named `AC_POPMAX` and `AF_POPMAX` to be consistent with gnomAD
- Changing Guava version to 0.22
- Changing slf4j version to 1.7.24
- Changing log4j version to 2.8.2
- Adding experimental support for annotating with VCF files.
- Adding experimental support for annotating with tabix-indexed TSV files and dbNSFP.
- Integrating the advanced pedigree-based filters (useful for filtration to de novo variants).
- Making it possible to override database INI settings using user-specified INI files.
- Fixing stop loss annotation (#351).
- Finishing renaming of TranscriptInfo to TranscriptModel (#348).
- Upstream and downstream variant were considered "not off exome". They now are.
- Adding mitochondrial filtering function (#362).
- Adding code for performing more advanced filtration/annotation filtering to de novo variants.
- Improving documentation of `MaxFreqAr` and `MaxFreqAd` in header.
- Adding experimental support for annotating with VCF files
- Adding experimental support for annotating with tabix-indexed TSV files and dbNSFP
- Fixing bug that ignored variant filters for recessive annotation
- Fixing NPE problem with inheritance annotation
- Also counting number of variants on contigs
- Fixing counting bug that made UTR3 variants be counted as UTR5
- Fixing NPE in case of null variant annotations (e.g., unknown contig)
- Fixing a problem with normalization on variant annotation
- Fixing problem with default value of `CLNSIG` (`"25"` -> `"255"`)
- Incorporating gnomAD annotation into exclusion by frequency for inheritance filter (#343)
- Fixing header description for `MinAafHomAlt` and `MaxAafHomRef` (#342)
- Checking that reference is given also for gnomAD VCF annotation
- Fixing language in mvn surefire plugin. Now mvn tests work on locale de_DE etc.
- Adding `--interval` argument for only processing a part of the file
- Adding `statistics` command for computing statistics on variants in VCF file
- Fixing bug in HGVS to VCF
- Better handling of missing `.dict` file for HGVS to VCF translation
- Adding `--annotate-as-singleton-pedigree` parameter for annotation of singleton pedigrees without pedigree file (single individual is assumed to be affected)
- More friendly user message in case of unsorted files on inheritance mode annotation
- Interpretation of filters in compatible inheritance mode annotation
- Integrating new jannovar-filter into Jannovar CLI. Filtered genotypes will be passed into the inheritance filter as no-call.
- Adding annotation with ClinVar
- Printing warnings next to the annotations in `annotate-pos`
- AR inheritance annotation of two siblings bugfix (no parents available in comp. het. mode) #314
- Adding functionality to add filters based on frequencies found in dbSNP and ExAC
- Adding back as module for threshold-based filtration. This module allows to create genotype-wise soft-filters for low coverage. Also, variants can be soft-filtered based on whether the genotype calls of all affected individuals are filtered out.
- Extending API to expose mendelian checks for comp het./ad alt (via `SubModuleOfInheritance` and `MendelianInheritanceChecker`)
- Jannovar version is now written out to database file which allows better error checks and compatibility messages
- Un-deprecating `BestAnnotationListTextGenerator` and `AllAnnotationListTextGenerator` classes, useful for text-based output formats
- Changing behaviour of `VariantEffect.isOffExome()` and adding a variant that allows deciding between UTR on/off exome and non-consensus splice region on/off exome
- Making the behaviour of overriding transcripts configurable at least in the code, using default to not do this any more
- Adding `WARNING_REF_DOES_NOT_MATCH_TRANSCRIPT` to `AnnotationMessage`
- Properly pushing through warnings from the annotators into the returned `VariantAnnotation` object
- Pedigree files are now more compatible with the PLINK format
  - whitespace separated instead of tab separated (read only, written as TSV)
  - interpreting any value not in {1, 2} as "unknown" sex (coded as 0) instead of throwing
- Fixing bug in transcript-to-genome translation, in HGVS the stop codon is not part of the CDS but in `TranscriptModel` it is
- Optional interpretation of certain filters in GeneWiseMendelianAnnotationProcessor.
- Extending interface of `VariantContextAnnotator` for automatic error annotation generation, previously in jannovar-cli
- Adding `VariantEffectHeaderExtender` class to `jannovar-htsjdk`
- Fixing bug with problems of unmodifiable Attributes (error annotation).
- Also writing out variant allele origin for dbSNP
- Adding annotation with COSMIC
- Fixing header description for exac database
- Fixing output of `DBSNP_CAF` to also contain reference allele AF
- Adding annotation with ClinVar, can annotate all ClinVar variants
- Removing this outdated module. Use the classes in `de.charite.compbio.jannovar.mendel` instead
- all-new module for gathering statistics on VCF files
- Change email/organisations in master pom
- `GenotypeCalls.getGenotypeForSample()` returns a "no-call" genotype now instead of `null`
- fix to annotation with compatible mode of inheritance (#289)
- update to htsjdk 2.8.1
- removing requirement for proper `contig` lines in gene-wise gene annotation
- fixing NPE in the case of no `contig` lines
- improving error message on samples in VCF file that are not in pedigree
- fix to annotation with compatible mode of inheritance (#289)
- better overview on CLI help message
- if ref-fasta is not set properly a nicer error message will be shown.
- Fixing bug with problems of unmodifiable Attributes.
- Including Hom/Het/Hemi counts of ExAC (#295)
- update to htsjdk 2.8.1
This is a bugfix release.
- Manual loads version from central POM file now
- Adjusting manual links to point to `javadoc.io`
- Fixing integration of HGNC into the downloaded databases
- For UCSC, HGNC records are searched based on the Entrez ID. If HGNC does not know the Entrez then only the Entrez ID from UCSC is written as additional ID.
- For RefSeq, linking is done through Entrez ID. If HGNC does not know the Entrez then only the Entrez ID from RefSeq is written as additional ID.
- For ENSEMBL linking is done through the ENSEMBL gene id. If this is not known to HGNC then no additional IDs are annotated.
- Fixing problem with `UnsupportedOperationException` in `jannovar-htsjdk`
- replace charite email of p. robinson with the new one of jax
- Renaming `tx-to-chrom` to `hgvs-to-vcf`, also in Java module names.
- CLI changes such that only one VCF input and one VCF output path can be used
- Replacing apache commons-cli with argparse4j for a more modern and usable CLI
- Consistently writing out HUGO symbols for gene names, using the `hgnc_complete_set.txt` information downloaded when building the annotation DB
- Upgrading from ENSEMBL-74 to ENSEMBL-75 for annotation database files
- Removing support for old Jannovar-style annotations (#241)
- Adding new command for annotating csv files (annotate-csv)
- Properly annotating Mendelian inheritance for intergenic variants
- downloading `hgnc_complete_set.txt` together with data sets, `TranscriptModel` objects now consistently contain additional IDs
- making ENSEMBL parsing more robust (falling back to transcript name if no transcript ID)
- fixing bug #248 for ENSEMBL that used `gene_id` for `gene_name`
- bugfix of NullPointerException in RefSeqParser while parsing refSeq curated
- bugfix space in SeqOID of SYNONYMOUS_VARIANT
- Update link to HGVS Nomenclature
- Now BestAnnotationListTextGenerator shows really the best and not all annotations!
- Documenting cli changes
- Adding additional sites contributing, FAQ and how to filter
- Better description of installations and quickstart
- this is gone, the functionality is now available as part of jannovar-cli
- this module is done, everything here is merged into jannovar-htsjdk
- The first version ships with support for dbSNP b147, ExAC 0.3, and the UK10K COHORT data base
- Initial version of this module, the aim is precise annotation from variant databases
- Updated `default_sources.ini` for latest patches of mouse and human genomes
- Using one-letter amino acid code by default
- Removed slf4j2 warning at program startup
- Checking pedigree for compatibility with VCF file if given
- Adjusting API for annotating amino acid code by default
- Checking pedigree for compatibility with genotypes on Mendelian inheritance checking
- Refurbishing `Genotype`, `GenotypeList`, and `GenotypeListBuilder` in `de.charite.compbio.jannovar.mendel`.
- Moving `ModeOfInheritance` to `de.charite.compbio.jannovar.mendel`.
- Creating new package `de.charite.compbio.jannovar.mendel` with code for filtering for mendelian inheritance modes.
- Renaming of `ModeOfInheritance.UNINITIALIZED` to `ModeOfInheritance.ANY`.
- Fixing handling of invalid transcripts (e.g., incomplete 3' end)
- Adding `altGeneIDs` mapping to `TranscriptModel`, makes data bases backwards incompatible.
- Rewrite of GFF parsers for RefSeq and ENSEMBL.
- Bumping HTSJDK to 2.5.0, requiring Java 8 from now on.
- Removal of `AnnotationCollector`, prioritization of variant effects is done after collecting all effect predictions now.
- Fix for intronic variants between 5' or 3' UTRs. These variants were misclassified as `FIVE_PRIME_UTR_VARIANT` or `THREE_PRIME_UTR_VARIANT`. SequenceOntology implements new terms so that we can decide between the two UTR exon and intron variants. Now we have `FIVE_PRIME_UTR_EXON_VARIANT` or `FIVE_PRIME_UTR_EXON_INTRON_VARIANT` (the same for `THREE_PRIME_UTR_EXON_VARIANT` or `THREE_PRIME_UTR_EXON_INTRON_VARIANT`).
- Adding better progress display with estimate of pending time.
- Adding support for annotating values from dbSNP VCF file (currently, only b147 is supported).
- Adding simple progress reporting (from verbosity level 2).
- Using Java 8 stream interface for `VariantContext` processing.
- Removing support for Jannovar output format, VCF offers all features and more.
- Updating htsjdk to 1.142
- using simple logger of slf4j
- fixing version output in command line help
- changing command line interface to use more named arguments
- removing deprecated usage of commons-cli command line parser
- renaming of some internal classes and functions, fixing Javadocs
- fixing bug in `TranscriptSequenceChangeHelper` for reverse transcript (did not reverse complement alternate allele)
- fixing bug in parsing GFF3 with some transcripts (e.g. GNAT1)
- less intrusive escaping in `ANN` field
- renaming of some internal classes and functions, fixing Javadocs
- Updating htsjdk to 1.142
- renaming `InvalidGenomeChange` to `InvalidGenomeVariant`
- renaming `VariantContextAnnotator.buildGenomeChange` to `.buildGenomeVariant`
- renaming of some internal classes and functions, fixing Javadocs
- extending API of ProteinChange hierarchy for HGVS generation
- renaming of some internal classes and functions, fixing Javadocs
- Updating htsjdk to 1.142
- changing command line interface to use more named arguments
- adding two new functions to InheritanceCompatibilityChecker
- resolve boolean if passes inheritance into set where passed inheritances are stored
- Updating htsjdk to 1.142
- updating manual for 0.16 and using parameters for commands!
- updating readme for parameters
- making `CompatibilityCheckerAutosomalRecessiveHomozygous` public
- using jannovar-hgvs for representing the changes
- more precise HGVS annotation in some cases
- predictions are wrapped in parentheses
- Mark everything that is related to the compatibility checkers as deprecated (see new jannovar-inheritance-checker)
- adding module for parsing and representing HGVS-compatible nucleic and protein changes
- Updating htsjdk to 1.138
- Replacing deprecated method `VariantContext.getChr()` with `VariantContext.getContig()`
- Updating htsjdk to 1.138
- Replacing deprecated method `VariantContext.getChr()` with `VariantContext.getContig()`
- Updating commons-cli to 1.3.1
- Bugfix detecting autosomal chromosomes
- Bugfix with handling variant files with a leading "chr" in the contig.
- Adding this new module.
- Replaces the compatibility checker of jannovar-core.
- Now runs with VariantContext (htsjdk) instead of Jannovar Genotypes
- Use `InheritanceCompatibilityChecker.Builder` to build `InheritanceCompatibilityChecker`.
- Use the method `getCompatibleWith` of the `InheritanceCompatibilityChecker` with a List of `VariantContext`.
- The method will return all `VariantContext` that match the inheritance. If no variant matches, the List is empty.
- Refactoring `VariantWiseInheritanceFilter` to handle the new `InheritanceCompatibilityChecker`.
- Rewrite `GeneWiseInheritanceFilter` to handle the new `InheritanceCompatibilityChecker`.
- Updating htsjdk to 1.138
- Replacing deprecated method `VariantContext.getChr()` with `VariantContext.getContig()`
- Adapting program to the `GeneWiseInheritanceFilter` and `VariantWiseInheritanceFilter` (see jannovar-filter)
- Updating commons-cli to 1.3.1
- Changing cli option inheritance-mode to multiple args (Now you can check multiple inheritances at once)
- Improving output file generation, jannovar-cli now uses the same extension as in the input and the infix is configurable instead of being fixed to ".jv".
- Default extension is ".vcf.gz" instead of ".vcf" now.
- Fixing label for `FRAMESHIFT_VARIANT` in `VariantEffect`.
- Moving CompatibilityCheckerException to package `...jannovar.pedigree.compatibilitychecker`
- Fixing bug in transcript coordinate projection.
- Renaming `TranscriptSequenceChangeHelper.getCDSWithChange` to `.getCDSWithGenomeVariant`.
- Renaming `*.getChange()` to `*.getGenomeVariant()`
- Renaming `VariantAnnotator.buildAnnotationList` to `.buildAnnotations`, `VariantContextAnnotator.buildAnnotationList` to `.buildAnnotations`, and `VariantContextAnnotator.buildErrorAnnotationList` to `VariantContextAnnotator.buildErrorAnnotations`
- VariantAnnotations does not implement `List<Annotation>` any more
- Adding `VariantAnnotations.getAnnotations`
- Renaming `AnnotationList` to `VariantAnnotations`
- changing treatment of insertions at exon/intron junctions; they are considered as intronic insertions now that affect splicing
- converting `GenomeVariant` of `AnnotationList` to always be on the forward strand after construction of `AnnotationList`
- deprecating the `{,All,Best}AnnotationTextGenerator` classes
- Moving `JannovarOptions` into jannovar-cli.
- Displaying online help on unknown Jannovar command.
- Fixing `NullPointerException` bug for local paths.
- Switching to official HTSJDK release and version 0.128.
- Writing out annotation about Jannovar call and version into the VCF file.
- Added option `--no-3-prime-shifting` to disable shifting towards the 3' end of the transcripts.
- Added option `--no-escape-ann-field` to disable escaping of the `ANN` `INFO` field.
- Variants in `ANN` field are now annotated with proper Sequence Ontology terms.
- Modified `VariantContextWriterConstructionHelper` to allow explicit disabling of index creation.
- Modified `VariantContextAnnotator` for adjustment to the new Exomiser.
- Switching to official HTSJDK release and version 0.128.
- Changing `VariantContextWriterConstructionHelper` to allow writing out of additional header lines.
- Added option to `VariantContextAnnotator#Options` for disabling 3' shifting.
- Modified `VariantContextAnnotator` allowing to disable escaping of the `ANN` `INFO` field.
- Moving `JannovarOptions` into jannovar-cli.
- Renaming `ACompatibilityChecker` and `ICompatibilityChecker`.
- Adding `GenomePosition.differenceTo(GenomeInterval)`.
- Renaming package `de.charite.compbio.jannovar.io` to `de.charite.compbio.jannovar.data`
- Renaming `AnnotationLocation.toHGVSString` to `.toHGVSChunk`.
- Adding `Pedigree.subsetOfMembers`
- Renaming `GenomeChange` to `GenomeVariant`, same with types having the same prefix.
- Introducing `DatasourceOptions` for configuring data download.
- Removing support for using `"-"` as REF or ALT value.
- Making previous `public final` members `private final` (or `protected final`) and adding getters for read-only access to them.
- Removing position type member of `CDSInterval`.
- Using type `Strand` instead of `'+'` and `'-'`, requires database rebuild.
- Adding enum `Strand` with `PLUS` and `MINUS` values.
- Adding `VariantEffect.isOffExome` and updating `VariantEffect.isOffTranscript`.
- Removing `genomeRegion` member from `GenotypeList`. Also, adjusting the pedigree compatibility checkers for this, the check for being on the X chromosome has to be performed outside the checker now.
- `VariantList.getHighestImpactEffect` now returns `VariantEffect#SEQUENCE_VARIANT` if no annotation can be found.
- `VariantList` implements the `List<Annotation>` interface now and the `entries` member has become private.
- Adding `VariantEffect#SEQUENCE_VARIANT` for variants with unknown effects.
- `GenomeChange.toString()` now always converts to forward strand.
- Fixing bug in `Annotation` and enforcing forward strand `GenomeChange` instances.
- Updates to the manual.
- `JannovarData` now also stores a mapping from transcript accession to `TranscriptModel` and from gene symbol to `TranscriptModel`.
- Adding functionality for conversion from CDS to transcript and genome position and tests.
- Adding `AnnotationBuilderOption` object that allows disabling of 3' shifting towards the transcript.
- Adding `JannovarOptions#escapeAnnField`.
- Renaming `VariantType` to `VariantEffect`
- Changing `VariantType` to use proper Sequence Ontology terms. Legacy names can be obtained through `VariantType#getLegacyName`.
- Splitting `CompatibilityCheckerXRecessive` into `CompatibilityCheckerXRecessiveCompoundHet` and `CompatibilityCheckerXRecessiveHomozygous`. Now all inheritance checkers are ready to use (AR, XR, AD, XD)
- move all pedigree compatibility checkers from `de.charite.compbio.jannovar.pedigree` to `de.charite.compbio.jannovar.pedigree.compatibilitychecker` and divide them into ar, xr, ad, xd.
- generate interface `ICompatibilityChecker` for pedigree compatibility checkers.
- Combine compatibility fields and methods in an abstract class `ACompatibilityChecker` to unify methods, builders, and fields.
- Splitting into `jped-cli` and `jannovar-filter`
- Changing public final members to accessors.
- `jannovar-filter` now has the Jannovar DB as the mandatory first argument.
- Changing public final members to accessors.
- Started bridge module between Jannovar and HTSJDK.
- Started tool for mode of inheritance--based filters.
- Splitting out bridge module between jannovar-core and HTSJDK to jannovar-htsjdk.
- Adding implementation of variant annotation standard 1.0.
- Adding unit tests for jannovar-cli.
- Fixing problem with empty `INFO` fields in output.
- Adding back `--output-dir` to jannovar-cli.
- Writing output parallel to input file by default.
- Adding `-v` and `-vv` command line options.
- Fixing problems with block substitution (delins) case (#87).
- Adding initial support for the transcript support level feature of the new VCF annotation standard (only in very recent ENSEMBL releases, apparently).
- `TranscriptModel#geneID` is now a `String`
- Update in various classes, e.g. Annotation.
- Fixing bug in PED parsing (empty lines are properly skipped now).
- More tests and fixes for the inheritance compatibility checkers.
- Updating `Annotation` for the variant annotation standard.
- `TranscriptPosition` and `TranscriptInterval` use zero-based positions now.
- Reordering values of `VariantType`.
- Somewhat renaming `VariantType` method names.
- Removing the `VariantType#size` function in favor of a `static public final` member.
member. - Using log4j/slf4j for I/O in jannovar-core.
- Adding `PrintStream` as parameter to `JannovarOptions#print`.
- Compressing serialized file.
- Changing namespace to `de.charite.compbio.jannovar`.
- Making `VariantType#priorityLevel` a non-static member.
- Renaming `TranscriptInfo` to `TranscriptModel`.
- Moving `HG19RefDictbuilder` from tests to main.
- Using `ImmutableMap` in `Translator` for small performance improvements.
- Using `StringBuilder`-based concatenation of strings for generation of HGVS strings etc. since this is much faster than using `String#format`.
- `GenomePosition` and `GenomeInterval` use zero-based coordinates internally now.
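
For readers coming from VCF, where positions are one-based, the switch to zero-based coordinates noted above can be illustrated with a minimal sketch. The class `ZeroBasedInterval` and its methods are hypothetical names made up for this example; they are not part of the Jannovar API. The sketch also assumes half-open intervals (`end` is exclusive), which this changelog does not state.

```java
// Hypothetical helper for illustration only; NOT a Jannovar class.
// Assumption: zero-based, half-open intervals [begin, end).
final class ZeroBasedInterval {
    final int begin; // zero-based, inclusive
    final int end;   // zero-based, exclusive

    ZeroBasedInterval(int begin, int end) {
        if (begin < 0 || end < begin) {
            throw new IllegalArgumentException("invalid interval: " + begin + ", " + end);
        }
        this.begin = begin;
        this.end = end;
    }

    // Convert a one-based, inclusive range (as used in VCF) to zero-based, half-open.
    static ZeroBasedInterval fromOneBased(int start, int stop) {
        return new ZeroBasedInterval(start - 1, stop);
    }

    // With half-open intervals the length is a plain difference.
    int length() {
        return end - begin;
    }

    // Convert back to one-based, inclusive coordinates for output.
    int oneBasedStart() {
        return begin + 1;
    }

    int oneBasedStop() {
        return end;
    }

    public static void main(String[] args) {
        // One-based positions 100..102 cover three bases.
        ZeroBasedInterval iv = ZeroBasedInterval.fromOneBased(100, 102);
        System.out.println(iv.begin + " " + iv.end + " " + iv.length()); // prints "99 102 3"
    }
}
```

Under these assumptions, only the start position shifts by one when converting in either direction, and an empty interval (for example an insertion point) is simply one where `begin == end`.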