GENCODE parser for vdftools. The current version, v47 (GRCh38), produces:
- transcript_chrom (string)
- transcript_start (integer)
- transcript_end (integer)
- transcript_id (string) # ENST
- gene_id (string) # ENSG
- gene_symbol (string)
- is_coding (bool)
- is_pos_strand (bool)
- is_seleno (bool) # is selenoprotein
- priority_level (integer) # tags considered:
  - APPRIS alternative 2
  - APPRIS alternative 1
  - APPRIS principal 5
  - APPRIS principal 4
  - APPRIS principal 3
  - APPRIS principal 2
  - APPRIS principal 1
  - MANE Clinical
  - MANE Select
- exon_total (integer) # exon count
- is_excludable (bool) # transcripts with problems (i.e. readthrough, no-start, no-end)
- exon_chrom (string)
- exon_start (integer)
- exon_end (integer)
- transcript_id (string) # ENST
- exon_number (integer)
- bases_preceding_exon (integer) # total bases in preceding exons
- cds_chrom (string)
- cds_start (integer)
- cds_end (integer)
- transcript_id (string) # ENST
- branch_chrom (string)
- branch_start (integer)
- branch_end (integer)
- transcript_id (string) # ENST
- transcript_id (string) # ENST
- orf_start_pos_in_transcript (integer) # 0-based position of the A in the start codon
- orf_end_pos_in_transcript (integer) # closed (inclusive) end: position of the last base of the stop codon
- transcript_sequence (string) # ACGT, coding transcripts
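
A minimal sketch of how the ORF fields above might be used, assuming orf_end_pos_in_transcript is 0-based like the start and that both index into transcript_sequence. The record class TranscriptSequence and both helper names are hypothetical and not part of vdftools. The second helper shows one way bases_preceding_exon could be derived, assuming exon lengths are supplied in transcript order.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List


@dataclass
class TranscriptSequence:
    """Hypothetical record mirroring the last four fields listed above."""
    transcript_id: str
    orf_start_pos_in_transcript: int  # 0-based position of the A in the start codon
    orf_end_pos_in_transcript: int    # closed end, assumed 0-based as well
    transcript_sequence: str          # ACGT


def extract_orf(rec: TranscriptSequence) -> str:
    """Return the ORF from start codon through stop codon, inclusive."""
    start = rec.orf_start_pos_in_transcript
    end = rec.orf_end_pos_in_transcript
    # The end is closed, so add 1 to turn it into a Python slice bound.
    return rec.transcript_sequence[start:end + 1]


def bases_preceding_exons(exon_lengths: List[int]) -> List[int]:
    """For each exon, return the total bases in all preceding exons.

    Assumes exon_lengths is ordered by exon_number (transcript order).
    """
    # Running total seeded with 0, dropping the final sum over all exons.
    return list(accumulate(exon_lengths[:-1], initial=0))


# Toy record with a made-up identifier and sequence, for illustration only.
rec = TranscriptSequence(
    transcript_id="ENST_EXAMPLE",
    orf_start_pos_in_transcript=2,
    orf_end_pos_in_transcript=10,
    transcript_sequence="GGATGAAATAGCC",
)
orf = extract_orf(rec)
print(orf)                                  # ATGAAATAG
print(bases_preceding_exons([100, 50, 75]))  # [0, 100, 150]
```

Because the end position is closed, the ORF length is end - start + 1, which should be a multiple of 3 for a complete ORF.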