xzhou82 edited this page Jul 5, 2024 · 44 revisions

Using custom tracks in ProteinPaint

  • Tracks are defined as JSON objects.
  • Submit tracks by launching a genome browser; see the example below.
  • All text values are case-sensitive.


Color values can be given as common names such as red, green (https://en.wikipedia.org/wiki/Web_colors), or in the forms #FF0000, rgb(255,0,0), rgba(255,0,0,.5).

Example

Go to https://proteinpaint.stjude.org/, launch the hg19 genome browser and paste in the following JSON text to add two tracks:

[
{"type":"bigwig","file":"hg19/hg19.100way.phastCons.bw","name":"UCSC phastCons 100ways","dotplotfactor":20,"height":100},
{"type":"bedj","file":"anno/refGene.hg19.gz","name":"RefSeq genes","translatecoding":1,"color":"#417D4C","stackheight":20}
]

View or debug JSON with https://jsonlint.com/
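
As a local alternative to an online linter, the track list can also be checked with Python's standard json module before pasting it into the browser. This is a minimal sketch: the helper name `parse_tracks` is hypothetical, and it only checks JSON syntax and the presence of a "type" attribute, not whether a ProteinPaint server will accept the track.

```python
import json

def parse_tracks(text):
    """Parse JSON text and return a list of track objects.

    Raises ValueError if the text is not valid JSON, or if an entry
    is not an object carrying a "type" attribute.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        )
    # Accept either a single track object or an array of them.
    tracks = data if isinstance(data, list) else [data]
    for i, track in enumerate(tracks):
        if not isinstance(track, dict) or "type" not in track:
            raise ValueError(f'track #{i + 1} must be an object with a "type" attribute')
    return tracks

text = """[
{"type":"bigwig","file":"hg19/hg19.100way.phastCons.bw","name":"UCSC phastCons 100ways","dotplotfactor":20,"height":100},
{"type":"bedj","file":"anno/refGene.hg19.gz","name":"RefSeq genes","translatecoding":1,"color":"#417D4C","stackheight":20}
]"""
tracks = parse_tracks(text)
print([t["name"] for t in tracks])
```

A common failure this catches is curly quotes (“ ”) introduced by word processors, which are not valid JSON string delimiters.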

The JSON track objects can also be used with the embedding API.

Example:

{
    "name":"name of the track",
    "type":,
    "url":"http://domain/file.gz",
    "indexURL":"http://domain/path/file.gz.tbi",
    "file":"path/to/file.gz", // use this when not using URL
    "toppad":5,
    "bottompad":5
}

"name": STR

  • A string as track name

"type": STR

  • Typecode of the track. Allowed values are:
    • bigwig
    • bigwigstranded
    • bedj
    • profilegenevalue
    • junction
    • mds3
    • bam
    • bampile
    • hicstraw

"file": STR "url": STR "indexURL": STR

  • Either “file” or “url” should be provided, but not both. When using “file”, provide the relative path to the track file, starting from the directory configured on the ProteinPaint server.
  • When using a URL for a tabix-indexed file, the index file is by default expected to share the URL of the .gz file. When it does not, the “indexURL” attribute must be used to provide the URL of the index file.
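
The rule above can be checked mechanically before submitting a track. The following is a sketch only: `check_track_source` is a hypothetical helper, it applies to tracks that point at a single data file (not to container tracks such as "bigwigstranded"), and the third check (that "indexURL" is meaningful only together with "url") is an assumption drawn from the wording above.

```python
def check_track_source(track):
    """Return a list of problems with "file" / "url" / "indexURL" in a track object."""
    problems = []
    has_file = "file" in track
    has_url = "url" in track
    if has_file and has_url:
        problems.append('provide either "file" or "url", not both')
    if not has_file and not has_url:
        problems.append('one of "file" or "url" is required')
    # Assumption: indexURL is only used to locate the index of a URL-hosted file.
    if "indexURL" in track and not has_url:
        problems.append('"indexURL" only applies when "url" is used')
    return problems

ok = {"type": "bedj", "name": "genes", "url": "http://domain/file.gz",
      "indexURL": "http://domain/path/file.gz.tbi"}
bad = {"type": "bedj", "name": "genes", "file": "path/to/file.gz",
       "url": "http://domain/file.gz"}
print(check_track_source(ok))
print(check_track_source(bad))
```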

"toppad": INT

  • Number of pixels as the padding space on the top, default: 5

"bottompad": INT

  • Number of pixels as the padding space at the bottom, default: 5

"hidden": 1

  • If set, the track will be hidden by default and can be found in the track menu (by clicking the “Tracks” button)

Example: https://proteinpaint.stjude.org/?block=on&genome=hg19&bigwigfile=BigWig_Demo,proteinpaint_demo/hg19/bigwig/file.bw

{
    "name": "track name",
    "type": "bigwig",
    "file": "proteinpaint_demo/hg19/bigwig/file.bw",
    "scale": {
        "min": 0,
        "max": 100
    },
    "height": 100
}

bigWig track attributes:

scale: {}

  • min:number
  • max:number
    • Set a fixed scale range of the Y axis
  • percentile:number
    • Value is integer from 1 to 99, representing a percentile of all the data in the view range. Overrides min/max
  • auto:1
    • Set automatic scale, will override all other settings in “scale”

height:number

  • Bar plot height in number of pixels. If height is below 10, the track will be rendered as heatmap.

pcolor:str

  • Bar color of the positive values

pcolor2:str

  • Rendering color for data points above Y axis maximum value.

ncolor:str

  • Bar color of negative values

ncolor2:str

  • Rendering color for data points below Y axis minimum value

dotplotfactor:int

  • Value is positive integer e.g. 5 or 10. When applied, will request 5 or 10 times more data points from a bigWig track and plot each point as a dot, rather than bars. A use case is checking (large-scale) CNV from DNA sequencing coverage track
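
The precedence among the "scale" settings (auto over everything, percentile over min/max) can be summarized in a few lines. This is an illustration of the documented rules, not ProteinPaint's own code; `effective_scale` is a hypothetical name, and the behavior when no setting is given is not described here, so the sketch simply reports it as unspecified.

```python
def effective_scale(scale):
    """Report which Y-axis scaling mode a bigWig "scale" object selects."""
    # "auto" overrides all other settings in "scale".
    if scale.get("auto"):
        return ("auto",)
    # "percentile" overrides min/max and must be an integer from 1 to 99.
    if "percentile" in scale:
        p = scale["percentile"]
        if not (isinstance(p, int) and 1 <= p <= 99):
            raise ValueError("percentile must be an integer from 1 to 99")
        return ("percentile", p)
    # A fixed Y-axis range needs both ends.
    if "min" in scale and "max" in scale:
        return ("fixed", scale["min"], scale["max"])
    # Not covered by the documentation above.
    return ("unspecified",)

print(effective_scale({"min": 0, "max": 100}))
print(effective_scale({"min": 0, "max": 100, "percentile": 95}))
print(effective_scale({"auto": 1, "percentile": 95}))
```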

Example: https://proteinpaint.stjude.org/examples/bigwig.stranded.html

For showing stranded RNA-seq coverage data as a pair of bigWig tracks, with forward strand on top and reverse strand on bottom.

{
    "name": "stranded RNA-seq coverage",
    "type": "bigwigstranded",
    "strand1": {
        "file": "path/to/sample.forwardstrand.bw",
        "scale": {
            "min": 0,
            "max": 100
        },
        "height": 50
    },
    "strand2": {
        "file": "path/to/sample.reversestrand.bw",
        "scale": {
            "max": 0,
            "min": -100
        },
        "height": 50,
        "normalize": { "dividefactor": -1 }
    }
}
  • strand1 : {}
    • The bigWig track of the data from forward strand
    • For read-coverage data, the values in the forward-strand bigWig file should be positive
  • strand2 : {}
    • The bigWig track of the data from reverse strand
    • Both strands follow the bigWig track definition.

Note: for stranded bigwig files using all positive values for both strands (e.g. sequencing read coverage), apply “normalize” with a “dividefactor” of -1 to the reverse strand, so the bars will point down.

If the reverse strand bigwig track has been prepared to have negative values, there is no need to apply the -1 factor.
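
The two cases in the note can be folded into one helper that adds the -1 factor only when the reverse-strand file holds positive values. A minimal sketch, assuming symmetric Y ranges and equal heights for both strands; `stranded_track` and its parameters are hypothetical names for illustration.

```python
import json

def stranded_track(name, forward_file, reverse_file,
                   ymax=100, height=50, reverse_is_negative=False):
    """Build a "bigwigstranded" track object.

    reverse_is_negative: set True when the reverse-strand bigWig already
    stores negative values, so no sign flip is needed.
    """
    strand2 = {
        "file": reverse_file,
        "scale": {"min": -ymax, "max": 0},
        "height": height,
    }
    if not reverse_is_negative:
        # Positive coverage values: divide by -1 so the bars point down.
        strand2["normalize"] = {"dividefactor": -1}
    return {
        "name": name,
        "type": "bigwigstranded",
        "strand1": {
            "file": forward_file,
            "scale": {"min": 0, "max": ymax},
            "height": height,
        },
        "strand2": strand2,
    }

track = stranded_track(
    "stranded RNA-seq coverage",
    "path/to/sample.forwardstrand.bw",
    "path/to/sample.reversestrand.bw",
)
print(json.dumps(track, indent=2))
```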

Example: https://proteinpaint.stjude.org/examples/pgv.html

Read more about the PGV track

Example: https://proteinpaint.stjude.org/examples/junction.html

{
    "type": "junction",
    "name": "sample junction",
    "file": "junction/targetALL/10-PANYGB-diagnosis-SJCOGALL010859_D2.gz",
    "categories": {
        "known": {
            "label": "Known",
            "color": "#9c9c9c"
        },
        "novel": {
            "label": "Novel",
            "color": "#cc0000"
        }
    }
}

“categories” specifies the rendering color for types of junctions.

Read the splice junction file format.

Multiple junction tracks can be aggregated to show in one track, via the .tracks[ ] attribute:

{
    "type":"junction",
    "name":"sample junction",
    "tracks":[
        {
            "sample":"sample1",
            "file":"path/to/sample1.gz"
        },
        {
            "sample":"sample2",
            "file":"path/to/sample2.gz"
        },
        … more samples …
    ],
    "categories":{ … }
}

In the .tracks[ ], add one object for each member track.

When combining multiple junction tracks, each track can contain one or multiple samples.
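
When there are many samples, the aggregated track object is easier to generate than to type. The following sketch builds it from (sample, file) pairs; `aggregate_junction_tracks` is a hypothetical helper and the paths are placeholders.

```python
import json

def aggregate_junction_tracks(name, sample_files, categories=None):
    """Build one junction track that aggregates several member tracks.

    sample_files: iterable of (sample name, relative file path) pairs.
    """
    track = {
        "type": "junction",
        "name": name,
        # One object per member track in .tracks[]
        "tracks": [{"sample": s, "file": f} for s, f in sample_files],
    }
    if categories:
        track["categories"] = categories
    return track

track = aggregate_junction_tracks(
    "sample junction",
    [("sample1", "path/to/sample1.gz"), ("sample2", "path/to/sample2.gz")],
    categories={
        "known": {"label": "Known", "color": "#9c9c9c"},
        "novel": {"label": "Novel", "color": "#cc0000"},
    },
)
print(json.dumps(track, indent=2))
```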

Example: https://proteinpaint.stjude.org/?genome=hg38&gene=kras&mds3bcffile=BCF_Demo,hg38/clinvar.hg38.bcf.gz

Bampile track can be used to examine the very rare alleles from a high-depth (capture-based) DNA sequencing experiment. Example of bampile track:

{
  "name": "track name",
  "type": "bampile",
  "file": "path/to/file.gz",
  "url": "use url if the file is hosted on a web server"
}

Read the bampile file format.

  • fineheight
    • Height of the bar plot at bottom showing low-frequency alleles, the Y scale uses a cutoff value as defined by “fineymax”
  • allheight
    • Height of the bar plot at top showing frequency of all alleles with automatic scale for coverage
  • midpad
    • Padding distance between top and bottom bar plots
  • fineymax
    • Y scale used by the bottom bar plot
  • usegrade
    • Name of the grade to use

Example: https://proteinpaint.stjude.org/examples/aicheck.html

Read the aicheck track tutorial.

Example: https://proteinpaint.stjude.org/examples/svcnv.html

This launches a custom GenomePaint track. To access official tracks, see embedding API.

{
  "name": "track name",
  "type": "mdssvcnv",
  "file": "path/to/svcnv.gz",
  "checkexpressionrank":{
     "file":"hg38/tcga-gdc/SKCM/TCGA_SKCM.fpkm.gz"
  },
  "checkvcf":{
      "file":"hg38/tcga-gdc/SKCM/TCGA_SKCM.vcf.gz"
  }
}

Read the GenomePaint tutorial.

You can replace “file” with “url” in the 3 places above.

The following attributes can be applied in the track object, as detailed in the Embedding API.

  • singlesample:{}
  • isfull:true / isdense:true
  • sampleAttribute:{}
  • vcf:{}
  • hide_cnvgain:true
  • hide_cnvloss:true
  • cnv:{}
  • sampleset:[]

To supply sample assay tracks for a custom GenomePaint track, use the “sample2assaytrack” attribute. The value is an object of key-value pairs, where keys are sample names, and values are lists of assay tracks available for that sample.
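
The shape of "sample2assaytrack" can be sketched as below. It assumes that each assay track is an ordinary track object in the format described on this page; the sample name and file paths are placeholders, and `add_sample_assay_tracks` is a hypothetical helper.

```python
import json

def add_sample_assay_tracks(track, assays_by_sample):
    """Return a copy of a track with "sample2assaytrack" attached.

    assays_by_sample: dict mapping sample name -> list of assay track objects.
    """
    for sample, assay_tracks in assays_by_sample.items():
        if not isinstance(assay_tracks, list):
            raise TypeError(f"assay tracks for {sample!r} must be a list")
    updated = dict(track)
    updated["sample2assaytrack"] = assays_by_sample
    return updated

svcnv = {"name": "track name", "type": "mdssvcnv", "file": "path/to/svcnv.gz"}
svcnv = add_sample_assay_tracks(svcnv, {
    # Key: sample name. Value: list of tracks available for that sample.
    "sample1": [
        {"name": "sample1 coverage", "type": "bigwig",
         "file": "path/to/sample1.coverage.bw"},
    ],
})
print(json.dumps(svcnv, indent=2))
```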

A derivative of the “mdssvcnv” track is the multi-sample ASE track.

Example: https://proteinpaint.stjude.org/examples/ase.html

{
    type:'mdssvcnv',
    name:'Multi-sample ASE analysis',
    checkvcf:{
        file:'hg19/TARGET/DNA/test/oct3/sorted.vcf.gz',
    },
    checkrnabam:{
        samples:{
            SJALL015260_D1: {
                file:'hg19/TARGET/RNAbam/SJALL015260_D1.bam',
                totalreads: 83388794,
            },
            SJALL015643_D1: {
                file:'hg19/TARGET/RNAbam/SJALL015643_D1.bam',
                totalreads: 103477133,
            },
            … more BAM files …
        },
    },
}

Note this track type is deprecated. Use mds3 instead. The track object contains one single attribute:

{
  "mdsjsonfile": "path/to/your/dataset.json"
}

Example: https://proteinpaint.stjude.org/examples/ase.single.html

This carries out on-the-fly ASE analysis for a single sample. This track can be built standalone, or spawned from a multi-sample track (a special mode of the mdssvcnv track).

{
    type:'ase',
    name:'My sample ASE',
    samplename:'my_sample_name',
    rnabamfile:'path/to/sample.rnaseq.bam',
    rnabamtotalreads: 103477133,
    vcffile:'path/to/SJALL015643_D1.gz',
},

Defines the Interaction track.

{
    name:"NALM6 in-situ Hi-C",
    type:"hicstraw",
    file:"files/hg19/nalm6/hic_Nalm6.inter.hic",
    percentile_max:95,
    mincutoff:0,
    pyramidup:1,
    enzyme:"MboI"
}

File format

A track file requires at least 3 columns, separated by tab:

  • Chromosome
  • Start position (0-based)
  • Stop position (not including the ending base)
  • Optional stringified JSON object

Each line is a genomic feature. Using a file with only the first 3 columns will produce a basic rendering of genomic segments.

https://proteinpaint.stjude.org/?genome=hg19&block=1&bedjfile=test,proteinpaint_demo/hg19/misc/rmsk.bed3.gz

Using a JSON object in the 4th column allows genomic features to be described richly, which offers better flexibility than the fixed columns of the BED format and its variants. See the section “JSON object for a BED item” for the JSON object specification.
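
A line of such a file can be produced with Python's json module, which by default writes the object on a single line. This is a sketch under assumptions: `bedj_line` is a hypothetical helper, and the chromosome "chr1" is assumed for illustration since the example item does not state it.

```python
import json

def bedj_line(chrom, start, stop, item=None):
    """Format one tab-separated line: chromosome, 0-based start, stop, optional JSON."""
    fields = [chrom, str(start), str(stop)]
    if item is not None:
        # Compact separators keep the JSON short; json.dumps without
        # "indent" never emits line breaks.
        fields.append(json.dumps(item, separators=(",", ":")))
    return "\t".join(fields)

item = {"name": "MIR6859-1", "isoform": "NR_106918", "strand": "-",
        "exon": [[17368, 17436]], "rnalen": 68}
print(bedj_line("chr1", 17368, 17436, item))
print(bedj_line("chr1", 17368, 17436))  # 3-column form, basic rendering
```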

Prepare and use a track

$ sort -k1,1 -k2,2n [file] > [file.sorted]
$ mv [file.sorted] [file]
$ bgzip [file]
$ tabix -p bed [file].gz

To host the track on a web server: put both the .gz and .gz.tbi files in the same directory on the web server. Obtain the URL to the .gz file and submit it to the browser.

To host the track on the ProteinPaint server: put both the .gz and .gz.tbi files in the same directory, under the directory configured on the ProteinPaint server. Obtain the relative path to the .gz file and submit it to the browser.

Alternatively, a bigbed file can be used as the source file and will be parsed into JSON-BED format.

To host the track on a web server: put the bigbed file on the web server. Obtain the URL to the bigbed file and submit it to the browser.

To host the track on the ProteinPaint server: put the bigbed file under the directory configured on the ProteinPaint server. Obtain the relative path to the bigbed file and submit it to the browser.

JSON object for a BED item

The content is a string representation of an object (key-value pairs). Examples:

{"name":"CTCF1","strand":"+"}

{"name":"MIR6859-1","isoform":"NR_106918","strand":"-","exon":[[17368,17436]],"rnalen":68}

Format requirements:

  • No line breaks in the JSON text
  • Must include braces {}
  • Use double quotes for strings
  • Don’t use quotes for numerical values
  • Keys are case-sensitive
  • These keys cannot be used in JSON and will be ignored: chr, start, stop, canvas
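
These requirements can be checked per line before building the track file. A minimal sketch, assuming Python's strict JSON parser is an acceptable proxy for the quoting rules; `check_item_json` is a hypothetical helper.

```python
import json

# Keys listed above as ignored when present in the JSON.
RESERVED_KEYS = {"chr", "start", "stop", "canvas"}

def check_item_json(text):
    """Return a list of problems found in the JSON text of one BED item."""
    problems = []
    if "\n" in text or "\r" in text:
        problems.append("JSON text must not contain line breaks")
    try:
        # Strict parsing rejects single-quoted strings and unquoted keys.
        item = json.loads(text)
    except json.JSONDecodeError as err:
        return problems + [f"not valid JSON: {err.msg}"]
    if not isinstance(item, dict):
        return problems + ["JSON text must be an object wrapped in braces {}"]
    used = sorted(RESERVED_KEYS.intersection(item))
    if used:
        problems.append("reserved keys will be ignored: " + ", ".join(used))
    return problems

print(check_item_json('{"name":"CTCF1","strand":"+"}'))
print(check_item_json('{"chr":"chr1","name":"x"}'))
```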

Supported JSON keys:

"name": STR

  • Value is a string. For genes, “name” is gene symbol. When “itemurl_appendname” is specified, “name” is required to enable clicking on an item from track display and trigger a URL.

"isoform": STR

  • Gene isoform accession, e.g. NM_000546 or ENST00000269305. Both name and isoform can appear in the tooltip when hovering cursor over the track display.

"strand": STR

  • “+” or “-”. Unstranded if not provided.

"exon":[]

  • Array of two-number arrays, e.g. [[665562,665731],[665277,665335],[661138,665184]]. Must be present for genes. All positions are 0-based. The stop position of an exon is the nucleotide next to the last exonic nucleotide, similar to the UCSC BED format. Notes:
    • For coding genes: Value should be a union of UTRs and CDS.
    • For noncoding genes: Value should be all exons.
    • Exons in this array are ordered from 5’ to 3’.
    • Despite the presence of “utr5”, “utr3”, “coding”, “intron” attributes, the “exon” attribute is still required.

"intron":[]

  • Same format as “exon[]”. Required for native gene tracks. The stop position of an intron is the first nucleotide in the exon.
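
Given the 0-based, stop-exclusive convention, introns are the gaps between neighboring exons: each intron starts at the stop of one exon and stops at the start of the next. The sketch below derives them in genomic (ascending) order; whether "intron" should instead follow 5’ to 3’ order is not stated here, so that ordering is an assumption. `introns_from_exons` is a hypothetical helper.

```python
def introns_from_exons(exons):
    """Derive [start, stop] intron intervals from an "exon" array.

    Exons may be listed 5' to 3' (descending on the minus strand),
    so they are sorted by genomic position first.
    """
    ordered = sorted(exons)
    return [[prev[1], nxt[0]] for prev, nxt in zip(ordered, ordered[1:])]

exons = [[665562, 665731], [665277, 665335], [661138, 665184]]
print(introns_from_exons(exons))
```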

"utr5":[]

  • Same format as “exon[]”. Required for any 5’ UTRs in coding genes.

"utr3":[]

  • Same format as “exon[]”. Required for any 3’ UTRs in coding genes.

"rnalen": INT

  • Base-pair length of RNA transcript. Required for all genes.
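
In the MIR6859-1 example above, "rnalen" (68) equals the length of its single exon (17436 - 17368). Assuming this holds in general, i.e. that "rnalen" is the summed length of all exons, it can be computed as follows; `rna_length` is a hypothetical helper.

```python
def rna_length(exons):
    """Sum exon lengths; positions are 0-based with an exclusive stop."""
    return sum(stop - start for start, stop in exons)

print(rna_length([[17368, 17436]]))
```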

"cdslen": INT

  • Base-pair length of the coding region of an mRNA transcript. Required for coding genes. Will include nucleotides from an incomplete codon as defined by startCodonFrame. Do not use for noncoding genes.

"codingstart": INT

  • Genomic position of the smaller boundary of the coding sequence. Required for coding gene. Do not use for noncoding gene.

"codingstop": INT

  • Genomic position of the bigger boundary of the coding sequence. Required for coding gene.

"startCodonFrame": 1/2

  • Tells how many nucleotides the “start codon” of this transcript should be shifted for translation. In the case of IGKC, startCodonFrame=1 means it will borrow 1 nt from the previous IGKJ exons. So the first two nucleotides of IGKC will not be translated when looking at IGKJ alone.

"coding":[ [start,stop], … ]

  • Same as “exon[]”, for coding exons only. Set “translatecoding” to true in the track object to enable gene translation according to the coding exons as well as coding frame defined by the “coding:[]” attribute. The translation can happen when the browser is at sufficient resolution. The first element will include nucleotides from incomplete codon as defined by startCodonFrame.

"description": STR

  • Some text, e.g. gene function.

"color": STR

  • Rendering color of this item. Overrides the track-level “color” setting.

"exon2color": [ { start, stop, color }, … ]

  • Optional. Each element: { start, stop, color } Start and stop are 0-based. This will override the item color for the matching exon from “exon[]” array.

"category": STR

  • Value is string or integer. Value must be a key of the “categories” attribute of the track object.

"isoformonly": STR

  • Experimental fix to filter bed items by isoform, so that certain bed items will be shown under a specific isoform. Value is an isoform accession.

Declaring a track as a JSON object

{
"type":"bedj",
"name":"gene track",
"file":"anno/gencode.v24.hg19.gz",
"stackheight":14,
"stackspace":1
}

stackheight:20

  • Height of rows in number of pixels. All rows share the same height.
  • For gene tracks, this height will be the thickness of coding exons, while UTRs and noncoding exons will have height reduced by 4 pixels.

stackspace:1

  • Spacing distance between rows.

color:"blue"

  • Track rendering color for lines, boxes, and text labels. Per-item color defined in the track file will override this setting.

onerow:true

  • Value is “1” for true. Forces all items in the view range to be displayed in the same row, and item names will be hidden. Useful for making compact representation of certain tracks, e.g. chromHMM. Delete this attribute to cancel the effect.

categories:{}

  • List of categories. Each item in the track belongs to one category and is colored accordingly. E.g. {"1":{"color":"red","label":"type 1"}, "2":{"color":"blue","label":"type 2"}, … }

translatecoding:1

  • Will translate genes when the resolution is fine enough. This requires the .coding[] attribute in the JSON objects of BED items.

itemurl_appendname: URL

  • Allows clicking on an item in the track to open a URL customized by the name of that item; the item’s name will be appended to the end of the URL as the value of a parameter.
  • Example: given the URL “http://google.com?query=”, clicking on an item named “HOX” will trigger this URL: http://google.com?query=HOX

hideItemNames: true

  • Do not show item names in the track

filterByName: "Item1\nItem2"

  • Multiple item names joined by line break. Will only show given items in the track.
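
Joining the names by hand is error-prone, since the line break must end up as the \n escape inside the JSON string. A short sketch using the gene track from the declaration example; `with_name_filter` is a hypothetical helper and the item names are placeholders.

```python
import json

def with_name_filter(track, names):
    """Return a copy of a bedj track that only shows the given item names."""
    filtered = dict(track)
    # Names are joined by a line break, as the attribute expects.
    filtered["filterByName"] = "\n".join(names)
    return filtered

track = {"type": "bedj", "name": "gene track",
         "file": "anno/gencode.v24.hg19.gz", "stackheight": 14, "stackspace": 1}
filtered = with_name_filter(track, ["Item1", "Item2"])
# json.dumps writes the line break as the two-character escape \n.
print(json.dumps(filtered))
```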