Add generic TSV format for bgzipped, tabix-indexed data #1543

Open
kpalin opened this issue Sep 8, 2022 · 2 comments

Comments

@kpalin
Contributor

kpalin commented Sep 8, 2022

Asking for the ability to plot numerical data from an arbitrary column of a coordinate-sorted, bgzip-compressed, tabix-indexed data file.

This kind of data is a common output of various genome-wide analysis programs, e.g. differential methylation analysis of individual CpG sites, which has data on only 1/16th of the genome, at irregular intervals, and carries several potentially interesting numbers per site, such as the p-value, the difference between the two conditions, and the methylation levels in cases and in controls. Such data is often post-processed in further analyses, so keeping all values for a site in a single row is preferable.

From an implementation perspective, documentation and an alias for the gwas format would cover most situations. The syntax of the gwas format definition is shown here. Improvements over it would be (1) optional distinct start and end coordinates (currently there is only a single pos), (2) the ability to name the columns for the popup display, and (3) reading the header from the tabix index (as printed by the tabix -H option).

A suggested syntax for the track definition:

            {
                type: "wig",
                format: "tsv",
                name: "Generic TSV sample",
                url: "example.tsv.gz",
                indexURL: "example.tsv.gz.tbi",
                columns: {
                    chromosome: 1,
                    start: 2,
                    end: 3,
                    value: 5,
                    names: [
                      "chrom",
                      "start",
                      "end",
                      "pvalue",
                      "effect"
                    ] 
                }
            }
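To make items (1) and (2) concrete, here is a minimal sketch of how one data row might be mapped to a feature with a columns object shaped like the proposal above. It is hypothetical throughout: parseTsvRow is not an igv.js function, column numbers are assumed to be 1-based, and coordinates are returned exactly as written in the file, leaving open whether they are 0- or 1-based. The input row in the usage line is made up.

```javascript
// Hypothetical sketch, not an igv.js API: map one tab-separated row to a
// feature using a `columns` object like the one proposed in this issue.
// Assumption: column numbers are 1-based.
function parseTsvRow(line, columns) {
  const fields = line.split("\t");
  // Convert a 1-based column number to the matching field.
  const field = (col) => fields[col - 1];

  const start = Number.parseInt(field(columns.start), 10);
  // (1) Distinct end coordinate; without one, fall back to a single base,
  // like the single `pos` column of the gwas format.
  const end =
    columns.end !== undefined ? Number.parseInt(field(columns.end), 10) : start + 1;

  // (2) Name every field for the popup; unnamed columns get "column N".
  const names = columns.names || [];
  const popupData = fields.map((value, i) => ({
    name: names[i] !== undefined ? names[i] : `column ${i + 1}`,
    value,
  }));

  return {
    chr: field(columns.chromosome),
    start, // as written in the file; 0- vs 1-based is left open here
    end,
    value: Number.parseFloat(field(columns.value)),
    popupData,
  };
}

// Usage with the column layout from the proposed track definition.
const feature = parseTsvRow("chr1\t1000\t1001\t0.003\t0.21", {
  chromosome: 1,
  start: 2,
  end: 3,
  value: 5,
  names: ["chrom", "start", "end", "pvalue", "effect"],
});
console.log(feature.value); // 0.21
```

Keeping every field in popupData, rather than only the plotted value, matches the wish to keep all numbers for a site in one row; a non-numeric value field such as NA simply comes out as NaN.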

Previously raised in the discussion of pull request #1540.

@jrobinso
Contributor

jrobinso commented Nov 3, 2022

@kpalin I don't understand what is to be done with the file header, if any ("-H"). What information in there would be useful? Are you thinking of a track line? Feel free to zip and provide an example.

@kpalin
Contributor Author

kpalin commented Nov 3, 2022

Allowing a track line would also be a good idea, but I was thinking of the column names, treated the same way as the names array above. Here's sample_data.zip.
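Using the header's column names in place of an explicit names array could look roughly like the sketch below. columnNamesFromHeader is a hypothetical helper; it assumes the header text has already been obtained (for example from tabix -H) and that the last line starting with the meta character ("#" by default) carries the tab-separated column names. Files whose header is laid out differently would need another rule.

```javascript
// Hypothetical helper, not an existing igv.js API: derive column names from
// the header text printed by `tabix -H`. Assumption: the last header line
// starting with the meta character ("#" by default) holds the column names.
function columnNamesFromHeader(headerText, metaChar = "#") {
  const headerLines = headerText
    .split(/\r?\n/)
    .filter((line) => line.startsWith(metaChar));
  if (headerLines.length === 0) {
    return null; // no header: fall back to an explicit `names` array
  }
  const last = headerLines[headerLines.length - 1];
  // Drop all leading meta characters ("#", "##", ...) before splitting on tabs.
  let i = 0;
  while (i < last.length && last[i] === metaChar) i++;
  return last.slice(i).split("\t");
}

// Usage: the result could stand in for the `names` array of the proposal.
const names = columnNamesFromHeader("##source=example\n#chrom\tstart\tend\tpvalue\teffect\n");
console.log(names.join(",")); // chrom,start,end,pvalue,effect
```

Returning null when no header line is found lets a caller fall back to an explicit names array, so the two ways of naming columns can coexist.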
