docs: start work on new docs for dockerized/federated epivar
davidlougheed committed Jan 8, 2024
1 parent 2a62be5 commit 6c115c2
Showing 2 changed files with 65 additions and 6 deletions.
61 changes: 61 additions & 0 deletions docs/setting_up_a_node.md
@@ -0,0 +1,61 @@
# Setting up an EpiVar node


## Dependencies

* Docker
* Docker Compose plugin
* A configurable reverse proxy such as NGINX, Traefik, or similar


## Data and configuration requirements

### Raw data (stored on the node, not revealed publicly)

- [ ] A VCF containing sample variants, using one of two available reference genomes (`hg19`/`hg38`)
- [ ] A set of normalized signal matrices: one per assay, each containing columns of samples and rows of features
(see an [example for ATAC-seq](/input-files/matrices/ATAC-seq.example.tsv))
- [ ] A set of bigWigs, one or two (forward/reverse view) per sample-assay pair
- [ ] Peak and gene-peak-link CSV files:
- TODO: PEAK DATA

### Dataset metadata

- [ ] A metadata file for the bigWig tracks, which can be one of the following:
- An XLSX file with one or more sheets
(see [an example for the Aracena *et al.* dataset](/input-files/flu-infection.xlsx)), each with the following
headers:
- `file.path`: relative path to `bigWig`, without the `EPIVAR_TRACKS_DIR` environment variable directory prefix
- `ethnicity`: ethnicity / population group **ID** (*not* name!)
- if set to `Exclude sample`, sample will be skipped
- `condition`: condition / experimental group **ID** (*not* name!)
- `sample_name`: Full sample name, uniquely identifying the sample within
`assay`, `condition`, `donor`, and `track.view` variables
- `donor`: donor ID (i.e., individual ID)
- `track.view`: literal value, one of `signal_forward` or `signal_reverse`
- `track.track_type`: must be the literal value `bigWig`
- `assay.name`: one of `RNA-Seq`, `ATAC-Seq`, `H3K27ac`, `H3K4me1`, `H3K27me3`, `H3K4me3`

The file may have additional headers, but these will be discarded internally.
- **OR**, a JSON file containing a list of objects with the following keys, mapping to the above headers in order:
- `path`
- `ethnicity`
- `condition`
- `sample_name`
- `donor`
- `view`
- `type`
- `assay`
- [ ] A dataset configuration file, which takes the form described in the
[example configuration file](/config.example.js).
- [ ] A human-readable dataset description file, to show in the `About Dataset` tab in the portal. TODO
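
The correspondence between the XLSX headers and the JSON keys listed above can be sketched in Python. This is only an illustration, not the portal's own loader: the function name `row_to_track`, the sample row values, and the choice to raise `ValueError` on bad input are all assumptions made for this example. Only the header names, key names, and allowed literal values come from the lists above.

```python
import json
from typing import Optional

# XLSX header -> JSON key, in the order the two lists above give them.
HEADER_TO_KEY = {
    "file.path": "path",
    "ethnicity": "ethnicity",
    "condition": "condition",
    "sample_name": "sample_name",
    "donor": "donor",
    "track.view": "view",
    "track.track_type": "type",
    "assay.name": "assay",
}

VALID_VIEWS = {"signal_forward", "signal_reverse"}
VALID_ASSAYS = {"RNA-Seq", "ATAC-Seq", "H3K27ac", "H3K4me1", "H3K27me3", "H3K4me3"}


def row_to_track(row: dict) -> Optional[dict]:
    """Convert one XLSX-style row into a JSON-style track object.

    Hypothetical helper: returns None for rows marked `Exclude sample`
    and silently drops any headers that are not listed above.
    """
    if row.get("ethnicity") == "Exclude sample":
        return None

    missing = [header for header in HEADER_TO_KEY if header not in row]
    if missing:
        raise ValueError(f"missing headers: {missing}")

    track = {key: row[header] for header, key in HEADER_TO_KEY.items()}

    if track["view"] not in VALID_VIEWS:
        raise ValueError(f"unexpected view: {track['view']!r}")
    if track["type"] != "bigWig":
        raise ValueError(f"unexpected track type: {track['type']!r}")
    if track["assay"] not in VALID_ASSAYS:
        raise ValueError(f"unexpected assay: {track['assay']!r}")
    return track


if __name__ == "__main__":
    # Made-up row for illustration; the path, IDs, and names are not real data.
    example_row = {
        "file.path": "ATAC-Seq/donor1_cond1.forward.bw",
        "ethnicity": "GROUP_A",
        "condition": "COND_1",
        "sample_name": "donor1_cond1_ATAC",
        "donor": "donor1",
        "track.view": "signal_forward",
        "track.track_type": "bigWig",
        "assay.name": "ATAC-Seq",
        "notes": "extra headers like this one are discarded",
    }
    tracks = [t for t in [row_to_track(example_row)] if t is not None]
    print(json.dumps(tracks, indent=2))
```

Running the script prints a one-element JSON list in the shape described for the JSON variant of the metadata file.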


## Deploying

TODO


## Joining the EpiVar Portal federation

TODO
10 changes: 4 additions & 6 deletions readme.md
@@ -1,6 +1,6 @@
# The EpiVar Browser

-A web application to search for eQTL/epigenetic signal-associated variants and
+A federated web application to search for eQTL/epigenetic signal-associated variants and
merge bigWig tracks by genotype.

A production instance with data from
Expand Down Expand Up @@ -41,9 +41,9 @@ The tool itself is described in a 2023 preprint:

## Installation

-### Note on hosting your own instance
+### Note on hosting your own node

-With some effort, the EpiVar browser can be deployed with other data than just
+The EpiVar Browser's server component can be deployed with other data than just
the Aracena *et al.* dataset. The instructions below must be followed,
paying especially close attention to the formats described in the
[Application data](#application-data) section.
Expand Down Expand Up @@ -145,7 +145,7 @@ The different data sources to generate/prepare are:
in the box plots generated by the server.

- **Metadata:** This is the track's metadata. This can either be provided as an
-XLSX file with the headers:
+XLSX file with the following headers:
- `file.path`: relative path to `bigWig`, without `config.paths.tracks` directory prefix
- `ethnicity`: ethnicity / population group **ID** (*not* name!)
- if set to `Exclude sample`, sample will be skipped
@@ -155,7 +155,6 @@ The different data sources to generate/prepare are:
- `donor`: donor ID (i.e., individual ID)
- `track.view`: literal value, one of `signal_forward` or `signal_reverse`
- `track.track_type`: literal value `bigWig`
-- `assembly.name`: assembly name (e.g., `hg19`).
- `assay.name`: one of `RNA-Seq`, `ATAC-Seq`, `H3K27ac`, `H3K4me1`, `H3K27me3`, `H3K4me3`
and the sheets (which match `assay.name`):
@@ -175,7 +174,6 @@ The different data sources to generate/prepare are:
- `donor`
- `view`
- `type`
-- `assembly`
- `assay`
Information on the track metadata file: