diff --git a/docs/setting_up_a_node.md b/docs/setting_up_a_node.md
new file mode 100644
index 00000000..2124d105
--- /dev/null
+++ b/docs/setting_up_a_node.md
@@ -0,0 +1,61 @@
+# Setting up an EpiVar node
+
+
+## Dependencies
+
+* Docker
+* Docker Compose plugin
+* A configurable reverse proxy such as NGINX, Traefik, or similar
+
+
+## Data and configuration requirements
+
+### Raw data (stored on the node, not revealed publicly)
+
+- [ ] A VCF containing sample variants, using one of two available reference genomes (`hg19`/`hg38`)
+- [ ] A set of normalized signal matrices: one per assay, each containing columns of samples and rows of features
+  (see an [example for ATAC-seq](/input-files/matrices/ATAC-seq.example.tsv))
+- [ ] A set of bigWigs, one or two (forward/reverse view) per sample-assay pair
+- [ ] Peak and gene-peak-link CSV files:
+  - TODO: PEAK DATA
+
+### Dataset metadata
+
+- [ ] A metadata file for the bigWig tracks, which can be one of the following:
+  - An XLSX file with one or more sheets
+    (see [an example for the Aracena *et al.* dataset](/input-files/flu-infection.xlsx)), each with the following
+    headers:
+    - `file.path`: relative path to `bigWig`, without the `EPIVAR_TRACKS_DIR` environment variable directory prefix
+    - `ethnicity`: ethnicity / population group **ID** (*not* name!)
+      - if set to `Exclude sample`, sample will be skipped
+    - `condition`: condition / experimental group **ID** (*not* name!)
+    - `sample_name`: Full sample name, uniquely identifying the sample within
+      `assay`, `condition`, `donor`, and `track.view` variables
+    - `donor`: donor ID (i.e., individual ID)
+    - `track.view`: literal value, one of `signal_forward` or `signal_reverse`
+    - `track.track_type`: must be the literal value `bigWig`
+    - `assay.name`: one of `RNA-Seq`, `ATAC-Seq`, `H3K27ac`, `H3K4me1`, `H3K27me3`, `H3K4me3`
+
+    The file may have additional headers, but these will be discarded internally.
+  - **OR**, a JSON file containing a list of objects with the following keys, mapping to the above headers in order:
+    - `path`
+    - `ethnicity`
+    - `condition`
+    - `sample_name`
+    - `donor`
+    - `view`
+    - `type`
+    - `assay`
+- [ ] A dataset configuration file, which takes the form described in the
+  [example configuration file](/config.example.js).
+- [ ] A human-readable dataset description file, to show in the `About Dataset` tab in the portal. TODO
+
+
+## Deploying
+
+TODO
+
+
+## Joining the EpiVar Portal federation
+
+TODO
diff --git a/readme.md b/readme.md
index fdd32df3..231ca3d0 100644
--- a/readme.md
+++ b/readme.md
@@ -1,6 +1,6 @@
 # The EpiVar Browser
 
-A web application to search for eQTL/epigenetic signal-associated variants and
+A federated web application to search for eQTL/epigenetic signal-associated variants and
 merge bigWig tracks by genotype.
 
 A production instance with data from
@@ -41,9 +41,9 @@ The tool itself is described in a 2023 preprint:
 
 ## Installation
 
-### Note on hosting your own instance
+### Note on hosting your own node
 
-With some effort, the EpiVar browser can be deployed with other data than just
+The EpiVar Browser's server component can be deployed with other data than just
 the Aracena *et al.* dataset. The instructions below must be followed, paying
 especially close attention to the formats described in the
 [Application data](#application-data) section.
@@ -145,7 +145,7 @@ The different data sources to generate/prepare are:
   in the box plots generated by the server.
 
 - **Metadata:** This is the track's metadata. This can either be provided as an
-  XLSX file with the headers:
+  XLSX file with the following headers:
   - `file.path`: relative path to `bigWig`, without `config.paths.tracks` directory prefix
   - `ethnicity`: ethnicity / population group **ID** (*not* name!)
     - if set to `Exclude sample`, sample will be skipped
@@ -155,7 +155,6 @@ The different data sources to generate/prepare are:
   - `donor`: donor ID (i.e., individual ID)
   - `track.view`: literal value, one of `signal_forward` or `signal_reverse`
   - `track.track_type`: literal value `bigWig`
-  - `assembly.name`: assembly name (e.g., `hg19`).
   - `assay.name`: one of `RNA-Seq`, `ATAC-Seq`, `H3K27ac`, `H3K4me1`, `H3K27me3`, `H3K4me3`
 
   and the sheets (which match `assay.name`):
@@ -175,7 +174,6 @@ The different data sources to generate/prepare are:
   - `donor`
   - `view`
   - `type`
-  - `assembly`
   - `assay`
 
   Information on the track metadata file: