docs: start work on new docs for dockerized/federated epivar

c3g · Jan 8, 2024 · 6c115c2
1 parent 2a62be5

Showing 2 changed files with 65 additions and 6 deletions.
diff --git a/docs/setting_up_a_node.md b/docs/setting_up_a_node.md
@@ -0,0 +1,61 @@
+# Setting up an EpiVar node
+
+
+## Dependencies
+
+* Docker
+* Docker Compose plugin
+* A configurable reverse proxy such as NGINX, Traefik, or similar
+
+
+## Data and configuration requirements
+
+### Raw data (stored on the node, not revealed publicly)
+
+- [ ] A VCF containing sample variants, using one of two available reference genomes (`hg19`/`hg38`)
+- [ ] A set of normalized signal matrices: one per assay, each containing columns of samples and rows of features
+      (see an [example for ATAC-seq](/input-files/matrices/ATAC-seq.example.tsv))
+- [ ] A set of bigWigs, one or two (forward/reverse view) per sample-assay pair
+- [ ] Peak and gene-peak-link CSV files:
+  - TODO: PEAK DATA
+
+### Dataset metadata
+
+- [ ] A metadata file for the bigWig tracks, which can be one of the following:
+  - An XLSX file with one or more sheets 
+    (see [an example for the Aracena *et al.* dataset](/input-files/flu-infection.xlsx)), each with the following 
+    headers:
+     - `file.path`: relative path to `bigWig`, without the `EPIVAR_TRACKS_DIR` environment variable directory prefix
+     - `ethnicity`: ethnicity / population group **ID** (*not* name!)
+       - if set to `Exclude sample`, sample will be skipped
+     - `condition`: condition / experimental group **ID** (*not* name!)
+     - `sample_name`: Full sample name, uniquely identifying the sample within
+       `assay`, `condition`, `donor`, and `track.view` variables
+     - `donor`: donor ID (i.e., individual ID)
+     - `track.view`: literal value, one of `signal_forward` or `signal_reverse`
+     - `track.track_type`: must be the literal value `bigWig`
+     - `assay.name`: one of `RNA-Seq`, `ATAC-Seq`, `H3K27ac`, `H3K4me1`, `H3K27me3`, `H3K4me3`
+
+    The file may have additional headers, but these will be discarded internally.
+  - **OR**, a JSON file containing a list of objects with the following keys, mapping to the above headers in order:
+     - `path`
+     - `ethnicity`
+     - `condition`
+     - `sample_name`
+     - `donor`
+     - `view`
+     - `type`
+     - `assay`
+- [ ] A dataset configuration file, which takes the form described in the 
+  [example configuration file](/config.example.js).
+- [ ] A human-readable dataset description file, to show in the `About Dataset` tab in the portal. TODO
+
+
+## Deploying
+
+TODO
+
+
+## Joining the EpiVar Portal federation
+
+TODO
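
The header-to-key mapping described in `docs/setting_up_a_node.md` above (XLSX headers on one side, JSON keys on the other, in the same order) can be sketched as a small conversion step. This is only an illustration under stated assumptions, not EpiVar's own code: `HEADER_TO_KEY`, `row_to_record`, and `rows_to_records` are hypothetical names, rows are assumed to arrive as plain dictionaries keyed by header, and dropping `Exclude sample` rows during conversion is an assumption based on the note that such samples are skipped.

```python
# Hypothetical helper for illustration; none of these names come from EpiVar.

# XLSX header -> JSON key, in the order listed in the documentation.
HEADER_TO_KEY = {
    "file.path": "path",
    "ethnicity": "ethnicity",
    "condition": "condition",
    "sample_name": "sample_name",
    "donor": "donor",
    "track.view": "view",
    "track.track_type": "type",
    "assay.name": "assay",
}

VALID_VIEWS = {"signal_forward", "signal_reverse"}
VALID_ASSAYS = {"RNA-Seq", "ATAC-Seq", "H3K27ac", "H3K4me1", "H3K27me3", "H3K4me3"}
EXCLUDE_MARKER = "Exclude sample"


def row_to_record(row):
    """Convert one sheet row (dict keyed by header) to a JSON-style record.

    Returns None when the row is marked for exclusion (assumed behaviour).
    Raises ValueError when a required header is missing or a literal
    value falls outside the documented set.
    """
    missing = [header for header in HEADER_TO_KEY if header not in row]
    if missing:
        raise ValueError(f"missing headers: {missing}")

    if row["ethnicity"] == EXCLUDE_MARKER:
        return None

    # Headers not listed in HEADER_TO_KEY are discarded here.
    record = {key: row[header] for header, key in HEADER_TO_KEY.items()}

    if record["view"] not in VALID_VIEWS:
        raise ValueError(f"unexpected view: {record['view']!r}")
    if record["type"] != "bigWig":
        raise ValueError(f"unexpected track type: {record['type']!r}")
    if record["assay"] not in VALID_ASSAYS:
        raise ValueError(f"unexpected assay: {record['assay']!r}")
    return record


def rows_to_records(rows):
    """Convert many rows, dropping those marked for exclusion."""
    converted = (row_to_record(row) for row in rows)
    return [record for record in converted if record is not None]
```

Passing the result of `rows_to_records` to `json.dump` would give a list of objects with the eight keys listed for the JSON variant of the metadata file.
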
diff --git a/readme.md b/readme.md
@@ -1,6 +1,6 @@
 # The EpiVar Browser
 
-A web application to search for eQTL/epigenetic signal-associated variants and 
+A federated web application to search for eQTL/epigenetic signal-associated variants and 
 merge bigWig tracks by genotype. 
 
 A production instance with data from 
@@ -41,9 +41,9 @@ The tool itself is described in a 2023 preprint:
 
 ## Installation
 
-### Note on hosting your own instance
+### Note on hosting your own node
 
-With some effort, the EpiVar browser can be deployed with other data than just
+The EpiVar Browser's server component can be deployed with data other than just
 the Aracena *et al.* dataset. The instructions below must be followed,
 paying especially close attention to the formats described in the 
 [Application data](#application-data) section.
@@ -145,7 +145,7 @@ The different data sources to generate/prepare are:
       in the box plots generated by the server.
 
  - **Metadata:** This is the track's metadata. This can either be provided as an
-   XLSX file with the headers:
+   XLSX file with the following headers:
      - `file.path`: relative path to `bigWig`, without `config.paths.tracks` directory prefix
      - `ethnicity`: ethnicity / population group **ID** (*not* name!) 
        - if set to `Exclude sample`, sample will be skipped
@@ -155,7 +155,6 @@ The different data sources to generate/prepare are:
      - `donor`: donor ID (i.e., individual ID)
      - `track.view`: literal value, one of `signal_forward` or `signal_reverse`
      - `track.track_type`: literal value `bigWig`
-     - `assembly.name`: assembly name (e.g., `hg19`).
      - `assay.name`: one of `RNA-Seq`, `ATAC-Seq`, `H3K27ac`, `H3K4me1`, `H3K27me3`, `H3K4me3`
    
    and the sheets (which match `assay.name`):
@@ -175,7 +174,6 @@ The different data sources to generate/prepare are:
      - `donor`
      - `view`
      - `type`
-     - `assembly`
      - `assay`
  
    Information on the track metadata file: