Merge pull request #69 from ENCODE-DCC/v1.1.8
V1.1.8
leepc12 authored May 24, 2019
2 parents 2f567e6 + 1a50594 commit 22d5b48
Showing 97 changed files with 1,822 additions and 616 deletions.
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -51,7 +51,7 @@ jobs:
name: build image
command: |
source ${BASH_ENV}
export DOCKER_CACHE_TAG=v1.1.6
export DOCKER_CACHE_TAG=test-v1.1.8
echo "pulling ${DOCKER_CACHE_TAG}!"
docker pull quay.io/encode-dcc/chip-seq-pipeline:${DOCKER_CACHE_TAG}
docker login -u=${QUAY_ROBOT_USER} -p=${QUAY_ROBOT_USER_TOKEN} quay.io
78 changes: 50 additions & 28 deletions README.md
@@ -7,41 +7,67 @@ This ChIP-Seq pipeline is based off the ENCODE (phase-3) transcription factor an

### Features

* **Flexibility**: Support for `docker`, `singularity` and `Conda`.
* **Portability**: Support for many cloud platforms (Google/DNAnexus) and cluster engines (SLURM/SGE/PBS).
* **Resumability**: [Resume](utils/resumer/README.md) a failed workflow from where it left off.
* **User-friendly HTML report**: tabulated quality metrics including alignment/peak statistics and FRiP along with many useful plots (IDR/cross-correlation measures).
- Examples: [HTML](https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/example_output/qc.html), [JSON](docs/example_output/v1.1.5/qc.json)
* **Genomes**: Pre-built database for GRCh38, hg19, mm10, mm9 and additional support for custom genomes.

## Installation and tutorial
## Installation

This pipeline supports many cloud platforms and cluster engines. It also supports `docker`, `singularity` and `Conda` to resolve the pipeline's complicated software dependencies. The tutorial-style instructions below, one per platform, explain how to run the pipeline. There are special instructions for two major Stanford HPC servers (SCG4 and Sherlock).
1) Install [Caper](https://github.com/ENCODE-DCC/caper#installation). Caper is a Python wrapper for [Cromwell](https://github.com/broadinstitute/cromwell). Make sure that you have Python 3 (>3.4.1) installed on your system.

* Cloud platforms
* Web interface
* [DNAnexus Platform](docs/tutorial_dx_web.md)
* CLI (command line interface)
* [Google Cloud Platform](docs/tutorial_google.md)
* [DNAnexus Platform](docs/tutorial_dx_cli.md)
* Stanford HPC servers (CLI)
* [Stanford SCG4](docs/tutorial_scg.md)
* [Stanford Sherlock 2.0](docs/tutorial_sherlock.md)
* Cluster engines (CLI)
* [SLURM](docs/tutorial_slurm.md)
* [Sun GridEngine (SGE/PBS)](docs/tutorial_sge.md)
* Local computers (CLI)
* [Local system with `singularity`](docs/tutorial_local_singularity.md)
* [Local system with `docker`](docs/tutorial_local_docker.md)
* [Local system with `Conda`](docs/tutorial_local_conda.md)
```bash
$ pip install caper
```
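
If you are unsure which Python you have, a quick check with standard commands (nothing here is specific to this pipeline):

```bash
$ python3 --version   # should report a version newer than 3.4.1
$ pip --version       # confirm pip is tied to that Python 3 interpreter
```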

## Input JSON file
2) Read through [Caper's README](https://github.com/ENCODE-DCC/caper) carefully.
[Input JSON file specification](docs/input.md)
3) Run a pipeline with Caper.
## Output directories
## Conda
[Output directory specification](docs/output.md)
We don't recommend Conda as a dependency resolver; use Docker or Singularity instead, and note that we will not address issues related to Conda. You can install Singularity locally without super-user privileges and use it with the pipeline through Caper (with `--use-singularity`).

1) Install [Conda](https://docs.conda.io/en/latest/miniconda.html).

2) Install the Conda environment for the pipeline.

```bash
$ conda/install_dependencies.sh
```

## Tutorial

Make sure that you have configured Caper correctly.
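
As a rough illustration only (the file location and key names below are assumptions based on Caper's conventions; check [Caper's README](https://github.com/ENCODE-DCC/caper) for the authoritative settings), a local-backend configuration could look like this:

```bash
# Hypothetical ~/.caper/default.conf for a local machine (verify key names against Caper's docs).
$ cat ~/.caper/default.conf
backend=local
out-dir=/path/to/outputs
tmp-dir=/path/to/scratch
```

Then run the pipeline on the subsampled tutorial dataset: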

```bash
$ caper run chip.wdl -i examples/caper/ENCSR936XTK_subsampled_chr19_only.json --deepcopy --use-singularity
```

If you use Conda or Docker (on cloud platforms), remove `--use-singularity` from the command line. For Conda, activate the pipeline's environment before running a pipeline:
```bash
$ conda activate encode-chip-seq-pipeline
```
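
For example, a Conda-based run of the same tutorial input simply combines the two commands above:

```bash
# Activate the pipeline's Conda environment, then run without --use-singularity.
$ conda activate encode-chip-seq-pipeline
$ caper run chip.wdl -i examples/caper/ENCSR936XTK_subsampled_chr19_only.json --deepcopy
```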

## How to organize outputs

Install [Croo](https://github.com/ENCODE-DCC/croo#installation). Make sure that you have Python 3 (>3.4.1) installed on your system.

```bash
$ pip install croo
```

Find a `metadata.json` file in Caper's output directory and pass it to Croo:
```bash
$ croo [METADATA_JSON_FILE]
```
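
If you are not sure where Cromwell wrote it, a generic search works; the `chip/1234abcd/...` path below is only a hypothetical example of Caper's output layout:

```bash
# Locate the workflow's metadata file, then organize its outputs with Croo.
$ find . -name metadata.json
$ croo ./chip/1234abcd/metadata.json   # hypothetical path; use the one printed by find
```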
## How to build/download genome database
You need to specify a genome data TSV file in your input JSON. Such a TSV can be generated or downloaded along with the actual genome database files.
Use the genome database [downloader](genome/download_genome_data.sh) for supported genomes or the [builder](docs/build_genome_database.md) for your own genome.
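
A hedged sketch of the downloader (the argument order, genome name followed by destination directory, is an assumption; check the script itself for its exact usage). The resulting TSV file is what you reference from your input JSON:

```bash
# Download pre-built genome data for hg38 into a local directory (assumed argument order).
$ bash genome/download_genome_data.sh hg38 /path/to/genome_data
```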
## Useful tools
@@ -51,10 +77,6 @@ There are some useful tools to post-process outputs of the pipeline.
[This tool](utils/qc_jsons_to_tsv/README.md) recursively finds and parses all `qc.json` files (the pipeline's [final output](docs/example_output/v1.1.5/qc.json)) under a specified root directory. It generates a TSV file with all quality metrics tabulated in rows, one per experiment and replicate. The tool also estimates the overall quality of a sample using [a criteria definition JSON file](utils/qc_jsons_to_tsv/criteria.default.json), which can serve as a good guideline for QC'ing experiments.
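
A usage sketch only; the script path follows the linked directory, but the flag name is an assumption, so consult [the tool's README](utils/qc_jsons_to_tsv/README.md) for the real interface:

```bash
# Collect all qc.json files under a pipeline output directory into one spreadsheet (flag assumed).
$ python utils/qc_jsons_to_tsv/qc_jsons_to_tsv.py --search-dir /path/to/pipeline/outputs > spreadsheet.tsv
```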
### resumer

[This tool](utils/resumer/README.md) parses a metadata JSON file from a previous failed workflow and generates a new input JSON file to start a pipeline from where it left off.
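
A hedged sketch (the script name and its arguments are assumptions; see [its README](utils/resumer/README.md) for the actual usage):

```bash
# Generate a new input JSON from a failed run's metadata (script name assumed),
# then pass that JSON to `caper run` as in the tutorial above.
$ python utils/resumer/resumer.py /path/to/failed_run/metadata.json
```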

### ENCODE downloader
[This tool](https://github.com/kundajelab/ENCODE_downloader) downloads any type of data (FASTQ, BAM, PEAK, ...) from the ENCODE portal. It also generates a metadata JSON file per experiment, which is very useful for making an input JSON file for the pipeline.