Merge pull request #15 from ENCODE-DCC/dev_workshop
Dev workshop
leepc12 authored Jul 4, 2019
2 parents ace9777 + 5bcc76a commit 90da36d
Showing 4 changed files with 147 additions and 3 deletions.
8 changes: 6 additions & 2 deletions caper/caper.py
@@ -387,7 +387,11 @@ def metadata(self, no_print=False):
                 [(Caper.KEY_CAPER_STR_LABEL, v)
                  for v in self._wf_id_or_label])
         if not no_print:
-            print(json.dumps(m, indent=4))
+            if len(m) == 1:
+                m_ = m[0]
+            else:
+                m_ = m
+            print(json.dumps(m_, indent=4))
         return m

     def list(self):
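
The hunk above changes only what `metadata()` prints: when exactly one workflow matches, the single-element list is unwrapped so the output is a JSON object rather than a one-item array. The return value is still the full list `m`. A minimal sketch of that printing logic, assuming `m` is a list of metadata dicts; `format_metadata` is a hypothetical helper name and is not part of Caper:

```python
import json


def format_metadata(m):
    """Hypothetical helper mirroring the printing logic in this hunk.

    `m` is assumed to be a list of metadata dicts, one per workflow.
    """
    # Unwrap a single-element list so one workflow prints as a JSON object.
    m_ = m[0] if len(m) == 1 else m
    return json.dumps(m_, indent=4)


single = format_metadata([{"id": "wf-1", "status": "Succeeded"}])
several = format_metadata([{"id": "wf-1"}, {"id": "wf-2"}])

print(single.startswith("{"))   # True: one workflow prints as an object
print(several.startswith("["))  # True: several workflows still print as a list
```

Anything that parses the printed output should be prepared for both shapes, since the result depends on how many workflows match.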
@@ -398,7 +402,7 @@ def list(self):
             workflow_ids = ['*']
             labels = [(Caper.KEY_CAPER_STR_LABEL, '*')]
         else:
-            workflow_ids = self._wf_id_or_label,
+            workflow_ids = self._wf_id_or_label
             labels = [(Caper.KEY_CAPER_STR_LABEL, v)
                       for v in self._wf_id_or_label]
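
The second hunk removes a stray trailing comma. In Python, a trailing comma after the right-hand side of an assignment builds a one-element tuple, so the old line wrapped the whole list in a tuple instead of assigning the list itself. A small illustration, using a placeholder list in place of `self._wf_id_or_label` (the label value is made up):

```python
# Placeholder standing in for self._wf_id_or_label (values are illustrative).
wf_id_or_label = ["f7094621-3d38-48a6-b877-1da2b0cec931", "my-label"]

with_comma = wf_id_or_label,    # old line: trailing comma builds a 1-tuple
without_comma = wf_id_or_label  # new line: the list itself

print(type(with_comma).__name__, len(with_comma))        # tuple 1
print(type(without_comma).__name__, len(without_comma))  # list 2
```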

2 changes: 1 addition & 1 deletion caper/caper_args.py
@@ -12,7 +12,7 @@
 from distutils.util import strtobool


-__version__ = '0.3.9'
+__version__ = '0.3.10'

 DEFAULT_JAVA_HEAP_SERVER = '5G'
 DEFAULT_JAVA_HEAP_RUN = '1G'
136 changes: 136 additions & 0 deletions docs/conf_encode_workshop_2019.md
@@ -0,0 +1,136 @@
# Welcome to the ENCODE Users' Meeting 2019 in Seattle

## Do this before the workshop

1. Open a web browser and go to [Google Cloud Platform console](https://console.cloud.google.com/compute/instances?project=encode-workshop&instancessize=50). Click on the `SSH` button on the `workshop-server` instance.

2. Soft-link a shared configuration file.
```bash
$ mkdir -p ~/.caper && cd ~/.caper
$ ln -s /opt/code/default.conf default.conf
```

3. Authenticate yourself to get access to buckets.
```bash
$ gcloud auth login --no-launch-browser
$ gcloud auth application-default login --no-launch-browser
```

## To do together during workshop

> **WARNING**: **USERS SHOULD NOT FOLLOW THE STEPS BELOW BEFORE THE WORKSHOP**.

4. Submit a workflow to the Caper server. `--deepcopy` is only needed for remote data files in your input JSON; if all files defined in your input JSON are already on a Google Cloud bucket, skip it.
```bash
$ caper submit /opt/code/demo-pipeline/demo.wdl -i [INPUT_JSON] --deepcopy
# you will see the following message. make sure to remember the workflow_id
# in this example, the workflow_id is f7094621-3d38-48a6-b877-1da2b0cec931
[Caper] submit: {'id': 'f7094621-3d38-48a6-b877-1da2b0cec931', 'status': 'Submitted'}
```

5. Make sure to remember the `workflow_id` of your submitted workflow. You can monitor workflows with:
```bash
$ caper list [WORKFLOW_ID]
```

6. Once your workflow is done (marked as `Succeeded`), retrieve its `metadata.json` with the following command:
```bash
$ caper metadata [WORKFLOW_ID] > metadata.json
```

7. Run Croo with the retrieved `metadata.json` to organize outputs under `--out-dir`.
```bash
$ croo metadata.json --out-dir gs://encode-workshop-outputs/croo/$USER/[PIPELINE_NAME]
```

8. Open a web browser and go to the [Google Cloud Storage console](https://console.cloud.google.com/storage/browser/encode-workshop-outputs/croo/?project=encode-workshop&folder=true&organizationId=true). Navigate to your organized output directory, for example `gs://encode-workshop-outputs/croo/[YOUR_USER_NAME]/[PIPELINE_NAME]`. Click on the HTML file to see a file table summarizing all outputs with descriptions. Find any bigwig file in the table, copy its URL, and visualize it with your preferred genome browser.
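
Since the workshop bucket is made public (see step 11 of the admin section), a genome browser can read objects over plain HTTPS. The sketch below converts a `gs://` URI into the `https://storage.googleapis.com/<bucket>/<object>` form used for public Cloud Storage objects. The function name `gs_to_https` and the example object path are made up for illustration:

```python
from urllib.parse import quote


def gs_to_https(gs_uri):
    """Convert gs://bucket/object into a storage.googleapis.com URL."""
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError("expected a gs:// URI, got: " + gs_uri)
    bucket, _, obj = gs_uri[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError("URI must contain both a bucket and an object path")
    # Percent-encode the object path but keep '/' as the separator.
    return "https://storage.googleapis.com/{}/{}".format(
        bucket, quote(obj, safe="/"))


# The object path below is hypothetical.
print(gs_to_https(
    "gs://encode-workshop-outputs/croo/alice/demo/signal/rep1.bigwig"))
```

The resulting URL only works for objects that are publicly readable; for private buckets the browser would need authenticated access instead.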


## Setting up a Caper server instance (ADMIN ONLY)

This example sets up a server instance for the ENCODE workshop 2019 in Seattle, but it should also be helpful for setting up your own server instance.

> **WARNING**: This section is for admins only. **USERS SHOULD NOT FOLLOW THE STEPS BELOW ON THE INSTANCE**.

1. Create an instance with Debian-based Linux (e.g. Ubuntu). Minimum requirements for the server are CPU >= 4 and memory > 16 GB.

2. Install software. Install Caper (Cromwell wrapper) and Croo (Cromwell output organizer).
```bash
$ sudo apt-get update && sudo apt-get install -y default-jdk acl python3 python3-pip git wget curl htop
$ sudo pip3 install caper croo
```

3. Clone the pipeline code and share it with users. This example installs the ENCODE RNA-Seq and demo pipelines in `/opt/code`.
```bash
$ sudo mkdir /opt/code
$ sudo chown $USER:$USER /opt/code
$ cd /opt/code
$ git clone https://github.com/ENCODE-DCC/rna-seq-pipeline
$ git clone https://github.com/ENCODE-DCC/demo-pipeline
```

4. Authenticate yourself.
```bash
$ gcloud auth login --no-launch-browser
$ gcloud auth application-default login --no-launch-browser
```

5. Create a scratch directory for Caper. The default ACLs set below make any subdirectory created under `/srv/scratch` inherit permissions from its parent directory.
```bash
$ sudo mkdir /srv/scratch
$ sudo chown $USER:$USER /srv/scratch
$ sudo chmod 777 /srv/scratch
$ sudo setfacl -d -m u::rwx /srv/scratch
$ sudo setfacl -d -m g::rwx /srv/scratch
$ sudo setfacl -d -m o::rwx /srv/scratch
```

6. Create a Caper configuration file, which will be shared with all users.
```bash
$ touch /opt/code/default.conf
```

7. Edit the shared configuration file `/opt/code/default.conf`. You can comment out the settings for the ENCODE workshop 2019 and uncomment and define your own `gcp-prj`, `tmp-gcs-bucket` and `out-gcs-bucket`.
```bash
[defaults]
cromwell=/opt/code/cromwell-42.jar
java-heap-server=8G

backend=gcp

out-dir=/srv/scratch/caper_out
tmp-dir=/srv/scratch/caper_tmp

#gcp-prj=[YOUR_GOOGLE_PROJECT]
gcp-prj=encode-workshop

#out-gcs-bucket=[YOUR_OUTPUT_BUCKET_FOR_CAPER]
#tmp-gcs-bucket=[YOUR_TMP_BUCKET_FOR_CAPER]
out-gcs-bucket=gs://encode-workshop-outputs/caper_out
tmp-gcs-bucket=gs://encode-workshop-outputs/caper_tmp

max-concurrent-workflows=100
```

8. Download Cromwell 42 JAR and share it with all users.
```bash
$ cd /opt/code
$ wget https://github.com/broadinstitute/cromwell/releases/download/42/cromwell-42.jar
```

9. Soft-link a shared configuration file.
```bash
$ mkdir -p ~/.caper && cd ~/.caper
$ ln -s /opt/code/default.conf default.conf
```

10. Create Caper's output bucket `gs://encode-workshop-outputs`.

11. Make the bucket public by adding a `Storage Object Viewer` role for `allUsers` to the bucket. This allows public HTTP access to all files on the bucket, which is used to visualize some pipeline outputs (e.g. bigwigs) on a genome browser.

12. Give write permission to **ALL WORKSHOP PARTICIPANTS** (not to all public users) by adding the `Storage Object Creator` role for every participant. This gives all participants write access to the Caper tmp directory `gs://encode-workshop-outputs/caper_tmp`, so that `--deepcopy` does not make duplicate files on the shared bucket. It also gives them write access to `gs://encode-workshop-outputs/croo`, so that their organized outputs generated by Croo are written to that bucket directory.

13. Run a Caper server.
```bash
$ caper server
```
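
Before starting the server, it can help to confirm that the shared configuration from step 7 defines the GCP settings. The file is INI-style, so the sketch below reads it with Python's standard `configparser`. `REQUIRED_KEYS`, `SAMPLE_CONF` and `missing_keys` are hypothetical names for this illustration, and the choice of required keys is an assumption based on the settings step 7 asks you to define; this is not how Caper itself validates its configuration.

```python
import configparser

# Assumed set of keys to check, taken from step 7 of this guide.
REQUIRED_KEYS = ("gcp-prj", "out-gcs-bucket", "tmp-gcs-bucket")

# Abbreviated copy of the workshop configuration, used as sample input.
SAMPLE_CONF = """\
[defaults]
cromwell=/opt/code/cromwell-42.jar
backend=gcp
#gcp-prj=[YOUR_GOOGLE_PROJECT]
gcp-prj=encode-workshop
out-gcs-bucket=gs://encode-workshop-outputs/caper_out
tmp-gcs-bucket=gs://encode-workshop-outputs/caper_tmp
"""


def missing_keys(conf_text, section="defaults"):
    """Return the required keys that are absent or empty in `section`."""
    # Disable interpolation so values containing '%' are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section(section):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS
            if not parser.get(section, key, fallback="").strip()]


print(missing_keys(SAMPLE_CONF))  # [] when every required key is defined
```

To check the real file, read `/opt/code/default.conf` into a string and pass it to `missing_keys`.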
4 changes: 4 additions & 0 deletions docs/conf_gcp.md
@@ -26,3 +26,7 @@
```bash
$ gcloud config set project [YOUR_PROJECT_NAME]
```

# Setting up a Caper server instance

See [the ENCODE workshop 2019 guide](./conf_encode_workshop_2019.md) for an example of setting up your own Caper server on Google Cloud Platform.
