[WiP] add nextflow-draft #45

Merged
merged 31 commits into from
Apr 27, 2020

Conversation

Contributor

@kniec kniec commented Mar 30, 2020

First draft of a workshop that uses nextflow to submit jobs to AWS Batch.

TODO:

  • Create Batch Squared part, in which a job in AWS Batch is used to execute the supervisor nextflow process submitting more jobs to a different queue.

Run local version

To run the local version, check out the branch and run docker-compose up.

@kniec kniec added the enhancement New feature or request label Mar 30, 2020
@kniec kniec requested a review from ruecarlo March 30, 2020 13:34
@ranshn ranshn self-requested a review March 31, 2020 12:45
pre: "<b>⁃ </b>"
---

The content of this labs were orgininally created by the nice folks at [Seqera Labs](https://github.com/seqeralabs/nextflow-tutorial), all credits go to them!
Contributor

typos: lab instead of labs, origininally
Can you specify what you added or modified in general? Just mentioning that they "created it originally" is not clear, and it might seem like you are just re-hosting their workshop here.

Contributor Author

I added more color to that one.

Contributor Author

The content of this workshop is derived from a tutorial created by the nice folks at Seqera Labs, kudos to them!
We won't create our own pipelines or tweak code, but rather jump right in with a small proof-of-concept pipeline, which we will run locally in containers, submit locally to AWS Batch, and finally run as a batch job that itself submits to AWS Batch.



## Overview
During this tutorial you will implement a proof of concept of a RNA-seq pipeline. The goal of this workshop is not te become a Bioinformatician nor a Nextflow guru.
Contributor

An RNA-seq
The goal is not "...", so what is it? To learn about the underlying compute with Nextflow/Batch, or something?

Contributor Author

During this workshop you will implement a proof of concept of an RNA-seq pipeline. The goal of this workshop is not to become a Bioinformatician nor a Nextflow guru, but to get familiar with the concepts of Nextflow and AWS Batch.

## Introduction

### Conventions:
Throughout this labs, we provide commands for you to run in the terminal. These commands will look like this:
Contributor

Labs = lab? workshop? tutorial? On the same page it's called many different things. Stick with workshop?

Contributor Author

yup

Contributor Author

let's go with workshop

content/nextflow-on-aws-batch/10_prerequisites/_index.md (outdated, resolved)
content/nextflow-on-aws-batch/40_nextflow202/_index.md (outdated, resolved)
content/nextflow-on-aws-batch/40_nextflow202/10_setup.md (outdated, resolved)
content/nextflow-on-aws-batch/40_nextflow202/10_setup.md (outdated, resolved)
content/nextflow-on-aws-batch/40_nextflow202/10_setup.md (outdated, resolved)
===================================
transcriptome: /home/ec2-user/environment/nextflow-tutorial/data/ggal/transcriptome.fa
reads : /home/ec2-user/environment/nextflow-tutorial/data/ggal/gut_{1,2}.fq
outdir : results
Contributor

After this line I got "WARN: Unable to create AWS Batch helper class | credentials cannot be null",
and after some time it failed with "Oops .. something went wrong".

Contributor Author

ok, let's have someone else go through it with the updates - works on my cloud. :)

@kniec kniec force-pushed the nextflow-workshop branch from 0f29d57 to 129b860 Compare April 8, 2020 08:25
@kniec kniec force-pushed the nextflow-workshop branch from 129b860 to dc79f09 Compare April 20, 2020 08:42
Contributor Author

kniec commented Apr 23, 2020

Merged a bunch of improvements from @plample, thanks a lot!

Contributor

@ruecarlo ruecarlo left a comment

@kniec, the workshop worked well. There are mainly recommendations and cosmetic fixes required for completion, but the reason I'm requesting changes instead of approving is the cleanup sequence. The clean-up commands remove all the images in the registry, all the job definitions in AWS Batch, etc. We should not assume we can clear out all of those: users may already have applications running in their accounts, and this may cause an inconvenience. We should instead target the cleanup at the resources that were created during the workshop.

During this workshop you will implement a proof of concept of a RNA-seq pipeline. The goal of this workshop is not te become a Bioinformatician nor a Nextflow guru, but to get familiar with the concepts of nextflow and AWS Batch.

{{% notice info %}}
The estimated cost for running this **Y hour** workshop will be less than **$X**.
Contributor

This still makes a reference to Y hours and $X dollars.

Contributor Author

put in 1.5h and $5 (will check within the next dry-run).

weight: 45
---

## Attach the IAM role to your Workspace
Contributor

This title seems misplaced, like a copy and paste from the previous exercise. Some text may be missing that explains what the items in the PNG point to; the picture refers to a set of steps (1, 2, 3) that may need to be described.

Contributor

I'm assuming that, unlike the EKS workshop with kubectl, we don't need to refresh credentials in this workshop, but I'd suggest adding a validation step where people have to run

aws sts get-caller-identity

to validate that the workshop credentials are the ones they expect. This will also help AWS SAs attending the workshop to understand if someone skipped a step.

Contributor

Ok, during the workshop I left the AWS managed temporary credentials enabled on purpose, and it fell back to the credentials of my user who created the Cloud9 instance instead of the ones from the acquired Cloud9 role. More instructions are definitely needed if the workshop takes more than 60 minutes.

Contributor Author

Changed the headline and included get-caller-identity. good catch!
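
A minimal sketch of the validation step discussed in this thread, assuming the Cloud9 instance role is called `nextflow-workshop-admin` (a hypothetical name; `EXPECTED_ROLE`, `arn_matches_role` and `check_identity` are illustrative helpers, not part of the workshop). It compares the ARN returned by `aws sts get-caller-identity` with the expected assumed-role pattern, which would also catch the fallback to managed temporary credentials described above:

```bash
# Hypothetical role name; replace with the role actually attached to the workspace.
EXPECTED_ROLE="${EXPECTED_ROLE:-nextflow-workshop-admin}"

# Succeeds when the ARN is an assumed-role session of the given role.
arn_matches_role() {
  local arn="$1" role="$2"
  case "$arn" in
    arn:aws:sts::*:assumed-role/"$role"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Queries the current identity and reports whether it is the expected role.
check_identity() {
  local arn
  arn="$(aws sts get-caller-identity --query Arn --output text)" || return 1
  if arn_matches_role "$arn" "$EXPECTED_ROLE"; then
    echo "OK: using ${arn}"
  else
    echo "WARNING: unexpected identity ${arn}" >&2
    return 1
  fi
}
```

Running `check_identity` at the start of each chapter would make a skipped or expired credential setup visible early.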


## Install Java and Nextflow

The nextflow command-line tool uses the JVM. Thus, we will install AWS open-source variant [Amazon Corretto](https://docs.aws.amazon.com/corretto).
Contributor

Perhaps adding a quick {% Notice %} with a bit of info on why Corretto is so cool and why we prefer it over other JDKs may bring attention to the work our AWS teams are doing :)

Contributor Author

I added the abstract from their website


### Nextflow

Installing Nextflow using the online installer.
Contributor

At this stage we are installing Nextflow, but we have not introduced why Nextflow is so cool and what makes it a great tool to manage genomic workflows, for example by explaining why DSLs make pipeline and workflow declaration more effective. Perhaps the link comes later in the workshop, but I reckon an introductory narrative at this stage, explaining why Nextflow helps the specific users who want to do this workshop, would help them understand some of the concepts they will see later on: (a) what Nextflow is, (b) DSLs and pipelines/workflows, (c) why this is required in genomics and more generally.

Narrative does not necessarily need to be huge, but pointers to back up concepts will definitely help

Contributor Author

added a distinct 'install nextflow' page with some praises


A process is defined by providing three main declarations: the process [inputs](https://www.nextflow.io/docs/latest/process.html#inputs), the process [outputs](https://www.nextflow.io/docs/latest/process.html#outputs) and finally the command [script](https://www.nextflow.io/docs/latest/process.html#script).

The example workflow implements a simple RNA-seq pipeline which:
Contributor

A description of what RNA-Seq means, with references, will help people a lot to get context. https://en.wikipedia.org/wiki/RNA-Seq

Consider that the workshop is also public, and there will be people with little context of the genomics domain who are doing the workshop to understand this type of thing. Help them with visual aids. If anything, refer at least to Wikipedia for people to have fun :)

Contributor Author

@plample can you help here?


## Follow Up

<div style="text-align:left;font-size: 1.2rem">
Contributor

+1

#### ECR

```
for x in $(aws ecr --region=${AWS_REGION} describe-repositories | jq -r '.repositories[] | .repositoryName' | xargs); do
  echo "# aws ecr --region=${AWS_REGION} delete-repository --force --repository-name=$x"
  aws ecr --region=${AWS_REGION} delete-repository --force --repository-name=$x
done
```
Contributor

Hey, this raises a flag for me: this will delete all the repositories in my account in that region. In my case it deleted all the other projects I had in that region. We can really mess things up if people run this in their own accounts. I'd suggest we filter and only list repositories matching a regex for the project.

Contributor Author

I made the cleanup a bit more careful and put a big warning at the top.
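
One way to scope the ECR cleanup as suggested in this thread, sketched under the assumption that every repository created by the workshop shares a name prefix. The `nextflow-workshop` default, the `WORKSHOP_PREFIX` and `DRY_RUN` variables and both function names are hypothetical and only illustrate the idea; this is not the change that was actually merged:

```bash
# Hypothetical prefix shared by all repositories the workshop creates.
PREFIX="${WORKSHOP_PREFIX:-nextflow-workshop}"

# Reads one name per line from stdin and prints only names starting with the prefix.
filter_by_prefix() {
  local prefix="$1" name
  while IFS= read -r name; do
    case "$name" in
      "$prefix"*) printf '%s\n' "$name" ;;
    esac
  done
}

# Prints the delete command for each matching repository;
# only deletes when DRY_RUN=0 is set explicitly.
cleanup_ecr() {
  aws ecr --region="${AWS_REGION}" describe-repositories \
    | jq -r '.repositories[].repositoryName' \
    | filter_by_prefix "$PREFIX" \
    | while IFS= read -r repo; do
        echo "# aws ecr --region=${AWS_REGION} delete-repository --force --repository-name=${repo}"
        if [ "${DRY_RUN:-1}" = "0" ]; then
          aws ecr --region="${AWS_REGION}" delete-repository --force --repository-name="${repo}"
        fi
      done
}
```

Defaulting to a dry run means a participant first sees which repositories would be removed and has to opt in with `DRY_RUN=0 cleanup_ecr` before anything is deleted.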

##### Job Definitions

```bash
for jd in $(aws batch --region=${AWS_REGION} describe-job-definitions | jq -r '.jobDefinitions[] | .jobDefinitionArn' | xargs); do
  echo "# aws batch --region=${AWS_REGION} deregister-job-definition --job-definition=${jd}"
  aws batch --region=${AWS_REGION} deregister-job-definition --job-definition=${jd}
done
```
Contributor

Same applies here: this is OK for new AWS accounts, but we cannot assume that we can delete all job definitions.

Contributor Author

same
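
For job definitions, the same scoping could be pushed into the AWS CLI's `--query` option instead of filtering afterwards. This is only a sketch: it assumes the workshop's job definitions share a name prefix, and the `nextflow-workshop` default, the `WORKSHOP_PREFIX` and `DRY_RUN` variables and the helper names are hypothetical:

```bash
# Hypothetical prefix shared by all job definitions the workshop registers.
PREFIX="${WORKSHOP_PREFIX:-nextflow-workshop}"

# Builds a JMESPath expression selecting the ARNs of job definitions
# whose name starts with the given prefix.
jobdef_query() {
  printf "jobDefinitions[?starts_with(jobDefinitionName, '%s')].jobDefinitionArn" "$1"
}

# Prints the deregister command for each matching ACTIVE job definition;
# only deregisters when DRY_RUN=0 is set explicitly.
cleanup_job_definitions() {
  aws batch --region="${AWS_REGION}" describe-job-definitions --status ACTIVE \
      --query "$(jobdef_query "$PREFIX")" --output json \
    | jq -r '.[]' \
    | while IFS= read -r jd; do
        echo "# aws batch --region=${AWS_REGION} deregister-job-definition --job-definition=${jd}"
        if [ "${DRY_RUN:-1}" = "0" ]; then
          aws batch --region="${AWS_REGION}" deregister-job-definition --job-definition="${jd}"
        fi
      done
}
```

The same query shape should carry over to job queues and compute environments by swapping the list and name fields, so only resources created by the workshop are disabled and deleted.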

Disable first...

```bash
for jq in $(aws batch --region=${AWS_REGION} describe-job-queues | jq -r '.jobQueues[] | .jobQueueName' | xargs); do
  echo "# aws batch --region=${AWS_REGION} update-job-queue --state=DISABLED --job-queue=${jq}"
  aws batch --region=${AWS_REGION} update-job-queue --state=DISABLED --job-queue=${jq}
done
```
Contributor

Same applies here: we cannot destroy all the queues; we should destroy just the ones that we created!


##### Compute Environment

Disable first...
Contributor

And the same here.

@ranshn ranshn self-requested a review April 27, 2020 09:52
Contributor Author

kniec commented Apr 27, 2020

Thanks @ruecarlo and @ranshn for the feedback - I'll merge the PR to get it ready for the first run on Wednesday

@kniec kniec merged commit d340e20 into awslabs:master Apr 27, 2020
@kniec kniec deleted the nextflow-workshop branch April 27, 2020 13:15
Contributor Author

kniec commented Apr 27, 2020

Should have created one big commit before merging tho...

Labels
enhancement New feature or request
4 participants