[WiP] add nextflow-draft #45
pre: "<b>⁃ </b>"
---

The content of this labs were orgininally created by the nice folks at [Seqera Labs](https://github.com/seqeralabs/nextflow-tutorial), all credits go to them!
typos: lab instead of labs, origininally
Can you specify what you added or modified generally? just mentioning they "created it originally" is not clear and it might seem like you are just re-hosting their workshop here
I added more color to that one.
The content of this workshop is derived from a tutorial created by the nice folks at Seqera Labs, kudos to them!
We won't create our own pipelines or tweak code, but rather jump right in with a small proof-of-concept pipeline, which we will run locally in containers, then submit from the local machine to AWS Batch, and finally run as a batch job that itself submits to AWS Batch.
## Overview
During this tutorial you will implement a proof of concept of a RNA-seq pipeline. The goal of this workshop is not to become a Bioinformatician nor a Nextflow guru.
An RNA-seq
The goal is not "..." so what is it? to learn about underlying compute with Nextflow/Batch or something?
During this workshop you will implement a proof of concept of an RNA-seq pipeline. The goal of this workshop is not to become a bioinformatician or a Nextflow guru, but to get familiar with the concepts of Nextflow and AWS Batch.
## Introduction

### Conventions:
Throughout this labs, we provide commands for you to run in the terminal. These commands will look like this:
Labs = lab? workshop? tutorial? On the same page it's called many different things. Stick with workshop?
yup
let's go with workshop
===================================
transcriptome: /home/ec2-user/environment/nextflow-tutorial/data/ggal/transcriptome.fa
reads : /home/ec2-user/environment/nextflow-tutorial/data/ggal/gut_{1,2}.fq
outdir : results
After this line I got "WARN: Unable to create AWS Batch helper class | credentials cannot be null", and after some time it failed with "Oops .. something went wrong".
ok, let's have someone else go through it with the updates - works on my cloud. :)
129b860 to dc79f09
typo on explains (exmplains)
More content for the conclusion
Suggestions to guide and allow copy/paste of value
Update _index.md
update the role to attach to the instance
typo in code
Merged a bunch of improvements from @plample, thanks a lot!
@kniec, the workshop worked well; what remains is mainly recommendations and cosmetic changes. The reason I'm requesting changes instead of approving is the cleanup sequence. The clean-up commands delete all the images in the registry, all the job definitions in AWS Batch, etc. We should not assume we can clear all of those out: users may already have applications running in their accounts, and this may cause an inconvenience. We should instead target the cleanup at the resources that were created during the workshop.
During this workshop you will implement a proof of concept of an RNA-seq pipeline. The goal of this workshop is not to become a bioinformatician or a Nextflow guru, but to get familiar with the concepts of Nextflow and AWS Batch.

{{% notice info %}}
The estimated cost for running this **Y hour** workshop will be less than **$X**.
This still refers to Y hours and $X dollars.
put in 1.5h and $5 (will check within the next dry-run).
weight: 45
---

## Attach the IAM role to your Workspace
This title seems misplaced, like a copy and paste from the previous exercise. Some text may be missing that explains what the PNG is pointing at; the picture refers to a set of steps (1, 2, 3) that may need to be described.
I'm assuming that, unlike the EKS workshop with kubectl, we don't need to refresh credentials in this workshop, but I'd suggest adding a validation step where people have to run
aws sts get-caller-identity
to validate that the workshop credentials are the ones they expect. This will also help AWS SAs attending the workshop see whether someone skipped a step.
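A possible shape for that validation step, as a minimal sketch only: it compares the ARN returned by `aws sts get-caller-identity` with the role the participant is expected to run as. `EXPECTED_ROLE`, its default value, and the helper names `arn_matches_role` and `check_identity` are hypothetical placeholders, not names taken from the workshop.

```bash
#!/usr/bin/env bash
# Sketch: check that the active identity is the workshop role.
# EXPECTED_ROLE is a hypothetical placeholder; use the role name from your own setup.
EXPECTED_ROLE="${EXPECTED_ROLE:-nextflow-workshop-admin}"

# Return 0 if the caller ARN is an assumed-role ARN for the given role name.
arn_matches_role() {
  local arn="$1" role="$2"
  case "$arn" in
    arn:aws:sts::*:assumed-role/"$role"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask STS who we are and compare against the expected role.
check_identity() {
  local arn
  arn="$(aws sts get-caller-identity --query Arn --output text)" || return 1
  if arn_matches_role "$arn" "$EXPECTED_ROLE"; then
    echo "OK: running as ${arn}"
  else
    echo "WARNING: unexpected identity ${arn}" >&2
    return 1
  fi
}
```

Running `check_identity` right after attaching the role, and again before the AWS Batch steps, would make it visible when the environment has fallen back to other credentials.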
Ok, during the workshop I left the AWS managed temporary credentials enabled on purpose, and it fell back to the credentials of the user who created the Cloud9 environment instead of the ones from the attached Cloud9 role. More instructions are definitely needed if the workshop takes more than 60 minutes.
Changed the headline and included get-caller-identity. good catch!
## Install Java and Nextflow

The nextflow command-line tool uses the JVM. Thus, we will install AWS open-source variant [Amazon Corretto](https://docs.aws.amazon.com/corretto).
Perhaps adding a quick {% Notice %} with a bit of info on why Corretto is so cool, and why we prefer it over other JDKs, would bring attention to what our AWS teams are doing :)
I added the abstract from their website
### Nextflow

Installing Nextflow using the online installer.
At this stage we are installing Nextflow, but we have not introduced why Nextflow is so cool and what makes it a great tool for managing genomics workflows, for example by explaining why DSLs make pipeline and workflow declarations more effective. Perhaps the link comes later in the workshop, but I reckon an introductory narrative at this stage would help users who want to do this workshop understand some of the concepts they will see later on: (a) what Nextflow is, (b) DSLs and pipelines/workflows, (c) why this is required in genomics and more generally.
The narrative does not need to be huge, but pointers to back up the concepts will definitely help.
added a distinct 'install nextflow' page with some praises
A process is defined by providing three main declarations: the process [inputs](https://www.nextflow.io/docs/latest/process.html#inputs), the process [outputs](https://www.nextflow.io/docs/latest/process.html#outputs) and finally the command [script](https://www.nextflow.io/docs/latest/process.html#script).

The example workflow implements a simple RNA-seq pipeline which:
A description of what RNA-Seq means, with references, would help people a lot to get context: https://en.wikipedia.org/wiki/RNA-Seq
Keep in mind that the workshop is public, and some people doing it will have little context in the genomics domain and want to understand this type of thing. Help them with visual aids. If anything, at least refer to Wikipedia for people to have fun :)
@plample can you help here?
## Follow Up

<div style="text-align:left;font-size: 1.2rem">
+1
#### ECR

```
for x in $(aws ecr --region=${AWS_REGION} describe-repositories |jq -r '.repositories[] | .repositoryName' |xargs);do echo "# aws ecr --region=${AWS_REGION} delete-repository --force --repository-name=$x" ; aws ecr --region=${AWS_REGION} delete-repository --force --repository-name=$x;done
```
Hey, this raises a flag for me: it will delete all the repositories in my account in that region. In my case it deleted all the other projects I had in that region. We can really mess things up if people run this in their own accounts. I'd suggest we filter and only list repositories that match a regex for the project.
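One way such a filter could look, as a rough sketch rather than the final cleanup script: repository names are matched against a name prefix before anything is deleted, and the function only prints the delete commands unless `--apply` is passed. `WORKSHOP_PREFIX`, its default value `nextflow-`, and the function names are assumptions for illustration; the real prefix depends on how the workshop names its repositories.

```bash
#!/usr/bin/env bash
# Sketch: delete only ECR repositories whose name starts with a workshop prefix.
# WORKSHOP_PREFIX is a hypothetical value, not a name taken from the workshop.
WORKSHOP_PREFIX="${WORKSHOP_PREFIX:-nextflow-}"

# Read names from stdin (one per line) and print those starting with the prefix.
filter_by_prefix() {
  local prefix="$1" name
  while IFS= read -r name; do
    case "$name" in
      "$prefix"*) printf '%s\n' "$name" ;;
    esac
  done
}

# Dry run by default: prints the delete commands; pass --apply to execute them.
cleanup_ecr() {
  local mode="${1:-}" repo
  # An empty prefix would match every repository, so refuse to continue.
  [ -n "${WORKSHOP_PREFIX}" ] || { echo "WORKSHOP_PREFIX must not be empty" >&2; return 1; }
  aws ecr --region="${AWS_REGION}" describe-repositories \
      --query 'repositories[].repositoryName' --output text \
    | tr '\t' '\n' \
    | filter_by_prefix "${WORKSHOP_PREFIX}" \
    | while IFS= read -r repo; do
        echo "# aws ecr --region=${AWS_REGION} delete-repository --force --repository-name=${repo}"
        if [ "$mode" = "--apply" ]; then
          aws ecr --region="${AWS_REGION}" delete-repository --force --repository-name="${repo}"
        fi
      done
}
```

Calling `cleanup_ecr` first and reading the printed commands, then `cleanup_ecr --apply`, keeps repositories from other projects out of reach as long as they do not share the prefix.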
I made the cleanup a bit more careful and put a big warning at the top.
##### Job Definitions

```bash
for jd in $(aws batch --region=${AWS_REGION} describe-job-definitions |jq -r '.jobDefinitions[] | .jobDefinitionArn' |xargs); do echo "# aws batch --region=${AWS_REGION} deregister-job-definition --job-definition=${jd}" ; aws batch --region=${AWS_REGION} deregister-job-definition --job-definition=${jd}; done
```
Same applies here. This is OK for new AWS accounts, but we cannot assume that we can delete all job definitions.
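For job definitions, the name can be read from the ARN, so a narrower cleanup could look roughly like the sketch below. It only considers `ACTIVE` definitions and, as an assumption, treats a name prefix as the marker for workshop resources; `WORKSHOP_PREFIX`, its default, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: deregister only the job definitions whose name starts with a workshop prefix.
# WORKSHOP_PREFIX is a hypothetical value, not a name taken from the workshop.
WORKSHOP_PREFIX="${WORKSHOP_PREFIX:-nextflow-}"

# Print the definition name contained in a job definition ARN, e.g.
# arn:aws:batch:eu-west-1:123456789012:job-definition/nextflow-rnaseq:3 -> nextflow-rnaseq
job_definition_name() {
  local tail="${1##*/}"       # strip everything up to the last slash
  printf '%s\n' "${tail%:*}"  # strip the trailing :revision
}

# Dry run by default: prints the commands; pass --apply to actually deregister.
cleanup_job_definitions() {
  local mode="${1:-}" jd
  # An empty prefix would match every job definition, so refuse to continue.
  [ -n "${WORKSHOP_PREFIX}" ] || { echo "WORKSHOP_PREFIX must not be empty" >&2; return 1; }
  for jd in $(aws batch --region="${AWS_REGION}" describe-job-definitions --status ACTIVE \
                --query 'jobDefinitions[].jobDefinitionArn' --output text); do
    case "$(job_definition_name "$jd")" in
      "${WORKSHOP_PREFIX}"*)
        echo "# aws batch --region=${AWS_REGION} deregister-job-definition --job-definition=${jd}"
        if [ "$mode" = "--apply" ]; then
          aws batch --region="${AWS_REGION}" deregister-job-definition --job-definition="${jd}"
        fi
        ;;
    esac
  done
}
```

The same prefix check could be reused for job queues and compute environments, so that only resources created during the workshop are disabled and deleted.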
same
Disable first...

```bash
for jq in $(aws batch --region=${AWS_REGION} describe-job-queues |jq -r '.jobQueues[] |.jobQueueName' |xargs);do echo "# aws batch --region=${AWS_REGION} update-job-queue --state=DISABLED --job-queue=${jq}" ; aws batch --region=${AWS_REGION} update-job-queue --state=DISABLED --job-queue=${jq};done
```
Same applies here, we cannot destroy all the queues, we should destroy just the ones that we created!
##### Compute Environment

Disable first...
And the same here.
Should have created one big commit before merging tho...
First draft of a workshop that uses nextflow to submit jobs to AWS Batch.
TODO:
Run local version
To run the local version, please check out the branch and just run
docker-compose up
locally.