Commit

New multi-course structure for training materials
First step toward evolving Hello Nextflow to be more modular. Just content moves and stub creation (not updated content yet except some rough chopping where bits are getting split up).
vdauwera committed Dec 16, 2024
1 parent e36d4cb commit 877e069
Showing 133 changed files with 806 additions and 421 deletions.
46 changes: 46 additions & 0 deletions docs/big_nextflow/01_orientation.md
@@ -0,0 +1,46 @@
# Orientation

The Gitpod environment contains all the software, code and data necessary to work through this training course, so you don't need to install anything yourself.
However, you do need a (free) account to log in, and you should take a few minutes to familiarize yourself with the interface.

If you have not yet done so, please follow [this link](../../envsetup/) before going any further.

## Materials provided

Throughout this training course, we'll be working in the `big-nextflow/` directory.
This directory contains all the code files, test data and accessory files you will need.

Feel free to explore the contents of this directory; the easiest way to do so is to use the file explorer on the left-hand side of the Gitpod workspace.
Alternatively, you can use the `tree` command.
Throughout the course, we use the output of `tree` to represent directory structure and contents in a readable form, sometimes with minor modifications for clarity.

Here we generate a table of contents to the second level down:

```bash
tree . -L 2
```
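
If `tree` is not installed where you are working, a rough equivalent can be sketched with `find`. This is only an illustration: the helper name `list_two_levels` is hypothetical and not part of the course materials, and it prints a flat, sorted list of paths rather than a drawn tree.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the course materials): list everything up to
# two levels below a directory, similar in scope to `tree . -L 2`.
list_two_levels() {
    local dir="${1:-.}"
    # -mindepth 1 leaves out the starting directory itself;
    # -maxdepth 2 stops at the second level, like `tree -L 2`;
    # -not -path '*/.*' skips hidden files and anything inside hidden directories.
    (cd "$dir" && find . -mindepth 1 -maxdepth 2 -not -path '*/.*' | sort)
}
```

Calling `list_two_levels .` from inside the course directory should list the same files and directories that `tree . -L 2` draws, just without the branch characters.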

If you run this inside `big-nextflow`, you should see the following output: [TODO]

```console title="Directory contents"
.
```

!!!note

    Don't worry if this seems like a lot; we'll go through the relevant pieces at each step of the course.
    This is just meant to give you an overview.

**Here's a summary of what you should know to get started:**

[TODO]

!!!tip

    If for whatever reason you move out of this directory, you can always run this command to return to it:

    ```bash
    cd /workspace/gitpod/big-nextflow
    ```

Now, to begin the course, click on the arrow in the bottom right corner of this page.
31 changes: 31 additions & 0 deletions docs/big_nextflow/index.md
@@ -0,0 +1,31 @@
---
title: Big Nextflow
hide:
- toc
---

# Big Nextflow

[TODO]

Let's get started!

[![Open in Gitpod](https://img.shields.io/badge/Gitpod-%20Open%20in%20Gitpod-908a85?logo=gitpod)](https://gitpod.io/#https://github.com/nextflow-io/training)

## Learning objectives

In this workshop, you will learn [TODO].

By the end of this workshop you will be able to:

[TODO]

## Audience & prerequisites

[TODO]

**Prerequisites**

- A GitHub account
- Experience with the command line
[TODO]
193 changes: 4 additions & 189 deletions docs/hello_nextflow/03_hello_containers.md
@@ -174,7 +174,7 @@ You know how to pull a container and run it interactively, make your data access

### What's next?

Learn how to get a container image for any pip/conda-installable tool.
[TODO] update text (was wrong one)

---

@@ -183,6 +183,8 @@ Learn how to get a container image for any pip/conda-installable tool.
Nextflow has built-in support for running processes inside containers to let you run tools you don't have installed in your compute environment.
This means that you can use any container image you like to run your processes, and Nextflow will take care of pulling the image, mounting the data, and running the process inside it.
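
To make "pulling the image, mounting the data, and running the process inside it" more concrete, here is a minimal sketch of the kind of `docker run` command this corresponds to. It is an approximation under stated assumptions: the function name `sketch_container_command`, the image `example/cowsay:latest` and the path `/tmp/work` are hypothetical, and the command Nextflow actually generates carries more options than shown here.

```bash
#!/usr/bin/env bash
# Illustrative only: print a `docker run` command roughly equivalent to running
# one task inside a container. The function name, image and paths are hypothetical.
sketch_container_command() {
    local image="$1" workdir="$2"
    shift 2
    # --rm          remove the container once the task has finished
    # -v host:cont  mount the task directory so the tool can see the data
    # -w dir        run the command from inside that directory
    # Docker pulls the image automatically if it is not available locally.
    printf 'docker run --rm -v %s:%s -w %s %s %s\n' \
        "$workdir" "$workdir" "$workdir" "$image" "$*"
}

# Prints the command instead of running it, so Docker is not needed to try this.
sketch_container_command example/cowsay:latest /tmp/work cowsay "hello"
```

The function only prints the command, which makes it safe to experiment with the mount and working-directory options before running anything for real.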

[TODO] [Update this to add a cowsay step to the hello-world pipeline (just add it after the uppercase step) -- include passing in the character as a parameter]

### 2.1. Add a container directive to your process

Edit the `hello-containers.nf` script to add a `container` directive to the `cowsay` process.
@@ -281,191 +283,4 @@ You know how to use containers in Nextflow to run processes.

### What's next?

You have everything you need to continue to the [next chapter](./04_hello_genomics.md) of this training series.
Optionally, continue on to learn how to get container images for tools you want to use in your Nextflow pipelines.

---

## 3. Optional Topic: How to find or make container images

Some software developers provide container images for their software that are available on container registries like Docker Hub, but many do not.
In this optional section, we'll show you two ways to get a container image for tools you want to use in your Nextflow pipelines: using Seqera Containers and building the container image yourself.

You'll be getting/building a container image for the `quote` pip package, which will be used in the exercise at the end of this section.

### 3.1. Get a container image from Seqera Containers

Seqera Containers is a free service that builds container images for pip and conda (including bioconda) installable tools.
Navigate to [Seqera Containers](https://www.seqera.io/containers/) and search for the `quote` pip package.

![Seqera Containers](img/seqera-containers-1.png)

Click on "+Add" and then "Get Container" to request a container image for the `quote` pip package.

![Seqera Containers](img/seqera-containers-2.png)

If this is the first time a community container has been built for this version of the package, it may take a few minutes to complete.
Click to copy the URI (e.g. `community.wave.seqera.io/library/pip_quote:ae07804021465ee9`) of the container image that was created for you.

You can now use the container image to run the `quote` command and get a random saying from Grace Hopper.

```bash
docker run --rm community.wave.seqera.io/library/pip_quote:ae07804021465ee9 quote "Grace Hopper"
```

Output:

```console title="Output"
Humans are allergic to change. They love to say, 'We've always done it
this way.' I try to fight that. That's why I have a clock on my wall
that runs counter-clockwise.
```

### 3.2. Build the container image yourself

Let's use some build details from the Seqera Containers website to build the container image for the `quote` pip package ourselves.
Return to the Seqera Containers website and click on the "Build Details" button.

The first item we'll look at is the `Dockerfile`, a type of script file that contains all the commands needed to build the container image.
We've added some explanatory comments to the Dockerfile below to help you understand what each part does.

```Dockerfile title="Dockerfile"
# Start from the micromamba base docker image
FROM mambaorg/micromamba:1.5.10-noble
# Copy the conda.yml file into the container
COPY --chown=$MAMBA_USER:$MAMBA_USER conda.yml /tmp/conda.yml
# Install various utilities for Nextflow to use and the packages in the conda.yml file
RUN micromamba install -y -n base -f /tmp/conda.yml \
    && micromamba install -y -n base conda-forge::procps-ng \
    && micromamba env export --name base --explicit > environment.lock \
    && echo ">> CONDA_LOCK_START" \
    && cat environment.lock \
    && echo "<< CONDA_LOCK_END" \
    && micromamba clean -a -y
# Run the container as the root user
USER root
# Set the PATH environment variable to include the micromamba installation directory
ENV PATH="$MAMBA_ROOT_PREFIX/bin:$PATH"
```

The second item we'll look at is the `conda.yml` file, which contains the list of packages that need to be installed in the container image.

```yaml title="conda.yml"
channels:
  - conda-forge
  - bioconda
dependencies:
  - pip
  - pip:
      - quote==3.0.0
```

Copy the contents of these files into the stubs located in the `containers/build` directory, then run the following command to build the container image yourself.

!!! Note

    We use the `-t quote:latest` flag to tag the container image with the name `quote` and the tag `latest`.
    We will be able to use this tag to refer to the container image when running it on this system.

```bash
docker build -t quote:latest containers/build
```

After it has finished building, you can run the container image you just built.

```bash
docker run --rm quote:latest quote "Margaret Oakley Dayhoff"
```

### Takeaway

You've learned two different ways to get a container image for a tool you want to use in your Nextflow pipelines: using Seqera Containers and building the container image yourself.

### What's next?

You have everything you need to continue to the [next chapter](./04_hello_genomics.md) of this training series.
You can also continue on with an optional exercise to fetch quotes on computer/biology pioneers using the `quote` container and output them using the `cowsay` container.

---

## 4. Bonus Exercise: Make the cow quote famous scientists

This section contains some stretch exercises, to practice what you've learned so far.
Doing these exercises is _not required_ to understand later parts of the training, but they provide a fun way to reinforce what you've learned by figuring out how to make the cow quote famous scientists.

```console title="cowsay-output-Grace-Hopper.txt"
  _________________________________________________
 /                                                 \
| Humans are allergic to change. They love to       |
| say, 'We've always done it this way.' I try to fi |
| ght that. That's why I have a clock on my wall th |
| at runs counter-clockwise.                        |
| -Grace Hopper                                     |
 \                                                 /
  =================================================
        \
         \
           ^__^
           (oo)\_______
           (__)\       )\/\
               ||----w |
               ||     ||
```

### 4.1. Modify the `hello-containers.nf` script to use a getQuote process

We have a list of computer and biology pioneers in the `containers/data/pioneers.csv` file.
At a high level, to complete this exercise you will need to:

- Modify the default `params.input_file` to point to the `pioneers.csv` file.
- Create a `getQuote` process that uses the `quote` container to fetch a quote for each input.
- Connect the output of the `getQuote` process to the `cowsay` process to display the quote.

For the `quote` container image, you can either use the one you built yourself in the previous stretch exercise or use the one you got from Seqera Containers.

!!! Hint

    A good choice for the `script` block of your getQuote process might be:
    ```groovy
    script:
        def safe_author = author.tokenize(' ').join('-')
        """
        quote "$author" > quote-${safe_author}.txt
        echo "-${author}" >> quote-${safe_author}.txt
        """
    ```

You can find a solution to this exercise in `containers/solutions/hello-containers-4.1.nf`.
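
The `safe_author` line in the hint turns an author name into something usable in a file name: `tokenize(' ')` splits the name on spaces (dropping empty pieces) and `join('-')` glues the pieces back together with hyphens. As a rough illustration of the same transformation in the shell, here is a sketch with a hypothetical helper that is not part of the course materials:

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the Groovy expression
# author.tokenize(' ').join('-'): split on spaces, drop empty pieces, join with '-'.
safe_author() {
    local -a parts
    local IFS=' '
    # Splitting on spaces discards leading, trailing and repeated spaces.
    read -r -a parts <<< "$1"
    IFS='-'
    # "${parts[*]}" joins the array elements with the first character of IFS.
    printf '%s\n' "${parts[*]}"
}

safe_author "Grace Hopper"              # prints Grace-Hopper
safe_author "Margaret Oakley Dayhoff"   # prints Margaret-Oakley-Dayhoff
```

With this transformation, the quote for "Grace Hopper" would be written to `quote-Grace-Hopper.txt`, which avoids spaces in output file names.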

### 4.2. Modify your Nextflow pipeline to allow it to execute in `quote` and `sayHello` modes.

Add some branching logic to your pipeline to allow it to accept inputs intended for both `quote` and `sayHello`.
Here's an example of how to use an `if` statement in a Nextflow workflow:

```groovy title="hello-containers.nf"
workflow {
    if (params.quote) {
        ...
    }
    else {
        ...
    }
    cowSay(text_ch)
}
```

!!! Hint

    You can use `new_ch = processName.out` to assign a name to the output channel of a process.

You can find a solution to this exercise in `containers/solutions/hello-containers-4.2.nf`.

### Takeaway

You know how to use containers in Nextflow to run processes, and how to build some branching logic into your pipelines!

### What's next?

Celebrate, take a stretch break and drink some water!

When you are ready, move on to Part 3 of this training series to learn how to apply what you've learned so far to a more realistic data analysis use case.
[TODO]