Merge pull request #3 from jaclyn-taroni/jaclyn-taroni/wilms-06-build-test

Build and test Docker image using `bioconductor/bioconductor_docker:3.19` and `renv`
maud-p authored Aug 4, 2024
2 parents 3adc01b + 650df7b commit 47a84bc
Showing 14 changed files with 2,150 additions and 579 deletions.
1 change: 1 addition & 0 deletions .github/workflows/docker_all-modules.yml
@@ -38,6 +38,7 @@ jobs:
- simulate-sce
- cell-type-ewings
- doublet-detection
- cell-type-wilms-tumor-06
uses: ./.github/workflows/build-push-docker-module.yml
if: github.repository_owner == 'AlexsLemonade'
with:
55 changes: 55 additions & 0 deletions .github/workflows/docker_cell-type-wilms-tumor-06.yml
@@ -0,0 +1,55 @@
# This is a workflow to build the docker image for the cell-type-wilms-tumor-06 module
#
# Docker modules are run on pull requests when code for files that affect the Docker image have changed.
# If other files are used during the Docker build, they should be added to `paths`
#
# At module initialization, this workflow is inactive, and needs to be activated manually

name: Build docker image for cell-type-wilms-tumor-06

concurrency:
  # only one run per branch at a time
  group: "docker_cell-type-wilms-tumor-06_${{ github.ref }}"
  cancel-in-progress: true

on:
  pull_request:
    branches:
      - main
    paths:
      - "analyses/cell-type-wilms-tumor-06/Dockerfile"
      - "analyses/cell-type-wilms-tumor-06/.dockerignore"
      - "analyses/cell-type-wilms-tumor-06/renv.lock"
      - "analyses/cell-type-wilms-tumor-06/conda-lock.yml"
  workflow_dispatch:
    inputs:
      push-ecr:
        description: "Push to AWS ECR"
        type: boolean
        required: true

jobs:
  test-build:
    name: Test Build Docker Image
    if: github.event_name == 'pull_request' || (contains(github.event_name, 'workflow_') && !inputs.push-ecr)
    runs-on: ubuntu-latest

    steps:
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build image
        uses: docker/build-push-action@v5
        with:
          context: "{{defaultContext}}:analyses/cell-type-wilms-tumor-06"
          push: false
          cache-from: type=gha
          cache-to: type=gha,mode=max

  build-push:
    name: Build and Push Docker Image
    if: github.repository_owner == 'AlexsLemonade' && (github.event_name == 'push' || inputs.push-ecr)
    uses: ./.github/workflows/build-push-docker-module.yml
    with:
      module: "cell-type-wilms-tumor-06"
      push-ecr: true
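The two job-level `if:` expressions above decide which job runs for a given trigger. As a rough illustration only, the POSIX shell sketch below re-implements that routing outside GitHub Actions. The function name `select_docker_jobs` and the owner `my-fork` are hypothetical, and the sketch assumes an empty `push-ecr` input (as on `pull_request` events, which define no inputs) behaves like false.

```shell
#!/bin/sh
# Hypothetical helper: mirrors the `if:` conditions of the test-build and
# build-push jobs in docker_cell-type-wilms-tumor-06.yml.
# $1: event name, $2: push-ecr input ("true", "false", or empty), $3: repository owner
select_docker_jobs() {
  event="$1"
  push_ecr="${2:-false}"  # assumption: an empty input behaves like false
  owner="$3"
  jobs=""

  # contains(github.event_name, 'workflow_') is a substring test
  case "$event" in
    *workflow_*) is_workflow_event=true ;;
    *) is_workflow_event=false ;;
  esac

  # test-build: any pull request, or a workflow_* event without push-ecr
  if [ "$event" = "pull_request" ] ||
     { [ "$is_workflow_event" = true ] && [ "$push_ecr" != true ]; }; then
    jobs="test-build"
  fi

  # build-push: upstream owner only, on push or when push-ecr is true
  if [ "$owner" = "AlexsLemonade" ] &&
     { [ "$event" = "push" ] || [ "$push_ecr" = true ]; }; then
    jobs="${jobs:+$jobs }build-push"
  fi

  echo "$jobs"
}

select_docker_jobs pull_request "" AlexsLemonade         # test-build
select_docker_jobs workflow_dispatch true AlexsLemonade  # build-push
select_docker_jobs workflow_dispatch true my-fork        # (empty: neither job runs)
```

Under these assumptions, the last call shows a consequence of the design: on a fork, a manual run with `push-ecr` set skips both jobs, because the test build is suppressed and the push job is limited to the `AlexsLemonade` owner.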
59 changes: 59 additions & 0 deletions .github/workflows/run_cell-type-wilms-tumor-06.yml
@@ -0,0 +1,59 @@
# This is a workflow to run the cell-type-wilms-tumor-06 module
#
# Analysis modules are run based on three triggers:
# - Manual trigger
# - On pull requests where code in the module has changed
# - As a reusable workflow called from a separate workflow which periodically runs all modules
#
# At initialization, only the manual trigger is active

name: Run cell-type-wilms-tumor-06 analysis module
env:
  MODULE_PATH: analyses/cell-type-wilms-tumor-06
  AWS_DEFAULT_REGION: us-east-2

concurrency:
  # only one run per branch at a time
  group: "run_cell-type-wilms-tumor-06_${{ github.ref }}"
  cancel-in-progress: true

on:
  workflow_dispatch:
  # workflow_call:
  # pull_request:
  #   branches:
  #     - main
  #   paths:
  #     - "analyses/cell-type-wilms-tumor-06/**"

jobs:
  run-module:
    if: github.repository_owner == 'AlexsLemonade'
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repo
        uses: actions/checkout@v4

      - name: Set up R
        uses: r-lib/actions/setup-r@v2
        with:
          r-version: 4.4.0
          use-public-rspm: true

      - name: Set up pandoc
        uses: r-lib/actions/setup-pandoc@v2

      - name: Set up renv
        uses: r-lib/actions/setup-renv@v2
        with:
          working-directory: ${{ env.MODULE_PATH }}

      # Update this step as needed to download the desired data
      - name: Download test data
        run: ./download-data.py --test-data --format SCE

      - name: Run analysis module
        run: |
          cd ${MODULE_PATH}
          # run module script(s) here
4 changes: 4 additions & 0 deletions analyses/cell-type-wilms-tumor-06/.Rprofile
@@ -0,0 +1,4 @@
# Don't activate renv in an OpenScPCA docker image
if(Sys.getenv('OPENSCPCA_DOCKER') != 'TRUE'){
  source('renv/activate.R')
}
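This guard pairs with the `ENV OPENSCPCA_DOCKER=TRUE` line in the Dockerfile. In R, `Sys.getenv()` returns an empty string for an unset variable, and `!=` is an exact, case-sensitive comparison, so only the literal value `TRUE` skips activation. The shell sketch below reproduces that decision for illustration; `renv_decision` is a hypothetical name and not part of the module.

```shell
#!/bin/sh
# Hypothetical helper reproducing the .Rprofile guard:
# renv/activate.R is sourced unless OPENSCPCA_DOCKER is exactly "TRUE".
# $1: the value of OPENSCPCA_DOCKER (empty when the variable is unset)
renv_decision() {
  if [ "$1" != "TRUE" ]; then
    echo "source renv/activate.R"
  else
    echo "skip renv activation"
  fi
}

renv_decision "${OPENSCPCA_DOCKER:-}"  # depends on the current environment
renv_decision TRUE                     # skip renv activation
renv_decision true                     # source renv/activate.R (case-sensitive)
```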
4 changes: 4 additions & 0 deletions analyses/cell-type-wilms-tumor-06/.gitignore
@@ -5,3 +5,7 @@
# Ignore the scratch directory (but keep it present)
/scratch/*
!/scratch/.gitkeep

# Ignore Docker-related files specific to module author's system
config.yaml
run.sh
39 changes: 24 additions & 15 deletions analyses/cell-type-wilms-tumor-06/Dockerfile
@@ -1,19 +1,28 @@
-# pull base image
-FROM bioconductor/tidyverse:3.19
+# Pull base image
+# This image has RStudio Server on it
+FROM bioconductor/bioconductor_docker:3.19
 
-# Set global R options
-RUN echo "options(repos = 'https://cloud.r-project.org')" > $(R --no-echo --no-save -e "cat(Sys.getenv('R_HOME'))")/etc/Rprofile.site
-ENV RETICULATE_MINICONDA_ENABLED=FALSE
+# Labels following the Open Containers Initiative (OCI) recommendations
+# For more information, see https://specs.opencontainers.org/image-spec/annotations/?v=v1.0.1
+LABEL org.opencontainers.image.title="openscpca/cell-type-wilms-tumor-06"
+LABEL org.opencontainers.image.description="Docker image for the OpenScPCA analysis module 'cell-type-wilms-tumor-06'"
+LABEL org.opencontainers.image.authors="OpenScPCA [email protected]"
+LABEL org.opencontainers.image.source="https://github.com/AlexsLemonade/OpenScPCA-analysis/tree/main/analyses/cell-type-wilms-tumor-06"
 
-RUN R --no-echo --no-restore --no-save -e "install.packages('remotes')"
+# Set an environment variable to allow checking if we are in an OpenScPCA container
+ENV OPENSCPCA_DOCKER=TRUE
 
-RUN R -e "devtools::install_github('enblacar/SCpubr')"
-RUN R -e "remotes::install_github('satijalab/seurat', 'seurat5', quiet = TRUE)" # this also install patchwork (and others)
-RUN R -e "remotes::install_github('satijalab/azimuth', quiet = TRUE)" # this also install SingleCellExperiment, DT (and others)
-RUN R -e "remotes::install_github('cancerbits/DElegate')"
-RUN R -e "install.packages('viridis')"
-RUN R -e "install.packages('ggplotify')"
-RUN R -e "BiocManager::install('edgeR')"
+# Disable the renv cache to install packages directly into the R library
+ENV RENV_CONFIG_CACHE_ENABLED=FALSE
 
-# make sure all R related binaries are in PATH in case we want to call them directly
-ENV PATH ${R_HOME}/bin:$PATH
+# Install renv
+RUN R --no-echo --no-restore --no-save -e "install.packages('renv')"
+
+# Copy the renv.lock file from the host environment to the image
+COPY renv.lock renv.lock
+
+# restore from renv.lock file and clean up to reduce image size
+RUN Rscript -e 'renv::restore()' && \
+  rm -rf ~/.cache/R/renv && \
+  rm -rf /tmp/downloaded_packages && \
+  rm -rf /tmp/Rtmp*
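Because the Dockerfile copies only `renv.lock` from the module directory, the Docker workflow's `paths` filter lists that file alongside the `Dockerfile`, `.dockerignore`, and `conda-lock.yml`; a pull request that changes only, say, `.Rprofile` or the README does not trigger that workflow. A small shell sketch for checking a set of changed files against that list locally might look like this. `triggers_docker_build` is hypothetical, and the exact-match test is only equivalent to GitHub's glob matching because none of the listed paths contain wildcards.

```shell
#!/bin/sh
# Paths copied from the `paths` filter of docker_cell-type-wilms-tumor-06.yml
DOCKER_PATHS="analyses/cell-type-wilms-tumor-06/Dockerfile
analyses/cell-type-wilms-tumor-06/.dockerignore
analyses/cell-type-wilms-tumor-06/renv.lock
analyses/cell-type-wilms-tumor-06/conda-lock.yml"

# Hypothetical helper: succeeds (exit 0) if any argument exactly matches
# one of DOCKER_PATHS, i.e. the pull request would trigger the Docker build.
triggers_docker_build() {
  for changed in "$@"; do
    # -F fixed string, -x whole-line match, -q quiet
    if printf '%s\n' "$DOCKER_PATHS" | grep -Fxq -- "$changed"; then
      return 0
    fi
  done
  return 1
}

# Example (not run here): check the files changed relative to main
# triggers_docker_build $(git diff --name-only origin/main...HEAD) && echo "Docker build will run"
```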
57 changes: 43 additions & 14 deletions analyses/cell-type-wilms-tumor-06/README.md
@@ -1,8 +1,8 @@
# Wilms Tumor Dataset Annotation (SCPCP000006)

Wilms tumor (WT) is the most common pediatric kidney cancer and is characterized by exacerbated intra- and inter-tumor heterogeneity.
The genetic landscape of WT is very diverse within each of the histological contingents.
The Children's Oncology Group (COG) classifies WT patients into two groups: favorable histology and diffuse anaplasia.
Each of these groups is composed of blastemal, epithelial, and stromal populations of cancer cells in different proportions, as well as cells from the normal kidney, mostly kidney epithelial cells, endothelial cells, immune cells, and normal stromal cells (fibroblasts).

## Description
@@ -24,9 +24,9 @@ The analysis is/will be divided as the following:
- [ ] Notebook: explore results from step 6, integrate all samples together and annotate the dataset using (i) the metadata file, (ii) CNV information, and (iii) label transfer information

## Usage
From RStudio, run the Rmd reports or render the R scripts (see the RStudio session setup below).
Before running the scripts, please make sure that the paths are correct.
You can also simply look at the HTML reports in the notebook folder.
In that case there is no need to run anything: the reports guide you through the analysis. To see the code, use the unhide code button at the top right of each chunk.

## Input files
@@ -57,7 +57,7 @@ Of note, this requires AWS CLI setup to run as intended: https://openscpca.readt

### sample metadata

The OpenScPCA-analysis/data/current/SCPCP000006/single_cell_metadata.tsv file contains clinical information related to the samples in the dataset.
Some information can be helpful for annotation and validation:

- treatment: Some of the samples have been pre-treated with chemotherapy and some are upfront resections.
@@ -68,7 +68,7 @@ Some differences are expected, some marker genes or pathways are associated wit

## Output files

## Marker sets

This folder is a resource for later validation of the annotated cell types.

@@ -109,7 +109,7 @@ This folder is a resource for later validation of the annotated cell types.
### The table GeneticAlterations_metadata.csv contains the following columns and information:
- alteration contains the number and portion of the affected chromosome
- gain_loss contains the information regarding the gain or loss of the corresponding genetic alteration
- cell_class is "malignant"
- cell_type contains the list of the malignant cell types that are attributed to the marker gene, either blastemal, stromal, or epithelial, or NA if none of the three histologies is more prone to the described genetic alteration
- DOI contains the list of main publication identifiers supporting the choice of the genetic alteration
- comment can be empty or contains any additional information
@@ -135,14 +135,43 @@ The main packages used are:
- DT for table visualization
- DElegate for differential expression analysis

For complete reproducibility of the results, you can build and run the docker image using the Dockerfile. This will allow you to work on RStudio (R version 4.4.1) from the based image bioconductor/tidyverse:3.19.
### Docker

In the config.yaml file, define your system specific parameter and paths (e.g. to the data).
Execute the run.sh file and open RStudio in your browser (http://localhost:8080/).
By default, username = rstudio, password = wordpass.
To build the Docker image, run the following from this directory:

```shell
docker buildx build . -t openscpca/cell-type-wilms-tumor-06
```

The image will also be available from ECR: <https://gallery.ecr.aws/openscpca/cell-type-wilms-tumor-06>

To run the container and develop in RStudio Server, run the following **from the root of the repository**, replacing `{PASSWORD}`, including the curly braces, with a password of your choosing:

```shell
docker run \
  --mount type=bind,target=/home/rstudio/OpenScPCA-analysis,source=$PWD \
  -e PASSWORD={PASSWORD} \
  -p 8787:8787 \
  public.ecr.aws/openscpca/cell-type-wilms-tumor-06:latest
```

This will pull the latest version of the image from ECR if you do not yet have a copy locally.

Navigate to <http://localhost:8787/> and log in with the username `rstudio` and the password you set.

Within RStudio Server, `OpenScPCA-analysis` will point to your local copy of the repository.

#### A note on Apple Silicon

If you are on a Mac with an M series chip, you will not be able to use RStudio Server with a `linux/amd64` or `linux/x86_64` image (like the ones available from ECR).
You must build an ARM image locally to be able to use RStudio Server within the container.
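One way to make a local build's target architecture explicit is to pass `--platform` to `docker buildx build`. The sketch below is an illustrative helper, not part of the module: `docker_platform_for` is a hypothetical name, it only recognizes the `uname -m` values `arm64`/`aarch64` and `x86_64`/`amd64`, and it prints the build command instead of running it. On an M series Mac, buildx typically targets `linux/arm64` by default, so the flag mainly documents the intent.

```shell
#!/bin/sh
# Hypothetical helper: map the CPU architecture reported by `uname -m`
# to a Docker platform string.
docker_platform_for() {
  case "$1" in
    arm64|aarch64) echo "linux/arm64" ;;  # Apple Silicon (macOS reports arm64)
    x86_64|amd64)  echo "linux/amd64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Print (rather than run) the build command for this machine
if platform=$(docker_platform_for "$(uname -m)"); then
  echo "docker buildx build . --platform $platform -t openscpca/cell-type-wilms-tumor-06"
fi
```

After a local ARM build, the `docker run` command would presumably reference the local tag `openscpca/cell-type-wilms-tumor-06` rather than the ECR image, so that Docker does not pull the `linux/amd64` image.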

### renv

This module uses `renv`.
If you are using RStudio Server within the container, the `renv` project will not be activated by default.
In our testing, installing packages within the container and running `renv::snapshot()` updates the lockfile without problems, even though the project is not activated.
The `renv` lockfile is used to install R packages in the Docker image.

## Computational resources

8 changes: 8 additions & 0 deletions analyses/cell-type-wilms-tumor-06/components/dependencies.R
@@ -0,0 +1,8 @@
# Tidyverse
library(tidyverse)

# Single-cell packages
library(Seurat) # remotes::install_github("satijalab/[email protected]")
library(presto) # remotes::install_github("immunogenomics/presto")
library(Azimuth) # remotes::install_github("satijalab/azimuth")
library(SCpubr)
12 changes: 0 additions & 12 deletions analyses/cell-type-wilms-tumor-06/config.yaml

This file was deleted.
