diff --git a/pcgrr/vignettes/installation.Rmd b/pcgrr/vignettes/installation.Rmd index faa1a58b..632a7d09 100644 --- a/pcgrr/vignettes/installation.Rmd +++ b/pcgrr/vignettes/installation.Rmd @@ -1,13 +1,18 @@ --- title: "Installation" output: rmarkdown::html_document + --- +```{r setup, include=FALSE} +knitr::opts_chunk$set(comment = "", collapse = TRUE) +``` + + ```{r load_pkgs, include=FALSE, echo=FALSE, message=FALSE, warning=FALSE} require(glue, include.only = "glue") ``` - ```{r vars, echo=FALSE} Sys.setenv(VEP_VERSION = "112") Sys.setenv(PCGR_VERSION = "1.4.1.9014") @@ -18,50 +23,48 @@ BUNDLE_VERSION <- Sys.getenv("BUNDLE_VERSION") ``` ```{r funcs, echo=FALSE} -bundle_link <- function(v, hg) { - glue("[{hg} - {v}](https://insilico.hpc.uio.no/pcgr/pcgr_ref_data.{v}.{hg}.tgz)") +bundle_link <- function(hg) { + v <- BUNDLE_VERSION + glue("https://insilico.hpc.uio.no/pcgr/pcgr_ref_data.{v}.{hg}.tgz") } ``` -The PCGR workflow has several data requirements and software installation options. - -- Data requirements: - - Sample-specific inputs (e.g. somatic variant calls in VCF format) - - Reference bundle (e.g. CIViC, CGI, TCGA) - - Ensembl VEP data cache +## Data -- Software options: - - Conda - - Docker - - Singularity/Apptainer +PCGR requires the following data: -## Data +- Sample-specific inputs (e.g. somatic variant calls in VCF format) +- Reference bundle (e.g. CIViC, CGI, TCGA) +- Ensembl VEP data cache -PCGR supports GRCh37 and GRCh38 sample-specific inputs. The reference bundle and -VEP data cache need to match the chosen human genome assembly. +PCGR supports the GRCh37 and GRCh38 human genome assemblies. All the data above +need to match the chosen assembly. ### 1. Reference Bundle -Reference bundles are generated semi-automatically by the author and versioned -based on their release date. Keep in mind that the bundles support only certain -Ensembl VEP versions. 
The genome-specific bundle is available from below (size: ~5G): - -- `r bundle_link(v = BUNDLE_VERSION, hg = "grch37")` -- `r bundle_link(v = BUNDLE_VERSION, hg = "grch38")` +Reference bundles are generated semi-automatically (by the PCGR author) and +are versioned based on their release date. Keep in mind that the bundles support +only certain Ensembl VEP versions. The genome-specific bundles +(**v`r BUNDLE_VERSION`**) can be downloaded from the links below (size: ~5G): -**Tip**: The `data/grch3x/.PCGR_BUNDLE_VERSION` file indicates the bundle version. +| Assembly | Download Link | +|----------|---------------------------| +| GRCh38 | `r bundle_link("grch38")` | +| GRCh37 | `r bundle_link("grch37")` | -
-Bash example +**Tip**: The `data/grch3x/.PCGR_BUNDLE_VERSION` file within the downloaded bundle +indicates the bundle version for reporting purposes. +#### Bash Example +```{bash echo=FALSE} +echo "BUNDLE_VERSION=\"${BUNDLE_VERSION}\"" +``` -```{bash eval=FALSE} +```bash GENOME="grch38" # or "grch37" -BUNDLE_VERSION="20240612" BUNDLE="pcgr_ref_data.${BUNDLE_VERSION}.${GENOME}.tgz" - wget https://insilico.hpc.uio.no/pcgr/${BUNDLE} gzip -dc ${BUNDLE} | tar xvf - @@ -69,13 +72,11 @@ mkdir ${BUNDLE_VERSION} mv data/ ${BUNDLE_VERSION} ``` -
- ### 2. VEP Cache -Ensembl [VEP][vep-web] requires a data cache which is available from the Ensembl +[VEP][vep-web] requires a data cache which is available from the Ensembl [FTP site][ensembl-ftp] (search there for files starting with `homo_sapiens_vep_`). -We currently support Ensembl VEP version `112`. +We currently support Ensembl VEP **v`r VEP_VERSION`**. **Tip**: PCGR needs to be pointed to the _parent_ directory containing the downloaded `homo_sapiens/xyz_GRCh3x/` cache, which is usually called `.vep` if @@ -85,32 +86,80 @@ you've followed the VEP cache [download instructions][vep-cache]. [ensembl-ftp]: https://ftp.ensembl.org/pub/release-112/variation/indexed_vep_cache/ [vep-cache]: https://asia.ensembl.org/info/docs/tools/vep/script/vep_cache.html#cache -- Bash example: +#### Bash Example + +```{bash echo=FALSE} +echo "VEP_VERSION=\"${VEP_VERSION}\"" +``` ```bash GENOME="GRCh38" # or "GRCh37" -VEP_VERSION="112" CACHE="homo_sapiens_vep_${VEP_VERSION}_${GENOME}.tar.gz" wget https://ftp.ensembl.org/pub/release-${VEP_VERSION}/variation/indexed_vep_cache/${CACHE} gzip -dc ${CACHE} | tar xvf - ``` +----------------------------- + ## Software -The PCGR workflow can be installed using [Conda][conda-web], [Docker][docker-web], -or [Singularity/Apptainer][apptainer-web]. +The PCGR workflow can be installed using [Docker][docker-web], +[Singularity/Apptainer][apptainer-web] or [Conda][conda-web]. [conda-web]: https://conda.io/projects/conda/en/latest/user-guide/getting-started.html [docker-web]: https://docs.docker.com/ [apptainer-web]: https://apptainer.org/docs/user/latest/index.html -### Conda +### A. Docker + +The Docker image is available on [DockerHub](https://hub.docker.com/r/sigven/pcgr/tags). 
+Pull the latest **v`r PCGR_VERSION`** image with: + +```{r echo=FALSE} +glue("docker pull sigven/pcgr:{PCGR_VERSION}") +# might need to specify platform +# docker pull --platform=amd64 sigven/pcgr:${PCGR_VERSION} +``` + +#### Example Run + +```bash +docker container run -it --rm \ + -v /Users/you/projects/.vep:/mnt/.vep \ + -v /Users/you/projects/bundle:/mnt/bundle \ + -v /Users/you/projects/pcgr_inputs:/mnt/pcgr_inputs \ + -v /Users/you/projects/pcgr_outputs:/mnt/pcgr_outputs \ + sigven/pcgr:1.4.1.9014 \ + pcgr \ + --input_vcf "/mnt/pcgr_inputs/tumor_sample.BRCA.vcf.gz" \ + --vep_dir "/mnt/.vep" \ + --refdata_dir "/mnt/bundle" \ + --output_dir "/mnt/pcgr_outputs" \ + --genome_assembly "grch38" \ + --sample_id "SampleB" \ + --assay "WGS" \ + --vcf2maf +``` + +### B. Singularity/Apptainer + +```{r echo=FALSE} +glue("apptainer pull oras://ghcr.io/sigven/pcgr:{PCGR_VERSION}.singularity") +``` + + -There is conda support for both Linux and macOS machines: +### C. Conda -
-Linux +There is Conda support for both Linux and macOS machines. +The following process can take anywhere from 10 up to 40 minutes when installing +from scratch, mostly depending on the user's and server's internet connection. +Most of the time is spent on downloading the `{BSgenome.Hsapiens.UCSC.hg19}` and +`{BSgenome.Hsapiens.UCSC.hg38}` R packages (which happens at the very end of the +conda environment creation). + +#### Linux ```bash # set up variables @@ -127,12 +176,9 @@ conda activate ./pcgr_conda/pcgr pcgr --version ``` -
- -
-macOS +#### macOS -For macOS M1 machines, you need to have `CONDA_SUBDIR=osx-64` before the +For macOS M1 machines, you need to include `CONDA_SUBDIR=osx-64` before the `conda create` command - see : @@ -150,180 +196,3 @@ conda activate ./pcgr_conda/pcgr # test that it works pcgr --version ``` - -
- -### Docker - -See the [Docker setup](#dockersetup) section for more details. - -```bash -PCGR_VERSION="1.4.1.9014" -docker pull sigven/pcgr:${PCGR_VERSION} -# might need to specify platform -# docker pull --platform=amd64 sigven/pcgr:${PCGR_VERSION} -``` - -### Singularity/Apptainer - -```bash -PCGR_VERSION="1.4.1.9014" -apptainer pull oras://ghcr.io/sigven/pcgr:${PCGR_VERSION}.singularity -``` - -
-
-
- - - - -### STEP 2: Set up Conda or Docker - -Step 2 depends on if you want to use Conda or Docker: - -- For Conda, continue reading the [PCGR Conda setup](#condasetup). -- For Docker, skip to the [PCGR Docker setup](#dockersetup). - - - -### Option 1: Conda - -#### a) Miniconda and conda - -Download and install the Miniconda installer from : - -- Make sure to download the Linux or MacOSX script according to which platform you're currently on. -- Run `bash miniconda.sh` and follow the prompts (it should be okay to accept the defaults, unless you want to choose a different - installation location than the default `~/miniconda3`). -- Exit your current terminal session and open a new one. You should now notice something like a `(base)` string as a - prefix in your terminal prompt. This means that you're in the `base` conda environment, and you're ready to start - installing the conda environments for PCGR. - -```bash -PLATFORM="MacOSX" # or "Linux" -MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-${PLATFORM}-x86_64.sh" -wget ${MINICONDA_URL} -O miniconda.sh && chmod +x miniconda.sh -bash miniconda.sh -``` - -```text -# exit terminal and open new one - you should now see: - -# as of May 2024 -(base) $ conda --version -conda 24.5.0 -``` - -#### b) Create PCGR conda environments - -The `conda/env/lock` directory in the PCGR codebase contains two `.lock` files which -can be used to create the required conda environments for the Python component -(`pcgr`) and the R components (`pcgrr` (and `cpsr`)). 
We install the conda -dependencies for these two environments in a local `conda` directory in the -following example: - -```bash -cd /Users/you/dir4/conda -PLATFORM="osx-64" # or "linux-64" - -PCGR_VERSION="1.4.1.9014" -PCGR_REPO="https://raw.githubusercontent.com/sigven/pcgr/v${PCGR_VERSION}/conda/env/lock/" -PLATFORM="linux" # or "osx" - -conda create --prefix ./pcgr --file ${PCGR_REPO}/pcgr-${PLATFORM}-64.lock -conda create --prefix ./pcgrr --file ${PCGR_REPO}/pcgrr-${PLATFORM}-64.lock - -## Alternatively, for installing in your central conda directory, use the following: -# conda create --name pcgr --file ${PCGR_CONDA_ENV_DIR}/lock/pcgr-${PLATFORM}.lock -# conda create --name pcgrr --file ${PCGR_CONDA_ENV_DIR}/lock/pcgrr-${PLATFORM}.lock - -## For MacOS M1, you need to have 'CONDA_SUBDIR=osx-64' before the conda command, i.e.: -# CONDA_SUBDIR=osx-64 conda create --prefix [...] --file [...] -``` - -The above process takes 20-30 minutes when installing from scratch. Most of the time -is spent on downloading the -{BSgenome.Hsapiens.UCSC.hg19} and {BSgenome.Hsapiens.UCSC.hg38} R packages -(and yes, for simplicity we download both packages). -In the end, confirm your conda environments have been installed correctly -(notice how the paths are different to the `base` env installation after using the -`--prefix` option above): - -```text -$ (base) conda env list -# conda environments: -# -base * /Users/you/miniconda3 -pcgr /Users/you/dir4/conda/pcgr -pcgrr /Users/you/dir4/conda/pcgrr -``` - -#### c) Activate pcgr conda environment - -You need to activate the `conda/pcgr` conda environment, and test that it works -correctly with e.g. 
`pcgr --version`: - -```text -$ cd /Users/you/dir4/conda -(base) $ conda activate ./conda/pcgr -# note how the full path to the locally installed conda environment is now displayed - -(/Users/you/dir4/conda) $ which pcgr -/Users/you/dir4/conda/pcgr/bin/pcgr - -(/Users/you/dir4/conda) $ pcgr --version -pcgr X.X.X - -(/Users/you/dir4/conda) $ which pcgrr.R -/Users/you/dir4/conda/pcgr/bin/pcgrr.R -``` - -You should now be all set up to run PCGR! Continue on to [an example run](running.html#example-run). - - - -### Option 2: Docker - -#### a) Install Docker - -For installing Docker, follow the instructions at -for your Linux or MacOSX machine. - -#### b) Download PCGR Docker Image - -- Pull the [PCGR Docker image](https://hub.docker.com/r/sigven/pcgr/tags) from - DockerHub with: `docker pull sigven/pcgr:X.X.X` - -#### c) Run PCGR Docker Container - -If you are familiar with working with Docker volumes () -you can run PCGR using Docker instead of conda using the `-v :` Docker option. -You'll need to map your PCGR inputs to Docker container paths. - -For example, say you have the input VCF `sampleX.vcf.gz` stored in the -directory `/Users/you/project1`. You would need to supply Docker with a -`--volume` (or `-v`) option mapping the directory of that VCF with -a directory inside the Docker container, e.g. `/home/input_vcf_dir`. -That would become: `-v /Users/you/project1:/home/input_vcf_dir` -(note the `:` separating your directory from the container's directory). 
- -Then your command would look something like this: - -```bash -docker container run -it --rm \ - -v /Users/you/dir0/vep:/root/vep - -v /Users/you/dir1/data:/root/pcgr_refdata \ - -v /Users/you/dir2/pcgr_inputs:/root/pcgr_inputs \ - -v /Users/you/dir3/pcgr_outputs:/root/pcgr_outputs \ - sigven/pcgr:1.4.1.9014 \ - pcgr \ - --input_vcf "/root/pcgr_inputs/tumor_sample.BRCA.vcf.gz" \ - --vep_dir "/root/vep/.vep" \ - --refdata_dir "/root/pcgr_refdata" \ - --output_dir "/root/pcgr_outputs" \ - --genome_assembly "grch38" \ - --sample_id "SampleB" \ - --assay "WGS" \ - --vcf2maf -```