diff --git a/README.md b/README.md index fab2a350..c2f2a9fc 100644 --- a/README.md +++ b/README.md @@ -13,19 +13,6 @@ **sanger-tol/blobtoolkit** is a bioinformatics pipeline that can be used to identify and analyse non-target DNA for eukaryotic genomes. It takes a samplesheet and aligned CRAM files as input, calculates genome statistics, coverage and completeness information, combines them in a TSV file by window size to create a BlobDir dataset and static plots. - - - - - - - - 1. Calculate genome statistics in windows ([`fastawindows`](https://github.com/tolkit/fasta_windows)) 2. Calculate Coverage ([`blobtk/depth`](https://github.com/blobtoolkit/blobtk)) 3. Fetch associated BUSCO lineages ([`goat/taxonsearch`](https://github.com/genomehubs/goat-cli)) @@ -44,9 +31,6 @@ > [!NOTE] > If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data. - - First, prepare a samplesheet with your input data that looks as follows: `samplesheet.csv`: @@ -58,12 +42,10 @@ mMelMel1,illumina,GCA_922984935.2.illumina.mMelMel1.cram mMelMel3,ont,GCA_922984935.2.ont.mMelMel3.cram ``` -Each row represents an aligned file. Rows with the same sample identifier are considered technical replicates. The datatype refers to the sequencing technology used to generate the underlying raw data and follows a controlled vocabulary (ont, hic, pacbio, pacbio_clr, illumina). The aligned read files can be generated using the [sanger-tol/readmapping](https://github.com/sanger-tol/readmapping) pipeline. +Each row represents an aligned file. Rows with the same sample identifier are considered technical replicates. 
The datatype refers to the sequencing technology used to generate the underlying raw data and follows a controlled vocabulary (`ont`, `hic`, `pacbio`, `pacbio_clr`, `illumina`). The aligned read files can be generated using the [sanger-tol/readmapping](https://github.com/sanger-tol/readmapping) pipeline. Now, you can run the pipeline using: - - ```bash nextflow run sanger-tol/blobtoolkit \ -profile \ @@ -86,7 +68,7 @@ For more details, please refer to the [usage documentation](https://pipelines.to ## Pipeline output - For more details about the output files and reports, please refer to the [output documentation](https://pipelines.tol.sanger.ac.uk/blobtoolkit/output). +For more details about the output files and reports, please refer to the [output documentation](https://pipelines.tol.sanger.ac.uk/blobtoolkit/output). ## Credits diff --git a/docs/output.md b/docs/output.md index ffa089a9..e6efe8bc 100644 --- a/docs/output.md +++ b/docs/output.md @@ -29,6 +29,8 @@ The files in the BlobDir dataset which is used to create the online interactive - `*.json`: files generated from genome and alignment coverage statistics - `*.png`: static plot images +More information about visualising the data in the [BlobToolKit repository](https://github.com/blobtoolkit/blobtoolkit/tree/main/src/viewer) + ### MultiQC diff --git a/docs/usage.md b/docs/usage.md index 143de417..dcec96e2 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -48,6 +48,11 @@ sample3,ont,ont.cram An [example samplesheet](assets/test/samplesheet.csv) has been provided with the pipeline. +### Support for [nf-core/fetchngs](https://nf-co.re/fetchngs) + +The pipeline can also accept a samplesheet generated by the [nf-core/fetchngs](https://nf-co.re/fetchngs) pipeline (tested with version 1.11.0). +The pipeline then needs the `--fetchngs_samplesheet true` option *and* `--align true`, since the data files would all be unaligned. 
+
 ## Getting databases ready for the pipeline
 
 The BlobToolKit pipeline can be run in many different ways. The default way requires access to several databases:
@@ -91,7 +96,6 @@ Retrieve the NCBI blast nt database (version 5) files and tar gunzip them. We ar
 
 ```bash
 wget "ftp://ftp.ncbi.nlm.nih.gov/blast/db/v5/nt.???.tar.gz" -P $NT/ &&
-wget https://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz -P $NT &&
 for file in $NT/*.tar.gz; do
 tar xf $file -C $NT && rm $file;
 done