diff --git a/docs/output.md b/docs/output.md
index bec4a8c3..75100bc3 100755
--- a/docs/output.md
+++ b/docs/output.md
@@ -41,9 +41,9 @@ This subworkflow reads the input .yaml via the use of the built-in snakeyaml.Yam
 
 This workflow generates a .genome file which describes the base pair length of each scaffold in the reference genome. This is performed by [SAMTOOLS_FAIDX](https://nf-co.re/modules/samtools_faidx) to generate a .fai file. This index file is trimmed using local module [GENERATE_GENOME_FILE](../modules/local/generate_genome_file.nf) to output a .genome file. This file is then recycled into the workflow to be used by a number of other subworkflows.
 
-![Generate genome workflow](images/treeval_1_0_generate_genome.jpeg)
+![Generate genome workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_generate_genome.jpeg)
 
-![Workflow Legend](images/treeval_1_0_legend.jpeg)
+![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)
 
 ### LONGREAD_COVERAGE
 
@@ -74,9 +74,9 @@ This workflow generates a .genome file which describes the base pair length of e
 
 The GAP_FINDER subworkflow generates a bed file containing the genomic locations of the gaps in the sequence. This is performed by the use of [SEQTK_CUTN]() which cuts the input genome at sites of N (gaps). [GAP_LENGTH]() then calculates the lengths of gaps generates in the previous step, this file is injected into the hic_maps at a later stage. SEQTK's output bed file is then BGzipped and indexed by [TABIX_BGZIPTABIX](https://nf-co.re/modules/tabix_bgziptabix).
 
-![Gap Finder workflow](images/treeval_1_0_gap_finder.jpeg)
+![Gap Finder workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_gap_finder.jpeg)
 
-![Workflow Legend](images/treeval_1_0_legend.jpeg)
+![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)
 
 ### REPEAT_DENSITY
 
@@ -89,9 +89,9 @@ The GAP_FINDER subworkflow generates a bed file containing the genomic locations
 
 This uses [WindowMasker](https://github.com/goeckslab/WindowMasker) to mark potential repeats on the genome. The genome is chunked into 10kb bins which move along the entire genome as sliding windows in order to profile the repeat intensity. Bedtools is then used to intersect the bins and WindowMasker fragments. These fragments are then mapped back to the original assembly for visualization purposes.
 
-![Repeat Density workflow](images/treeval_1_0_repeat_density.jpeg)
+![Repeat Density workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_repeat_density.jpeg)
 
-![Workflow Legend](images/treeval_1_0_legend.jpeg)
+![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)
 
 ### HIC_MAPPING
 
@@ -106,9 +106,9 @@ This uses [WindowMasker](https://github.com/goeckslab/WindowMasker) to mark pote
 
 The HIC_MAPPING subworkflow takes a set of HiC read files in .cram format as input and derives HiC mapping outputs in .pretext, .hic, and .mcool formats. These outputs are used for visualization on [PretextView](https://github.com/wtsi-hpag/PretextView), [Juicebox](https://github.com/aidenlab/Juicebox), and [Higlass](https://github.com/higlass/higlass) respectively.
 
-![Hic Mapping workflow](images/treeval_1_0_hic_mapping.jpeg)
+![Hic Mapping workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_hic_mapping.jpeg)
 
-![Workflow Legend](images/treeval_1_0_legend.jpeg)
+![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)
 
 ### TELO_FINDER
 
@@ -125,9 +125,9 @@ The HIC_MAPPING subworkflow takes a set of HiC read files in .cram format as inp
 
 The TELO_FINDER subworkflow uses a supplied (by the .yaml) telomeric sequence to identify putative telomeric regions in the input genome. This is acheived via the use of [FIND_TELOMERE_REGIONS](../modules/local/find_telomere_regions.nf), the output of which is used to generate a telomere.windows file with [FIND_TELOMERE_WINDOWS](../modules/local/find_telomere_windows.nf) (Both of these modules utilise VGP derived telomere programs [found here](https://github.com/VGP/vgp-assembly/tree/master/pipeline/telomere)), data for each telomeric site is then extracted into bed format with [EXTRACT_TELO](../modules/local/extract_telo.nf) and finally BGZipped and indexed with [TABIX_BGZIPTABIX](https://nf-co.re/modules/tabix_bgziptabix/tabix_bgziptabix).
 
-![Telomere Finder workflow](images/treeval_1_0_telo_finder.jpeg)
+![Telomere Finder workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_telo_finder.jpeg)
 
-![Workflow Legend](images/treeval_1_0_legend.jpeg)
+![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)
 
 ### BUSCO_ANALYSIS
 
@@ -142,9 +142,9 @@ The TELO_FINDER subworkflow uses a supplied (by the .yaml) telomeric sequence to
 The BUSCO_ANNOTATION subworkflow takes an assembly genome as input and extracts a list of [BUSCO](https://gitlab.com/ezlab/busco) genes based on the BUSCO results obtained from BUSCO.
 Additionally, it provides an overlap BUSCO gene set based on a list of lepidoptera ancestral genes((Wright et al., 2023), which has been investigated by Charlotte Wright from Mark Blaxter's lab at the Sanger Institute.
 
-![Busco analysis workflow](images/treeval_1_0_busco_analysis.jpeg)
+![Busco analysis workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_busco_analysis.jpeg)
 
-![Workflow Legend](images/treeval_1_0_legend.jpeg)
+![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)
 
 ### GENE_ALIGNMENT
 
@@ -193,9 +193,9 @@ PUNCHLIST: Punchlists contain information on genes found to be duplicated (fully
 
 The INSILICO_DIGEST workflow is used to visualize the Bionano enzyme cutting sites for a genome FASTA file. It starts by identifying the recognition sequences of the labeling enzyme to create a CMAP file. This CMAP file is then converted into BED and BIGBED formats to provide visualizations of the Bionano enzyme cutting sites. This procedure generates data tracks based on three digestion enzymes: BSPQ1, BSSS1, and DLE1.
 
-![Insilico digest workflow](images/treeval_1_0_insilico_digest.jpeg)
+![Insilico digest workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_insilico_digest.jpeg)
 
-![Workflow Legend](images/treeval_1_0_legend.jpeg)
+![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)
 
 ### SELFCOMP
 
@@ -209,9 +209,9 @@ The INSILICO_DIGEST workflow is used to visualize the Bionano enzyme cutting sit
 
 he SELFCOMP subworkflow is a comparative genomics analysis originally performed by the Ensembl project. It involves comparing the genes and genomic sequences within a single species. The goal of the analysis is mainly to identify haplotypic duplications in a particular genome assembly.
 
-![Selfcomp workflow](images/treeval_1_0_selfcomp.jpeg)
+![Selfcomp workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_selfcomp.jpeg)
 
-![Workflow Legend](images/treeval_1_0_legend.jpeg)
+![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)
 
 ### SYNTENY
 
@@ -225,9 +225,9 @@ he SELFCOMP subworkflow is a comparative genomics analysis originally performed
 
 This worflows searches along predetermined path for syntenic genome files based on clade and then aligns with [MINIMAP2_ALIGN](https://nf-co.re/modules/minimap2_align) each to the reference genome, emitting an aligned .paf file for each.
 
-![Synteny workflow](images/treeval_1_0_synteny.jpeg)
+![Synteny workflow](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_synteny.jpeg)
 
-![Workflow Legend](images/treeval_1_0_legend.jpeg)
+![Workflow Legend](https://raw.githubusercontent.com/sanger-tol/treeval/dev/docs/images/treeval_1_0_legend.jpeg)
 
 ### Pipeline information
 
diff --git a/subworkflows/local/hic_mapping.nf b/subworkflows/local/hic_mapping.nf
index 0d6d65c7..bf8df5bf 100644
--- a/subworkflows/local/hic_mapping.nf
+++ b/subworkflows/local/hic_mapping.nf
@@ -148,7 +148,7 @@ workflow HIC_MAPPING {
     ch_versions = ch_versions.mix ( SAMTOOLS_MARKDUP.out.versions.first() )
 
     //
-    // MODULE: SAMTOOLS FILTER READS | BAMTOBED | SORT BED FILE
+    // MODULE: SAMTOOLS FILTER OUT DUPLICATE READS | BAMTOBED | SORT BED FILE
     //
     BAMTOBED_SORT( SAMTOOLS_MARKDUP.out.bam )
     ch_versions = ch_versions.mix(BAMTOBED_SORT.out.versions)
diff --git a/subworkflows/local/longread_coverage.nf b/subworkflows/local/longread_coverage.nf
index 72c9b1b4..7af0214c 100755
--- a/subworkflows/local/longread_coverage.nf
+++ b/subworkflows/local/longread_coverage.nf
@@ -155,7 +155,7 @@ workflow LONGREAD_COVERAGE {
         .set { view_input }
 
     //
-    // MODULE: GET PRIMARY BAM
+    // MODULE: EXTRACT READS FOR PRIMARY ASSEMBLY
     //
     SAMTOOLS_VIEW(
         view_input.map { [it[0], it[1], it[3]] },
diff --git a/workflows/treeval_rapid.nf b/workflows/treeval_rapid.nf
index 38aa67a3..dfea9bea 100644
--- a/workflows/treeval_rapid.nf
+++ b/workflows/treeval_rapid.nf
@@ -85,7 +85,7 @@ workflow TREEVAL_RAPID {
     //
     // SUBWORKFLOW: GENERATE TELOMERE WINDOW FILES WITH PACBIO READS AND REFERENCE
     //
-    TELO_FINDER ( GENERATE_GENOME.out.max_scaff_size
+    TELO_FINDER ( GENERATE_GENOME.out.max_scaff_size,
                   GENERATE_GENOME.out.reference_tuple,
                   YAML_INPUT.out.teloseq
     )