Main to dev merge #345

Status: Closed. The pull request proposes merging the 84 commits listed below.
43b4cf0 Merge branch 'main' into dev (DLBPointon, Nov 15, 2024)
f7d9db2 Merge pull request #307 from sanger-tol/dp24_jbrowse_only (DLBPointon, Nov 15, 2024)
cb5d998 Updates for modules, README and config (DLBPointon, Nov 15, 2024)
8aad21a Add ktab to output (weaglesBio, Nov 18, 2024)
ca2afc4 Update documentation (weaglesBio, Nov 19, 2024)
52f5bf9 Merge pull request #330 from sanger-tol/dp24_steps (DLBPointon, Nov 20, 2024)
79bd32f Updating SAMTOOLS (DLBPointon, Nov 20, 2024)
217bd32 linting (DLBPointon, Nov 20, 2024)
30a576c Updating all easy to update modules (DLBPointon, Nov 20, 2024)
e93fd9a Updating all easy to update modules (DLBPointon, Nov 20, 2024)
dc1e076 Updating all easy to update modules (DLBPointon, Nov 20, 2024)
3c660bb Updating all easy to update modules (DLBPointon, Nov 20, 2024)
32152a5 Fixes to modules (DLBPointon, Nov 20, 2024)
1beeeb5 linting (DLBPointon, Nov 20, 2024)
ae7380c Patches (DLBPointon, Nov 21, 2024)
27439f5 Merge branch 'dev' of https://github.com/sanger-tol/treeval into dev (DLBPointon, Nov 21, 2024)
4ad0c65 Adding steps to skip BUSCO (DLBPointon, Nov 21, 2024)
160af01 Pinning BUSCO to V5.5 (DLBPointon, Nov 21, 2024)
55ed6d6 Removing steps var (DLBPointon, Nov 21, 2024)
0ada598 Linting (DLBPointon, Nov 21, 2024)
85217d9 Merge pull request #335 from sanger-tol/dp24_module_update_2 (DLBPointon, Nov 21, 2024)
701b19b Merge pull request #327 from sanger-tol/tola399_outputktab (DLBPointon, Nov 21, 2024)
9268b77 Minor updates (DLBPointon, Nov 21, 2024)
8356733 Resources review (weaglesBio, Nov 21, 2024)
7a0ca25 Merge pull request #333 from sanger-tol/dp24_module_update_samtools (yumisims, Nov 21, 2024)
b9600d9 Merge pull request #336 from sanger-tol/tola401_updateresources (yumisims, Nov 21, 2024)
0ecff8d add binfile output and update pretext_graph (yumisims, Nov 21, 2024)
26a41a8 add binfile output and update pretext_graph (yumisims, Nov 21, 2024)
cc96248 Updating last of NF-core modules (DLBPointon, Nov 22, 2024)
107582a Merge pull request #338 from sanger-tol/dev (DLBPointon, Nov 22, 2024)
b31d536 Updates (DLBPointon, Nov 22, 2024)
60a0642 Updates (DLBPointon, Nov 22, 2024)
f2a9c5b update containter (yumisims, Nov 25, 2024)
0da07ee Update to modules and changelogs to remove Anaconda (DLBPointon, Nov 25, 2024)
057252e added bin file param (yumisims, Nov 25, 2024)
d768e47 PRETTIER (DLBPointon, Nov 25, 2024)
52ba726 PRETTIER (DLBPointon, Nov 25, 2024)
f080b97 Update modules.config for BUSCO_BUSCO (DLBPointon, Nov 26, 2024)
5043234 New module replaces chunkfasta (DLBPointon, Nov 26, 2024)
286c1ea Updates to changelog and removing CHUNKFASTA, replaced with seqkit (DLBPointon, Nov 26, 2024)
a4b7ffd Merge pull request #337 from sanger-tol/bin_pretextgraph (DLBPointon, Nov 26, 2024)
b268f62 Random Space (DLBPointon, Nov 26, 2024)
8cf822c Re-add the assignment (DLBPointon, Nov 26, 2024)
370c8dc Modify geneset/synteny .yaml to parse lists of absolute paths. (weaglesBio, Nov 26, 2024)
a9fa618 Update usage documentation for synteny and gene alignment (weaglesBio, Nov 27, 2024)
7f6693f Nematode to fungi (DLBPointon, Nov 27, 2024)
20045d6 Merge pull request #339 from sanger-tol/dp24_module_update_3 (weaglesBio, Nov 27, 2024)
4665285 Remove -M parameter from FastK to resolve segfault issue. (weaglesBio, Nov 28, 2024)
8043217 Prettier (weaglesBio, Nov 28, 2024)
439b9ec Generate module path for fastk (weaglesBio, Nov 28, 2024)
96965a5 Merge branch 'dev' into tola406_genesetsyntenyyaml (weaglesBio, Nov 28, 2024)
9401126 Merge branch 'dev' into tola413_merquryfkfault (weaglesBio, Nov 28, 2024)
e81d671 Prettier (weaglesBio, Nov 28, 2024)
441c3e7 Updates (DLBPointon, Nov 28, 2024)
a002b54 Re-add (DLBPointon, Nov 28, 2024)
bca1939 merqury fix (DLBPointon, Nov 28, 2024)
ba1a458 Fudging the hash (DLBPointon, Nov 29, 2024)
d99fe23 Fudging the hash (DLBPointon, Nov 29, 2024)
b4b88df Update modules again (DLBPointon, Nov 29, 2024)
163ffd6 Updates to Synteny to reflect Minimap2 update (DLBPointon, Nov 29, 2024)
9b78df1 Update yahs (weaglesBio, Dec 2, 2024)
b084e7c Change to YAML_INPUT (DLBPointon, Dec 4, 2024)
e7e36d1 lint fix (weaglesBio, Dec 4, 2024)
c1e52e8 Change SETUP from 1 to 2.14 (DLBPointon, Dec 5, 2024)
10212a8 Change SETUP from 1 to 2.14 (DLBPointon, Dec 5, 2024)
781615e Bump BAM2BED SORT to 2Gb as base (DLBPointon, Dec 6, 2024)
a929f2f Adding VERSION information! (DLBPointon, Dec 9, 2024)
96f47fd Making sure version information is printed (DLBPointon, Dec 9, 2024)
68a38a8 removing defaults (DLBPointon, Dec 9, 2024)
d87bed7 Update TreeValTinyFullTest.yaml (DLBPointon, Dec 9, 2024)
49f9ea5 Update TreeValTinyFullTest.yaml (DLBPointon, Dec 9, 2024)
4f211f7 Merge branch 'tola406_genesetsyntenyyaml' of https://github.com/sange… (DLBPointon, Dec 9, 2024)
18688d6 Updates for field value and PRETTER! (DLBPointon, Dec 9, 2024)
233b326 Updates (DLBPointon, Dec 9, 2024)
436cd12 Merge pull request #340 from sanger-tol/tola406_genesetsyntenyyaml (DLBPointon, Dec 9, 2024)
e018c0d Merge remote-tracking branch 'origin/tola413_merquryfkfault' into tol… (DLBPointon, Dec 9, 2024)
0bdcf5d Wills MerquryFK update (DLBPointon, Dec 9, 2024)
c55ea9b Revert modules.json to previous PR (DLBPointon, Dec 9, 2024)
007cb12 Merge pull request #344 from sanger-tol/tola406_genesetsyntenyyaml (weaglesBio, Dec 10, 2024)
f5c6ef1 Merge branch 'dev' of https://github.com/sanger-tol/treeval into dev (DLBPointon, Dec 10, 2024)
7821b6c merge conflict fix... hopefully (DLBPointon, Dec 10, 2024)
f4c9881 Left in some merge headers (DLBPointon, Dec 10, 2024)
f281eb1 Merge conflict and config addition (DLBPointon, Dec 10, 2024)
c10a9ab Why did the container revert? (DLBPointon, Dec 10, 2024)
Empty file modified: .github/PULL_REQUEST_TEMPLATE.md (mode 100644 → 100755)
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -36,7 +36,7 @@ jobs:
uses: tj-actions/branch-names@v8

- name: Install Nextflow
uses: nf-core/setup-nextflow@v1
uses: nf-core/setup-nextflow@v2
with:
version: "${{ matrix.NXF_VER }}"

78 changes: 78 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,84 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.2.0] - Ancient Destiny - [2024-11-15]

Our 3rd release for sanger-tol/treeval.

### Enhancements & Fixes

- Togglable subworkflows
- Adds a JBrowse Only workflow (this will lead to an update to the FULL workflow which can now call JBROWSE_ONLY and RAPID).
- Updates to containers (local modules) to remove Anaconda dependencies following policy changes.
- Updates to modules to remove Anaconda dependencies following policy changes.
  - The majority of these updates only remove the `defaults` channel from the `environment.yml`.
- Conda warnings for modules which cannot use Conda.
- Removal of excess whitespace.
- `reformat_intersect` was previously not outputting version data.
- Added an architecture specification to the Pretext GitHub Actions runner, which will hopefully stop the spurious errors seen there.
- Addition of `--steps` to the schema.
- Adds \*ktab as an output.
- Updated Singularity containers.
- Added `--metaeuk` to BUSCO_BUSCO, as the default was causing pipeline errors on GitHub Actions (needs more investigation).
- Replaced pyfasta split (deprecated 6 years ago) with SeqKit split, which is frequently updated and very fast.

### Parameters

| Old Parameter | New Parameter |
| ------------- | ------------- |
| - | --steps |

### Software dependencies

Note, since the pipeline is using Nextflow DSL2, each process will be run with its own Biocontainer. This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference.

| Module                                 | Old Version      | New Version       |
| -------------------------------------- | ---------------- | ----------------- |
| bamtobed_sort ( bedtools + samtools ) | 2.31.0 + 1.17 | |
| bedtools | 2.31.1 | - |
| busco\* | 5.5.0 | - |
| bwa-mem2 | 2.2.1 | |
| cat | 2.3.4 | |
| chunk_fasta ( pyfasta ) | 0.5.2-1 | REMOVED |
| cooler | 0.9.2 | |
| cram_filter_align_bwamem2_fixmate_sort | - | |
| ^ ( samtools + bwamem2 ) ^ | 1.17 + 2.2.1 | |
| coreutils | 9.1 | |
| fastk | 1.0.1 | |
| gcc | 10.4.0 | |
| find_telomere_windows ( java-jdk ) | 8.0.112 | |
| generate_cram_csv ( samtools ) | 1.17 | |
| gnu-sort | 8.25 | 9.3 |
| juicer_tools_pre ( java-jdk ) | 8.0.112 | |
| perl | 5.26.2 | |
| merquryfk | 1.0.1 | |
| minimap2 + samtools | 2.24 + 1.14 | |
| minimap2_index | 2.24 | 2.28 |
| miniprot | 0.11--he4a0461_2 | |
| mummer | 3.23 | |
| paftools ( minimap2 + samtools ) | 2.24 + 1.14 | |
| pretextmap + samtools | 0.0.2 + 1.17 | 0.0.3 + 1.17 |
| python | 3.9 | - |
| - pandas | 1.5.2 | - |
| samtools | 1.18 | 1.21 |
| selfcomp_splitfasta ( perl-bioperl ) | 1.7.8-1 | |
| seqtk | 1.4 | |
| seqkit | ADDED | 2.9.0--h9ee0642_0 |
| tabix | 1.11 | |
| ucsc | 377 | 447 |
| windowmasker (blast) | 2.14.0 | 2.15.0 |

- busco is currently pinned to v5.5.0, as upgrading to v5.7.1 would cause GitHub Actions to crash. Further investigation is needed.

## [1.1.1] - Ancient Aurora (H1) - [2024-04-26]

### Enhancements & Fixes

- Generate CRAM CSV fix to allow for multi-readgroup cram files
- Removing KMER_READCOV
- tmp directory was being used
- Output file adjustment (names and location)

## [1.1.0] - Ancient Aurora - [2024-04-26]

The second release for sanger-tol, created with the [nf-core](https://nf-co.re/) template.
4 changes: 2 additions & 2 deletions CITATIONS.md
@@ -108,9 +108,9 @@

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
- [Conda](https://conda.org/)

> Anaconda Software Distribution. 2016. Computer software. Vers. 2-2.4.0. Anaconda, Web.
> conda contributors. conda: A system-level, binary package and environment manager running on all major operating systems and platforms. Computer software. https://github.com/conda/conda

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

6 changes: 2 additions & 4 deletions README.md
@@ -1,13 +1,13 @@
[![Cite with Zenodo](https://zenodo.org/badge/509096312.svg)](https://zenodo.org/doi/10.5281/zenodo.10047653)
[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A522.10.1-23aa62.svg)](https://www.nextflow.io/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=conda)](https://docs.conda.io/en/latest/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
[![Launch on Nextflow Tower](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Nextflow%20Tower-%234256e7)](https://tower.nf/launch?pipeline=https://github.com/sanger-tol/treeval)

## Introduction

**sanger-tol/treeval [1.1.0 - Ancient Aurora]** is a bioinformatics best-practice analysis pipeline for the generation of data supplemental to the curation of reference quality genomes. This pipeline has been written to generate flat files compatible with [JBrowse2](https://jbrowse.org/jb2/) as well as HiC maps for use in Juicebox, PretextView and HiGlass.
**sanger-tol/treeval [1.2.0 - Ancient Destiny]** is a bioinformatics best-practice analysis pipeline for the generation of data supplemental to the curation of reference quality genomes. This pipeline has been written to generate flat files compatible with [JBrowse2](https://jbrowse.org/jb2/) as well as HiC maps for use in Juicebox, PretextView and HiGlass.

The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

@@ -80,8 +80,6 @@ If you would like to contribute to this pipeline, please see the [contributing g

## Citations

<!--TODO: Citation-->

If you use sanger-tol/treeval for your analysis, please cite it using the following doi: [10.5281/zenodo.10047653](https://doi.org/10.5281/zenodo.10047653).

### Tools
9 changes: 3 additions & 6 deletions assets/github_testing/TreeValTinyFullTest.yaml
@@ -19,10 +19,8 @@ kmer_profile:
kmer_length: 31
dir: /home/runner/work/treeval/treeval/TreeValTinyData/
alignment:
data_dir: /home/runner/work/treeval/treeval/TreeValTinyData/gene_alignment_data/
common_name: "" # For future implementation (adding bee, wasp, ant etc)
geneset_id: "LaetiporusSulphureus.gfLaeSulp1"
#Path should end up looking like "{data_dir}{classT}/{common_name}/csv_data/{geneset}-data.csv"
genesets:
- /home/runner/work/treeval/treeval/TreeValTinyData/gene_alignment_data/fungi/csv_data/LaetiporusSulphureus.gfLaeSulp1-data.csv
self_comp:
motif_len: 0
mummer_chunk: 10
@@ -31,8 +29,7 @@ intron:
telomere:
teloseq: TTAGGG
synteny:
synteny_path: /home/runner/work/treeval/treeval/treeval/TreeValTinyData/synteny
synteny_genomes: "LaetiporusSulphureus"
- /home/runner/work/treeval/treeval/TreeValTinyData/synteny/fungi/LaetiporusSulphureus.fasta
busco:
lineages_path: /home/runner/work/treeval/treeval/TreeValTinyData/busco/subset/
lineage: fungi_odb10
7 changes: 3 additions & 4 deletions assets/local_testing/nxOscDF5033-BGA.yaml
@@ -11,8 +11,8 @@ assem_reads:
hic: /workspace/treeval-curation/Oscheius_DF5033/hic-arima2/
supplementary: path # Not currently in use
alignment:
data_dir: /workspace/treeval-curation/gene_alignment_data/
geneset: "OscheiusTipulae.ASM1342590v1,CaenorhabditisElegans.WBcel235,Gae_host.Gae"
genesets:
- /lustre/scratch123/tol/resources/treeval/gene_alignment_data/nematode/csv_data/OscheiusTipulae.ASM1342590v1-data.csv
self_comp:
motif_len: 0
mummer_chunk: 10
@@ -21,8 +21,7 @@ intron:
telomere:
teloseq: TTAGGG
synteny:
synteny_path: /nfs/treeoflife-01/teams/tola/users/dp24/treeval/TreeValTinyData/synteny/
synteny_genomes: "LaetiporusSulphureus"
- /nfs/treeoflife-01/teams/tola/users/dp24/treeval/TreeValTinyData/synteny/fungi/LaetiporusSulphureus.fasta
busco:
lineages_path: /workspace/treeval-curation/busco/v5
lineage: nematoda_odb10
11 changes: 5 additions & 6 deletions assets/local_testing/nxOscDF5033.yaml
@@ -19,10 +19,10 @@ kmer_profile:
kmer_length: 31
dir: /lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_DF5033/genomic_data/nxOscSpes1/pacbio/
alignment:
data_dir: /lustre/scratch123/tol/resources/treeval/gene_alignment_data/
common_name: "" # For future implementation (adding bee, wasp, ant etc)
geneset_id: "OscheiusTipulae.ASM1342590v1,CaenorhabditisElegans.WBcel235,Gae_host.Gae"
#Path should end up looking like "{data_dir}{classT}/{common_name}/csv_data/{geneset}-data.csv"
genesets:
- /lustre/scratch123/tol/resources/treeval/gene_alignment_data/nematode/csv_data/OscheiusTipulae.ASM1342590v1-data.csv
- /lustre/scratch123/tol/resources/treeval/gene_alignment_data/nematode/csv_data/CaenorhabditisElegans.WBcel235-data.csv
- /lustre/scratch123/tol/resources/treeval/gene_alignment_data/nematode/csv_data/Gae_host.Gae-data.csv
self_comp:
motif_len: 0
mummer_chunk: 10
@@ -31,8 +31,7 @@ intron:
telomere:
teloseq: TTAGGG
synteny:
synteny_path: /nfs/treeoflife-01/teams/tola/users/dp24/treeval/TreeValTinyData/synteny/
synteny_genomes: ""
- /lustre/scratch123/tol/resources/treeval/synteny/bird/bCucCan1.fasta
busco:
lineages_path: /lustre/scratch123/tol/resources/busco/v5
lineage: nematoda_odb10
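Across these test YAMLs, `alignment` drops the `data_dir` + `geneset_id` path template in favour of an explicit `genesets` list of absolute CSV paths, and `synteny` likewise becomes a list of FASTA paths. Existing configs could be migrated with a small helper such as the sketch below. It is hypothetical and not part of treeval: the function name `geneset_paths` and its `class` argument (for example `nematode`, which the old YAML did not record) are assumptions, and the `<data_dir><class>/csv_data/<id>-data.csv` layout is inferred from the new paths above, without the empty `common_name` component.

```shell
# Hypothetical migration helper (not part of treeval): expand the old
# `data_dir` + comma-separated `geneset_id` fields into entries for the new
# `genesets` list. Assumes the <data_dir><class>/csv_data/<id>-data.csv layout
# seen in the new YAML; <class> (e.g. nematode) was not stored in the old YAML.
geneset_paths() (
    data_dir=$1   # must end with "/"
    class=$2
    ids=$3        # e.g. "A.v1,B.v2"
    printf '%s\n' "$ids" | tr ',' '\n' | while IFS= read -r id; do
        if [ -n "$id" ]; then
            printf '  - %s%s/csv_data/%s-data.csv\n' "$data_dir" "$class" "$id"
        fi
    done
)

geneset_paths /lustre/scratch123/tol/resources/treeval/gene_alignment_data/ nematode \
    "OscheiusTipulae.ASM1342590v1,CaenorhabditisElegans.WBcel235,Gae_host.Gae"
```

With the `data_dir` and `geneset_id` values from the old `nxOscDF5033.yaml`, this prints the same three paths that its new `genesets` list contains.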
10 changes: 4 additions & 6 deletions assets/local_testing/nxOscSUBSET.yaml
@@ -19,10 +19,8 @@ kmer_profile:
kmer_length: 31
dir: /lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_DF5033/genomic_data/nxOscSpes1/pacbio/
alignment:
data_dir: /lustre/scratch123/tol/resources/treeval/gene_alignment_data/
common_name: "" # For future implementation (adding bee, wasp, ant etc)
geneset_id: "Gae_host.Gae"
#Path should end up looking like "{data_dir}{classT}/{common_name}/csv_data/{geneset}-data.csv"
genesets:
- /lustre/scratch123/tol/resources/treeval/gene_alignment_data/nematode/csv_data/OscheiusTipulae.ASM1342590v1-data.csv
self_comp:
motif_len: 0
mummer_chunk: 10
@@ -31,8 +29,8 @@ intron:
telomere:
teloseq: TTAGGG
synteny:
synteny_path: /nfs/treeoflife-01/teams/tola/users/dp24/treeval/TreeValTinyData/synteny/
synteny_genomes: ""
- /lustre/scratch123/tol/resources/treeval/synteny/bird/bCucCan1.fasta
- /lustre/scratch123/tol/resources/treeval/synteny/bird/bGalGal1.fasta
busco:
lineages_path: /lustre/scratch123/tol/resources/busco/v5
lineage: nematoda_odb10
Binary file removed bin/FKprof
Binary file not shown.
8 changes: 7 additions & 1 deletion bin/awk_filter_reads.sh
@@ -1 +1,7 @@
awk 'BEGIN{OFS="\t"}{if($1 ~ /^\@/) {print($0)} else {$2=and($2,compl(2048)); print(substr($0,2))}}'
version='1.0.0'
if [ "$1" == '-v' ];
then
echo "$version"
else
awk 'BEGIN{OFS="\t"}{if($1 ~ /^\@/) {print($0)} else {$2=and($2,compl(2048)); print(substr($0,2))}}'
fi
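`awk_filter_reads.sh` rewrites column 2 (the SAM FLAG) with `and($2, compl(2048))`, which clears only the supplementary-alignment bit (2048, `0x800`) and keeps every other bit, then prints the line without its first character (presumably the `1`/`2` mate prefix that `grep_pg.sh` adds). `and()` and `compl()` are not part of POSIX awk; they originate as GNU awk extensions. The same bit operation in POSIX shell arithmetic, as an illustrative sketch with a hypothetical function name:

```shell
# Illustrative sketch (hypothetical name): clear the supplementary-alignment
# bit (2048 = 0x800) of a SAM FLAG, mirroring and($2, compl(2048)).
clear_supplementary() (
    flag=$1
    echo $(( flag & ~2048 ))
)

clear_supplementary 2113   # 2048 + 64 + 1 -> prints 65
clear_supplementary 99     # bit not set, value unchanged -> prints 99
```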
9 changes: 8 additions & 1 deletion bin/bed_to_contacts.sh
@@ -1,2 +1,9 @@
#!/bin/bash
paste -d '\t' - - < $1 | awk 'BEGIN {FS="\t"; OFS="\t"} {if ($1 > $7) {print substr($4,1,length($4)-2),$12,$7,$8,"16",$6,$1,$2,"8",$11,$5} else {print substr($4,1,length($4)-2),$6,$1,$2,"8",$12,$7,$8,"16",$5,$11} }' | tr '\-+' '01' | sort -k3,3d -k7,7d | awk 'NF==11'

version='1.0.0'
if [ "$1" == '-v' ];
then
echo "$version"
else
paste -d '\t' - - < $1 | awk 'BEGIN {FS="\t"; OFS="\t"} {if ($1 > $7) {print substr($4,1,length($4)-2),$12,$7,$8,"16",$6,$1,$2,"8",$11,$5} else {print substr($4,1,length($4)-2),$6,$1,$2,"8",$12,$7,$8,"16",$5,$11} }' | tr '\-+' '01' | sort -k3,3d -k7,7d | awk 'NF==11'
fi
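In `bed_to_contacts.sh`, `paste -d '\t' - - < $1` reads the input two lines at a time, so each pair of consecutive BED records (assumed to be the two ends of one read pair) becomes a single 12-column row. The awk step then strips the last two characters of the read name, orders the two ends by comparing their chromosome names (`$1 > $7`), and `tr '\-+' '01'` turns strand `-`/`+` into `0`/`1`; `tr` acts on the whole line, so any `-` or `+` elsewhere in it is rewritten too. A toy sketch of just the pairing and name trimming (the function name and input are made up):

```shell
# Toy sketch of the pairing step in bed_to_contacts.sh: `paste - -` joins
# each two consecutive BED lines into one 12-field row, and the last two
# characters of the read name (assumed "/1" or "/2") are stripped.
# Output columns: read name, chromosome of end 1, chromosome of end 2.
pair_summary() {
    paste -d '\t' - - |
        awk 'BEGIN { FS = OFS = "\t" } { print substr($4, 1, length($4) - 2), $1, $7 }'
}

printf 'chr1\t100\t150\treadA/1\t60\t+\nchr2\t300\t350\treadA/2\t60\t-\n' | pair_summary
# prints: readA<TAB>chr1<TAB>chr2
```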
Binary file removed bin/find_telomere
Binary file not shown.
5 changes: 5 additions & 0 deletions bin/generate_cram_csv.sh
@@ -71,6 +71,11 @@ if [ -z "$1" ]; then
exit 1
fi

if [ "$1" == "-v" ]; then
echo "1.0"
exit 0
fi

cram_path=$1
chunkn=0
outcsv=$2
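The `-v` flag added to the `bin/` scripts in this PR appears to support the "Adding VERSION information!" commits, giving each helper a version string the pipeline can report. A minimal skeleton of that convention follows; `main`, the `1.0.0` string and the pass-through `cat` are placeholders, not treeval code. Quoting `"$1"` keeps the test well-formed when a filter is called with no arguments (an unquoted empty `$1` leaves `[` with `= -v`, which it rejects as `unary operator expected`), and printing the version returns status 0 so a caller does not read it as a failure.

```shell
#!/bin/sh
# Hypothetical skeleton for a bin/ helper with a `-v` flag (not treeval code).
version='1.0.0'

main() {
    # "$1" is quoted: with no arguments, an unquoted $1 would vanish and
    # `[` would see `= -v`, a syntax error, instead of a false comparison.
    if [ "$1" = '-v' ]; then
        echo "$version"
        return 0          # printing the version is a success, not an error
    fi
    # Placeholder for the real filter: copy stdin to stdout.
    cat
}

# A real script would finish with:  main "$@"
```

Called as `script.sh -v` it prints the version; called as `script.sh < in > out` it runs the filter without any stderr noise.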
2 changes: 1 addition & 1 deletion bin/get_avgcov.sh
@@ -8,7 +8,7 @@
# Author = yy5
# -------------------
version='1.0.0'
if [ $1 == '-v'];
if [ "$1" == '-v' ];
then
echo "$version"
else
9 changes: 7 additions & 2 deletions bin/get_busco_gene.sh
@@ -11,5 +11,10 @@
# Update for BUSCO 5.5.0 - by we3
# Reorder start and end so smallest always second column. Also, trim range from scaffold name in first column.
# -------------------

cat $1| grep -v '#'|awk '$2!="Missing"'| awk '{if($4>$5){print $3"\t"$5"\t"$4"\t"$1"\t"$7"\t"$6"\t"$9}else{print $3"\t"$4"\t"$5"\t"$1"\t"$7"\t"$6"\t"$9}}'| awk -F'\t' -v OFS='\t' '{if($7==""){$7="no_orthodb_link"}; sub(/:.*/,"",$1);print $1,$2,$3,$4,$5,$6,$7}'
version='1.0.0'
if [ "$1" == '-v' ];
then
echo "$version"
else
cat $1| grep -v '#'|awk '$2!="Missing"'| awk '{if($4>$5){print $3"\t"$5"\t"$4"\t"$1"\t"$7"\t"$6"\t"$9}else{print $3"\t"$4"\t"$5"\t"$1"\t"$7"\t"$6"\t"$9}}'| awk -F'\t' -v OFS='\t' '{if($7==""){$7="no_orthodb_link"}; sub(/:.*/,"",$1);print $1,$2,$3,$4,$5,$6,$7}'
fi
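`get_busco_gene.sh` turns a BUSCO full table into BED-like rows: it drops `#` comment lines and `Missing` entries, swaps the start and end columns when the start is larger so the smaller coordinate always lands in output column 2, replaces an empty seventh field with `no_orthodb_link`, and removes everything from the first `:` in the sequence name. The sketch below is reduced to the reorder and trim steps on toy rows; the function name is hypothetical, and the `+ 0` forces the numeric comparison that awk already applies to numeric-looking fields.

```shell
# Sketch (hypothetical function, toy input) of two steps in get_busco_gene.sh:
# put the smaller coordinate first and trim any ":start-end" suffix from the
# sequence name. Input rows: <busco_id> <status> <sequence> <start> <end> ...
busco_rows() {
    grep -v '#' |
        awk '$2 != "Missing" {
            start = $4 + 0; end = $5 + 0      # force numeric comparison
            if (start > end) { tmp = start; start = end; end = tmp }
            seq = $3
            sub(/:.*/, "", seq)               # scaffold_1:1-5000 -> scaffold_1
            print seq "\t" start "\t" end "\t" $1
        }'
}

printf '# header\n1at1 Complete scaffold_1:1-5000 900 400\n2at1 Missing\n' | busco_rows
# prints: scaffold_1<TAB>400<TAB>900<TAB>1at1
```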
8 changes: 7 additions & 1 deletion bin/grep_pg.sh
@@ -7,4 +7,10 @@
# -------------------
# Author = yy5

grep -v "^\@PG" | awk '{if($1 ~ /^\@/) {print($0)} else {if(and($2,64)>0) {print(1$0)} else {print(2$0)}}}'
version='1.0.0'
if [ "$1" == '-v' ];
then
echo "$version"
else
grep -v "^\@PG" | awk '{if($1 ~ /^\@/) {print($0)} else {if(and($2,64)>0) {print(1$0)} else {print(2$0)}}}'
fi
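`grep_pg.sh` drops `@PG` header lines and, for every alignment line, prepends `1` when bit 64 (`0x40`, first segment in the template) of the FLAG is set and `2` otherwise; `print(1$0)` is ordinary awk string concatenation. A sketch of that decision in POSIX shell arithmetic, with a hypothetical function name:

```shell
# Illustrative sketch (hypothetical name): the mate prefix grep_pg.sh adds.
# Bit 64 (0x40) of the SAM FLAG marks the first segment of a read pair.
mate_prefix() (
    flag=$1
    if [ $(( flag & 64 )) -ne 0 ]; then echo 1; else echo 2; fi
)

mate_prefix 99    # 64 + 32 + 2 + 1, first in pair  -> prints 1
mate_prefix 147   # 128 + 16 + 2 + 1, second in pair -> prints 2
```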
2 changes: 1 addition & 1 deletion bin/paf_to_bed.sh
@@ -10,7 +10,7 @@

version='1.0.0'

if [ $1 == '-v'];
if [ "$1" == '-v' ];
then
echo "$version"
else
27 changes: 16 additions & 11 deletions conf/base.config
@@ -88,6 +88,7 @@ process {
withName:SAMTOOLS_MERGE {
cpus = { check_max( 16 * 1, 'cpus' ) }
memory = { check_max( 50.GB * task.attempt, 'memory') }
time = { check_max( 30.h * task.attempt, 'time') }
}

// RESOURCES: MEMORY INTENSIVE STEPS, SOFTWARE TO BE UPDATED TO COMBAT THIS
@@ -163,7 +164,7 @@ process {
}
withName: CRAM_FILTER_ALIGN_BWAMEM2_FIXMATE_SORT {
cpus = { check_max( 16 * 1 , 'cpus' ) }
memory = { check_max( 1.GB * ( reference.size() < 2e9 ? 50 : Math.ceil( ( reference.size() / 1e+9 ) * 20 ) * Math.ceil( task.attempt * 1 ) ) , 'memory') }
memory = { check_max( 1.GB * ( reference.size() < 2e9 ? 80 : Math.ceil( ( reference.size() / 1e+9 ) * 30 ) * Math.ceil( task.attempt * 1 ) ) , 'memory') }
}

withName: CRAM_FILTER_MINIMAP2_FILTER5END_FIXMATE_SORT {
@@ -179,11 +180,13 @@ process {
withName: PRETEXTMAP_STANDRD{
cpus = { check_max( 8 * 1, 'cpus' ) }
memory = { check_max( 3.GB * task.attempt, 'memory' ) }
time = { check_max( 1.h * ( ( fasta.size() < 4e9 ? 24 : 48 ) * Math.ceil( task.attempt * 1 ) ), 'time' ) }
}

withName: PRETEXTMAP_HIGHRES {
cpus = { check_max( 6 * task.attempt, 'cpus' ) }
memory = { check_max( 20.GB * Math.ceil( task.attempt * 2.6 ), 'memory' ) }
time = { check_max( 1.h * ( ( fasta.size() < 4e9 ? 24 : 48 ) * Math.ceil( task.attempt * 1 ) ), 'time' ) }
}

withName: PRETEXT_GRAPH {
@@ -207,7 +210,8 @@ process {
// add a cpus 16 if bam.size() >= 50GB
withName: BAMTOBED_SORT {
cpus = { check_max( 12 * 1, 'cpus' ) }
memory = { check_max( 3.GB * Math.ceil( bam.size() / 1e+9 ) * task.attempt, 'memory' ) }
memory = { check_max( 2.GB * ( ( bam.size() < 150e9 ? Math.ceil( bam.size() / 1e+9 ) : Math.ceil( bam.size() / 4e+9 ) ) * Math.ceil( task.attempt * 1 ) ), 'memory' ) }
time = { check_max( 30.h * task.attempt, 'time' ) }
}

withName: SAMTOOLS_MARKDUP {
@@ -217,22 +221,22 @@ process {

withName: COOLER_CLOAD {
cpus = { check_max( 16 * 1, 'cpus' ) }
memory = { check_max( 6.GB * task.attempt, 'memory' ) }
memory = { check_max( 20.GB * task.attempt, 'memory' ) }
}

withName: MERQURYFK_MERQURYFK {
cpus = { check_max( 20 * 1, 'cpus' ) }
memory = { check_max( 100.GB * task.attempt, 'memory' ) }
}

withName: BUSCO {
withName: BUSCO_BUSCO {
cpus = { check_max( 16 * task.attempt, 'cpus' ) }
memory = { check_max( 50.GB * task.attempt, 'memory' ) }
time = { check_max( 20.h * task.attempt, 'time' ) }
}

// Large Genomes > 4Gb
//withName: BUSCO {
//withName: BUSCO_BUSCO {
// cpus = { check_max( 30 * task.attempt, 'cpus' ) }
// memory = { check_max( 100.GB * task.attempt, 'memory' ) }
// time = { check_max( 300.h * task.attempt, 'time' ) }
@@ -244,12 +248,6 @@ process {
memory = { check_max( 100.GB * task.attempt, 'memory' ) }
}

withName: FKUTILS_FKPROF {
cpus = { check_max( 25 * task.attempt, 'cpus' ) }
memory = { check_max( 1.GB * ( reference.size() < 2e9 ? 50 : Math.ceil( ( reference.size() / 1e+9 ) * 20 ) * Math.ceil( task.attempt * 1 ) ), 'memory' ) }
time = { check_max( 36.h * task.attempt, 'time' ) }
}

//
// GENERAL MODULE LIMITS
// Based on reports from SummaryStats
@@ -278,6 +276,7 @@ process {
withName: GET_PAIRED_CONTACT_BED {
cpus = { check_max( ${ file.size() > 1e11 ? 12 : 6 } , 'cpus' ) }
memory = { check_max( 1.GB * Math.ceil( file.size() / 2e+9 ) * task.attempt , 'memory' ) }
time = { check_max( 30.h * task.attempt, 'time' ) }
}

//
@@ -308,4 +307,10 @@ process {
withName: BEDTOOLS_INTERSECT {
memory = { check_max( 6.GB * (task.attempt * task.attempt), 'memory' ) }
}

withName: GENERATE_CRAM_CSV {
cpus = { check_max( 6 , 'cpus' ) }
memory = { check_max( 30.GB * task.attempt , 'memory' ) }
time = { check_max( 10.h * task.attempt , 'time' ) }
}
}
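The resource closures changed in `conf/base.config` are easy to misread because of how the ternary groups. In the `CRAM_FILTER_ALIGN_BWAMEM2_FIXMATE_SORT` memory rule the `? :` covers the whole product, so references below 2e9 bytes always get a flat 80 GB and only the larger branch is multiplied by `task.attempt`. In the `PRETEXTMAP_*` time rules the ternary is parenthesised, so retries scale both branches. `BAMTOBED_SORT` switches from 2 GB per 1e9 bytes of BAM to 2 GB per 4e9 bytes at 150e9, so the request falls sharply at that size. The sketch below re-computes these values with integer shell arithmetic; the function names are hypothetical, and it ignores `check_max`, which in the nf-core template caps each value at the matching `params.max_*` limit.

```shell
# Sketch: re-compute three resource rules from conf/base.config with integer
# shell arithmetic. Sizes are in bytes; results are multiples of Nextflow's
# 1.GB or 1.h, before check_max() caps them. Function names are hypothetical.

ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }   # ceil(a / b) for positive integers

# CRAM_FILTER_ALIGN_BWAMEM2_FIXMATE_SORT memory:
#   ref < 2e9 ? 80 : ceil(ref / 1e9 * 30) * attempt
cram_filter_mem_gb() (
    ref=$1 attempt=$2
    if [ "$ref" -lt 2000000000 ]; then
        echo 80                                   # attempt does not scale this branch
    else
        echo $(( $(ceil_div $(( ref * 30 )) 1000000000) * attempt ))
    fi
)

# BAMTOBED_SORT memory:
#   2 * (bam < 150e9 ? ceil(bam / 1e9) : ceil(bam / 4e9)) * attempt
bamtobed_mem_gb() (
    bam=$1 attempt=$2
    if [ "$bam" -lt 150000000000 ]; then
        units=$(ceil_div "$bam" 1000000000)
    else
        units=$(ceil_div "$bam" 4000000000)
    fi
    echo $(( 2 * units * attempt ))
)

# PRETEXTMAP_STANDRD / PRETEXTMAP_HIGHRES time:
#   (fasta < 4e9 ? 24 : 48) * attempt
pretextmap_time_h() (
    fasta=$1 attempt=$2
    if [ "$fasta" -lt 4000000000 ]; then base=24; else base=48; fi
    echo $(( base * attempt ))
)

cram_filter_mem_gb 1500000000 3     # 80  (small reference: fixed on every attempt)
cram_filter_mem_gb 3000000000 2     # 180 (ceil(3 * 30) * 2)
bamtobed_mem_gb 149000000000 1      # 298
bamtobed_mem_gb 150000000000 1      # 76  (request drops at the 150e9 threshold)
```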