Hic opt #224

Merged
merged 49 commits into from
Feb 6, 2024
e124828
add minimap process
yumisims Jan 26, 2024
de1b4af
added minimap full processs
yumisims Jan 27, 2024
0d9da73
add filter to hic_mapping
yumisims Jan 28, 2024
bd47eef
add hic_minimap
yumisims Jan 29, 2024
9bdbf0b
add hic_mapping
yumisims Jan 29, 2024
cbc0966
add config
yumisims Jan 30, 2024
e6a78fe
add filter_five_end
yumisims Jan 30, 2024
5339147
Merge branch 'galaxy_dev' into hic_opt
yumisims Jan 30, 2024
ef3fc68
fixed linting
yumisims Jan 30, 2024
5ca1a3d
change to spaces
yumisims Jan 30, 2024
4f67f70
invoke java.math
yumisims Jan 30, 2024
3403195
avoid juicer for rapid
yumisims Jan 30, 2024
f5a16fa
avoid juicer for rapid
yumisims Jan 30, 2024
0eb7138
consolidate container
yumisims Jan 31, 2024
13af5ff
consolidate container
yumisims Jan 31, 2024
8b72884
switch to larger server to test
yumisims Jan 31, 2024
077d28c
switch to larger server to test
yumisims Jan 31, 2024
d81682f
Update ci.yml - increase the runner size
gq1 Jan 31, 2024
fc6fd5c
Switch back to default runner. We can switch back if necessary when t…
gq1 Jan 31, 2024
7694b17
add tmp folder for fastk
yumisims Feb 1, 2024
5e23363
changed modules
yumisims Feb 1, 2024
b129153
changed modules
yumisims Feb 1, 2024
58738b5
changed subset files
yumisims Feb 1, 2024
c339eae
using mulled
yumisims Feb 1, 2024
d854d25
add nf-core download to ci
yumisims Feb 2, 2024
ce00ea2
add nf-core download to ci
yumisims Feb 2, 2024
0b27d74
removed nf download
yumisims Feb 5, 2024
3e9f4a3
Testing something
DLBPointon Feb 5, 2024
f248117
Testing something
DLBPointon Feb 5, 2024
83d7c6f
Testing something
DLBPointon Feb 5, 2024
0b55949
Testing something
DLBPointon Feb 5, 2024
95a5d06
Testing something
DLBPointon Feb 5, 2024
edbd0e1
Testing something
DLBPointon Feb 5, 2024
5470921
Testing something
DLBPointon Feb 5, 2024
90eacac
Testing something
DLBPointon Feb 5, 2024
d5e7723
Testing something
DLBPointon Feb 5, 2024
4ccac12
Testing something
DLBPointon Feb 5, 2024
e1f1899
Merge pull request #233 from sanger-tol/dp24-testing
yumisims Feb 5, 2024
bb8584e
Updates to CI
DLBPointon Feb 6, 2024
4652cc1
Updates to CI
DLBPointon Feb 6, 2024
4099a14
Updates to CI
DLBPointon Feb 6, 2024
c793637
Updates to CI
DLBPointon Feb 6, 2024
674e304
Updates to CI
DLBPointon Feb 6, 2024
c1dd1a1
Updates to CI
DLBPointon Feb 6, 2024
d69d741
Updates to CI
DLBPointon Feb 6, 2024
f1e47f7
Updates to CI
DLBPointon Feb 6, 2024
39126d1
Attempting to get path
DLBPointon Feb 6, 2024
59c3227
Attempting to get path
DLBPointon Feb 6, 2024
291b055
Merge pull request #234 from sanger-tol/dp24_ci_updates
weaglesBio Feb 6, 2024
26 changes: 23 additions & 3 deletions .github/workflows/ci.yml
@@ -29,8 +29,11 @@ jobs:
  - "22.10.1"
  - "latest-everything"
  steps:
- - name: Check out pipeline code
- uses: actions/checkout@v3
+ - name: Get branch names
+ # Pulls the names of current branches in repo
+ # steps.branch-names.outputs.current_branch is used later and returns the name of the branch the PR is made FROM not to
+ id: branch-names
+ uses: tj-actions/branch-names@v8

  - name: Install Nextflow
  uses: nf-core/setup-nextflow@v1
@@ -45,6 +48,23 @@ jobs:
  mkdir -p $NXF_SINGULARITY_CACHEDIR
  mkdir -p $NXF_SINGULARITY_LIBRARYDIR

+ - name: Install Python
+ uses: actions/setup-python@v5
+ with:
+ python-version: "3.10"
+
+ - name: Install nf-core
+ run: |
+ pip install nf-core
+
+ - name: NF-Core Download - download singularity containers
+ # Forcibly download repo on active branch and download SINGULARITY containers into the CACHE dir if not found
+ # Must occur after singularity install or will crash trying to dl containers
+ # Zip up this fresh download and run the checked out version
+ run: |
+ nf-core download sanger-tol/treeval --revision ${{ steps.branch-names.outputs.current_branch }} --compress none -d --force --outdir sanger-treeval --container-cache-utilisation amend --container-system singularity
+ tree *
+
  - name: Download Tiny test data
  # Download A fungal test data set that is full enough to show some real output.
  run: |
@@ -53,4 +73,4 @@ jobs:
  - name: Singularity - Run FULL pipeline with test data
  # Remember that you can parallelise this by using strategy.matrix
  run: |
- nextflow run ${GITHUB_WORKSPACE} -profile test_github,singularity --outdir ./Sing-Full
+ nextflow run sanger-treeval/${{ steps.branch-names.outputs.current_branch }}/main.nf -profile test_github,singularity --outdir ./Sing-Full
8 changes: 5 additions & 3 deletions assets/github_testing/TreeValTinyFullTest.yaml
@@ -8,9 +8,11 @@ assembly:
  reference_file: /home/runner/work/treeval/treeval/TreeValTinyData/assembly/draft/grTriPseu1.fa
  assem_reads:
  read_type: hifi
- read_data: /home/runner/work/treeval/treeval/TreeValTinyData/genomic_data/pacbio/
- hic_data: /home/runner/work/treeval/treeval/TreeValTinyData/genomic_data/hic-arima/
+ read_data: /home/runner/work/treeval/treeval/TreeValTinyData/genomic_data/pacbio
  supplementary_data: path
+ hic_data:
+ hic_cram: /home/runner/work/treeval/treeval/TreeValTinyData/genomic_data/hic-arima/
+ hic_aligner: bwamem2
  kmer_profile:
  # kmer_length will act as input for kmer_read_cov fastk and as the name of folder in profile_dir
  kmer_length: 31
@@ -28,7 +30,7 @@ intron:
  telomere:
  teloseq: TTAGGG
  synteny:
- synteny_path: /nfs/treeoflife-01/teams/tola/users/dp24/treeval/TreeValTinyData/synteny/
+ synteny_path: /home/runner/work/treeval/treeval/treeval/TreeValTinyData/synteny
  synteny_genomes: "LaetiporusSulphureus"
  busco:
  lineages_path: /home/runner/work/treeval/treeval/TreeValTinyData/busco/subset/
4 changes: 3 additions & 1 deletion assets/local_testing/nxOscDF5033.yaml
@@ -9,8 +9,10 @@ reference_file: /lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeVa
  assem_reads:
  read_type: hifi
  read_data: /lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_DF5033/genomic_data/nxOscSpes1/pacbio/fasta/
- hic_data: /lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_DF5033/genomic_data/nxOscSpes1/hic-arima2/full/
  supplementary_data: path
+ hic_data:
+ hic_cram: /lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_DF5033/genomic_data/nxOscSpes1/hic-arima2/full/
+ hic_aligner: minimap2
  kmer_profile:
  # kmer_length will act as input for kmer_read_cov fastk and as the name of folder in profile_dir
  kmer_length: 31
14 changes: 8 additions & 6 deletions assets/local_testing/nxOscSUBSET.yaml
@@ -1,28 +1,30 @@
assembly:
assem_level: scaffold
assem_version: 1
sample_id: OscheiusSUBSET
latin_name: to_provide_taxonomic_rank
defined_class: nematode
assem_version: 1
project_id: DTOL
reference_file: /lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_SUBSET/assembly/draft/SUBSET_genome/Oscheius_SUBSET.fasta
assem_reads:
longread_type: hifi
longread_data: /lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_SUBSET/genomic_data/pacbio/
hic_data: /lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_DF5033/genomic_data/nxOscSpes1/hic-arima2/subset/
read_type: hifi
read_data: /lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_SUBSET/genomic_data/pacbio/
supplementary_data: path
hic_data:
hic_cram: /lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_DF5033/genomic_data/nxOscSpes1/hic-arima2/subset/
hic_aligner: minimap2
kmer_profile:
# kmer_length will act as input for kmer_read_cov fastk and as the name of folder in profile_dir
kmer_length: 31
dir: /lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_DF5033/genomic_data/nxOscSpes1/pacbio/
alignment:
data_dir: /lustre/scratch123/tol/resources/treeval/gene_alignment_data/
common_name: "" # For future implementation (adding bee, wasp, ant etc)
geneset: "Gae_host.Gae"
geneset_id: "Gae_host.Gae"
#Path should end up looking like "{data_dir}{classT}/{common_name}/csv_data/{geneset}-data.csv"
self_comp:
motif_len: 0
mummer_chunk: 4
mummer_chunk: 10
intron:
size: "50k"
telomere:
1 change: 1 addition & 0 deletions bin/awk_filter_reads.sh
@@ -0,0 +1 @@
awk 'BEGIN{OFS="\t"}{if($1 ~ /^\@/) {print($0)} else {$2=and($2,compl(2048)); print(substr($0,2))}}'
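Read as a per-record transformation, the one-liner above passes header lines through unchanged, clears the supplementary-alignment bit (0x800, decimal 2048) in the FLAG column, and drops the first character of the record, which is the 1/2 label that grep_pg.sh prepends earlier in the pipe. A minimal Python sketch of that behaviour, assuming strictly tab-delimited SAM records (awk's default field splitting would also split on spaces); the name `filter_record` is illustrative and not part of the PR:

```python
SUPPLEMENTARY = 0x800  # SAM FLAG bit 2048: supplementary alignment


def filter_record(line: str) -> str:
    """Mirror awk_filter_reads.sh for one line (without trailing newline)."""
    if line.startswith("@"):
        # Header lines are printed as-is.
        return line
    fields = line.split("\t")
    # Clear the supplementary bit in the FLAG column (second field).
    fields[1] = str(int(fields[1]) & ~SUPPLEMENTARY)
    # Drop the leading read label ("1" or "2") added upstream.
    return "\t".join(fields)[1:]


if __name__ == "__main__":
    print(filter_record("1read1\t2113\tchr1"))
```

A record whose FLAG has no supplementary bit set keeps its FLAG value; only the leading label is removed.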
109 changes: 109 additions & 0 deletions bin/filter_five_end.pl
@@ -0,0 +1,109 @@
#!/usr/bin/perl
use strict;
use warnings;

my $prev_id = "";
my @five;
my @three;
my @unmap;
my @mid;
my @all;
my $counter = 0;

while (<STDIN>){
    chomp;
    if (/^@/){
        print $_."\n";
        next;
    }
    my ($id, $flag, $chr_from, $loc_from, $mapq, $cigar, $d1, $d2, $d3, $read, $read_qual, @rest) = split /\t/;
    my $bin = reverse(dec2bin($flag));
    my @binary = split(//,$bin);
    if ($prev_id ne $id && $prev_id ne ""){
        if ($counter == 1){
            if (@five == 1){
                print $five[0]."\n";
            }
            else{
                my ($id_1, $flag_1, $chr_from_1, $loc_from_1, $mapq_1, $cigar_1, $d1_1, $d2_1, $d3_1, $read_1, $read_qual_1, @rest_1) = split /\t/, $all[0];
                my $bin_1 = reverse(dec2bin($flag_1));
                my @binary_1 = split(//,$bin_1);
                $binary_1[2] = 1;
                my $bin_1_new = reverse(join("",@binary_1));
                my $flag_1_new = bin2dec($bin_1_new);
                print(join("\t",$id_1, $flag_1_new, $chr_from_1, $loc_from_1, $mapq_1, $cigar_1, $d1_1, $d2_1, $d3_1, $read_1, $read_qual_1, @rest_1)."\n");
            }
        }
        elsif ($counter == 2 && @five == 1){
            print $five[0]."\n";
        }
        else{
            my ($id_1, $flag_1, $chr_from_1, $loc_from_1, $mapq_1, $cigar_1, $d1_1, $d2_1, $d3_1, $read_1, $read_qual_1, @rest_1) = split /\t/, $all[0];
            my $bin_1 = reverse(dec2bin($flag_1));
            my @binary_1 = split(//,$bin_1);
            $binary_1[2] = 1;
            my $bin_1_new = reverse(join("",@binary_1));
            my $flag_1_new = bin2dec($bin_1_new);
            print(join("\t",$id_1, $flag_1_new, $chr_from_1, $loc_from_1, $mapq_1, $cigar_1, $d1_1, $d2_1, $d3_1, $read_1, $read_qual_1, @rest_1)."\n");
        }

        $counter = 0;
        undef @unmap;
        undef @five;
        undef @three;
        undef @mid;
        undef @all;
    }

    $counter++;
    $prev_id = $id;
    push @all,$_;
    if ($binary[2]==1){
        push @unmap,$_;
    }
    elsif ($binary[4]==0 && $cigar =~ m/^[0-9]*M/ || $binary[4]==1 && $cigar =~ m/.*M$/){
        push @five, $_;
    }
    elsif ($binary[4]==1 && $cigar =~ m/^[0-9]*M/ || $binary[4]==0 && $cigar =~ m/.*M$/){
        push @three, $_;
    }
    elsif ($cigar =~ m/^[0-9]*[HS].*M.*[HS]$/){
        push @mid, $_;
    }
}

if ($counter == 1){
    if (@five == 1){
        print $five[0]."\n";
    }
    else{
        my ($id_1, $flag_1, $chr_from_1, $loc_from_1, $mapq_1, $cigar_1, $d1_1, $d2_1, $d3_1, $read_1, $read_qual_1, @rest_1) = split /\t/, $all[0];
        my $bin_1 = reverse(dec2bin($flag_1));
        my @binary_1 = split(//,$bin_1);
        $binary_1[2] = 1;
        my $bin_1_new = reverse(join("",@binary_1));
        my $flag_1_new = bin2dec($bin_1_new);
        print(join("\t",$id_1, $flag_1_new, $chr_from_1, $loc_from_1, $mapq_1, $cigar_1, $d1_1, $d2_1, $d3_1, $read_1, $read_qual_1, @rest_1)."\n");
    }
}
elsif ($counter == 2 && @five == 1){
    print $five[0]."\n";
}
else{
    my ($id_1, $flag_1, $chr_from_1, $loc_from_1, $mapq_1, $cigar_1, $d1_1, $d2_1, $d3_1, $read_1, $read_qual_1, @rest_1) = split /\t/, $all[0];
    my $bin_1 = reverse(dec2bin($flag_1));
    my @binary_1 = split(//,$bin_1);
    $binary_1[2] = 1;
    my $bin_1_new = reverse(join("",@binary_1));
    my $flag_1_new = bin2dec($bin_1_new);
    print(join("\t",$id_1, $flag_1_new, $chr_from_1, $loc_from_1, $mapq_1, $cigar_1, $d1_1, $d2_1, $d3_1, $read_1, $read_qual_1, @rest_1)."\n");
}

sub dec2bin {
    my $str = unpack("B32", pack("N", shift));
    return $str;
}

sub bin2dec {
    return unpack("N", pack("B32", substr("0" x 32 . shift, -32)));
}
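The script decodes FLAG into a string of binary digits to test bit 0x4 (unmapped) and bit 0x10 (reverse strand), then sorts each alignment into a 5', 3' or middle bucket from the shape of its CIGAR string. The same tests can be written with integer bit operations. The Python sketch below mirrors only the per-alignment classification and the "mark as unmapped" step, not the grouping of alignments by read ID; `classify` and `mark_unmapped` are hypothetical names introduced for illustration.

```python
import re

UNMAPPED = 0x4   # index 2 of the reversed bit string in the Perl script
REVERSE = 0x10   # index 4 of the reversed bit string in the Perl script


def classify(flag: int, cigar: str) -> str:
    """Return 'unmap', 'five', 'three', 'mid' or 'other' for one alignment."""
    if flag & UNMAPPED:
        return "unmap"
    reverse = bool(flag & REVERSE)
    starts_with_match = re.match(r"^[0-9]*M", cigar) is not None
    ends_with_match = re.search(r"M$", cigar) is not None
    # 5' end: forward strand starting with M, or reverse strand ending with M.
    if (not reverse and starts_with_match) or (reverse and ends_with_match):
        return "five"
    # 3' end: the mirror image of the 5' test.
    if (reverse and starts_with_match) or (not reverse and ends_with_match):
        return "three"
    # Clipped on both sides with a match in between.
    if re.match(r"^[0-9]*[HS].*M.*[HS]$", cigar):
        return "mid"
    return "other"


def mark_unmapped(flag: int) -> int:
    """Set the unmapped bit, as the script does when no single 5' hit exists."""
    return flag | UNMAPPED


if __name__ == "__main__":
    for flag, cigar in [(0, "50M100S"), (16, "100S50M"), (16, "50M100S")]:
        print(flag, cigar, classify(flag, cigar))
```

Because the checks run in the same order as the Perl `if`/`elsif` chain, an alignment that matches end to end (for example `150M`) is classified as a 5' hit on either strand.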
10 changes: 10 additions & 0 deletions bin/grep_pg.sh
@@ -0,0 +1,10 @@
#!/bin/bash

# grep_pg.sh
# -------------------
# A shell script to exclude pg lines and label read 1 and read 2 from cram containers
#
# -------------------
# Author = yy5

grep -v "^\@PG" | awk '{if($1 ~ /^\@/) {print($0)} else {if(and($2,64)>0) {print(1$0)} else {print(2$0)}}}'
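The pipe above does two things: it drops `@PG` header lines, and it prefixes every alignment record with `1` when FLAG bit 0x40 (decimal 64, first read in pair) is set and with `2` otherwise. A rough Python equivalent, assuming tab-delimited records; `label_reads` is an illustrative name, not something defined in the PR:

```python
from typing import Iterable, Iterator

FIRST_IN_PAIR = 0x40  # SAM FLAG bit 64


def label_reads(lines: Iterable[str]) -> Iterator[str]:
    """Skip @PG lines, pass other headers through, label alignment records."""
    for line in lines:
        if line.startswith("@PG"):
            continue  # equivalent of grep -v "^@PG"
        if line.startswith("@"):
            yield line  # other header lines are unchanged
            continue
        flag = int(line.split("\t")[1])
        label = "1" if flag & FIRST_IN_PAIR else "2"
        yield label + line


if __name__ == "__main__":
    sam = ["@HD\tVN:1.6", "@PG\tID:x", "r\t65\tchr1", "r\t129\tchr1"]
    print(list(label_reads(sam)))
```

The label is glued directly onto the read name, which is why a later stage in the same pipe strips the first character of each record again.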
16 changes: 16 additions & 0 deletions conf/base.config
@@ -105,6 +105,7 @@ process {

// RESOURCES: CHANGES TO FREQUENT FAILURES BELOW THIS MEM POINT
withName: '.*:.*:GENE_ALIGNMENT:.*:(MINIPROT_ALIGN|MINIMAP2_ALIGN)' {
cpus = { check_max( 6 * task.attempt, 'cpus' ) }
memory = { check_max( 50.GB * Math.ceil( task.attempt * 1.5 ) , 'memory' ) }
time = { check_max( 10.h * task.attempt, 'time' ) }
}
@@ -137,6 +138,11 @@ process {
memory = { check_max( 130.GB * task.attempt, 'memory' ) }
}

withName: CRAM_FILTER_MINIMAP2_FILTER5END_FIXMATE_SORT {
cpus = { check_max( 16 * 1, 'cpus' ) }
memory = { check_max( 130.GB * task.attempt, 'memory' ) }
}

withName: SNAPSHOT_SRES{
cpus = { check_max( 6 * 1, 'cpus' ) }
memory = { check_max( 4.GB * task.attempt, 'memory' ) }
@@ -181,6 +187,11 @@ process {
memory = { check_max( 1.GB * Math.ceil( 28 * fasta.size() / 1e+9 ) * task.attempt, 'memory' ) }
}

withName: MINIMAP2_INDEX {
cpus = { check_max( 2 * task.attempt, 'cpus' ) }
memory = { check_max( 1.GB * Math.ceil( 30 * fasta.size() / 1e+9 ) * task.attempt, 'memory' ) }
}

// add a cpus 16 if bam.size() >= 50GB
withName: '(SAMTOOLS_MARKDUP|BAMTOBED_SORT)' {
cpus = { check_max( 12 * 1, 'cpus' ) }
@@ -192,6 +203,11 @@ process {
memory = { check_max( 6.GB * task.attempt, 'memory' ) }
}

withName: MERQURYFK_MERQURYFK {
cpus = { check_max( 16 * 1, 'cpus' ) }
memory = { check_max( 100.GB * task.attempt, 'memory' ) }
}

withName: BUSCO {
cpus = { check_max( 16 * task.attempt, 'cpus' ) }
memory = { check_max( 50.GB * task.attempt, 'memory' ) }
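The `MINIMAP2_INDEX` memory request added in this file scales with the reference size: 1 GB times `ceil(30 * size_in_bytes / 1e9)`, multiplied by the attempt number. A small worked example in Python, ignoring the `check_max` cap (its limit depends on pipeline parameters not shown in this diff); the function name is made up for illustration:

```python
import math


def minimap2_index_memory_gb(fasta_bytes: int, attempt: int = 1) -> int:
    """Requested memory in GB before any check_max cap is applied."""
    return math.ceil(30 * fasta_bytes / 1e9) * attempt


if __name__ == "__main__":
    # A 2.5 GB reference on the first attempt, then on a retry.
    print(minimap2_index_memory_gb(2_500_000_000))
    print(minimap2_index_memory_gb(2_500_000_000, attempt=2))
```

Under this formula a 100 MB reference asks for 3 GB, and each retry multiplies the request by the attempt number.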
19 changes: 17 additions & 2 deletions conf/modules.config
@@ -327,20 +327,31 @@ process {
  ext.prefix = { "${meta.id}_mkdup" }
  }

- withName: CRAM_FILTER_ALIGN_BWAMEM2_FIXMATE_SORT {
+ withName: ".*:.*:HIC_BWAMEM2:CRAM_FILTER_ALIGN_BWAMEM2_FIXMATE_SORT" {
  ext.args = ""
  ext.args1 = "-F0xB00 -nt"
  ext.args2 = { "-5SPCp -H'${rglines}'" }
  ext.args3 = "-mpu"
  ext.args4 = { "--write-index -l1" }
  }

+ withName: ".*:.*:HIC_MINIMAP2:CRAM_FILTER_MINIMAP2_FILTER5END_FIXMATE_SORT" {
+ ext.args = ""
+ ext.args1 = ""
+ ext.args2 = { "-ax sr" }
+ ext.args3 = "-mpu"
+ ext.args4 = { "--write-index -l1" }
+ }
+
  withName: ".*:.*:GENERATE_GENOME:GNU_SORT" {
  ext.prefix = { "${meta.id}" }
  ext.suffix = { "genome" }
  ext.args = { "-k2,2 -nr -S${task.memory.mega - 100}M -T ." }
  }

+ withName: ".*:.*:HIC_MINIMAP2:MINIMAP2_INDEX" {
+ ext.args = { "${reference.size() > 2.5e9 ? (" -I " + Math.ceil(reference.size()/1e9)+"G") : ""} "}
+ }

  //
  // SUBWORKFLOW: KMER
@@ -349,7 +360,11 @@ process {
  ext.prefix = { "${meta.id}_merged.fasta.gz" }
  }

- withName: FASTK_FASTK {
+ withName: ".*:.*:KMER_READ_COVERAGE:FASTK_FASTK" {
  ext.args = "-k31 -t -P."
  }

+ withName: ".*:.*:KMER:FASTK_FASTK" {
+ ext.args = "-k31 -t -P."
+ }
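The `MINIMAP2_INDEX` selector in this file only passes `-I` to minimap2 when the reference file is larger than 2.5e9 bytes, and then sets it to the file size rounded up to whole gigabytes. A hedged Python sketch of that decision follows. Python's `math.ceil` returns an integer, so this renders `3G` where the Groovy expression, whose `Math.ceil` returns a double, would render `3.0G`. The function name is hypothetical.

```python
import math

THRESHOLD_BYTES = 2.5e9  # references at or below this size get no -I flag


def minimap2_index_args(reference_bytes: int) -> str:
    """Build the optional -I argument from the reference size in bytes."""
    if reference_bytes > THRESHOLD_BYTES:
        return f" -I {math.ceil(reference_bytes / 1e9)}G"
    return ""


if __name__ == "__main__":
    for size in (1_000_000_000, 2_600_000_000, 7_300_000_000):
        print(size, repr(minimap2_index_args(size)))
```

References at or below the threshold leave the flag out, so minimap2 falls back to its default index batch size.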

4 changes: 1 addition & 3 deletions modules/local/cram_filter_align_bwamem2_fixmate_sort.nf
@@ -2,9 +2,7 @@ process CRAM_FILTER_ALIGN_BWAMEM2_FIXMATE_SORT {
  tag "$meta.id"
  label "process_high"

- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/mulled-v2-50d89b457e04ed90fa0cbf8ebc3ae1b9ffbc836b:caf993da1689e8d42f5e4c113ffc9ef81d26df96-0' :
- 'biocontainers/mulled-v2-50d89b457e04ed90fa0cbf8ebc3ae1b9ffbc836b:caf993da1689e8d42f5e4c113ffc9ef81d26df96-0' }"
+ container 'quay.io/sanger-tol/cramfilter_bwamem2_minimap2_samtools_perl:0.001-c1'

  input:
  tuple val(meta), path(cramfile), path(cramindex), val(from), val(to), val(base), val(chunkid), val(rglines), val(bwaprefix)
55 changes: 55 additions & 0 deletions modules/local/cram_filter_minimap2_filter5end_fixmate_sort.nf
@@ -0,0 +1,55 @@
process CRAM_FILTER_MINIMAP2_FILTER5END_FIXMATE_SORT {
tag "$meta.id"
label "process_high"

container 'quay.io/sanger-tol/cramfilter_bwamem2_minimap2_samtools_perl:0.001-c1'

input:
tuple val(meta), path(cramfile), path(cramindex), val(from), val(to), val(base), val(chunkid), val(rglines), val(ref)

output:
tuple val(meta), path("*.bam"), emit: mappedbam
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def args1 = task.ext.args1 ?: ''
def args2 = task.ext.args2 ?: ''
def args3 = task.ext.args3 ?: ''
def args4 = task.ext.args4 ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
"""
cram_filter -n ${from}-${to} ${cramfile} - | \\
samtools fastq ${args1} - | \\
minimap2 -t${task.cpus} -R '${rglines}' ${args2} ${ref} - | \\
grep_pg.sh | \\
filter_five_end.pl | \\
awk_filter_reads.sh | \\
samtools fixmate ${args3} - - | \\
samtools sort ${args4} -@${task.cpus} -T ${base}_${chunkid}_sort_tmp -o ${prefix}_${base}_${chunkid}_mm.bam -

cat <<-END_VERSIONS > versions.yml
"${task.process}":
samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//' )
minimap2: \$(minimap2 --version | sed 's/minimap2 //g')
END_VERSIONS
"""
// temp removal staden_io_lib: \$(echo \$(staden_io_lib --version 2>&1) | sed 's/^.*staden_io_lib //; s/Using.*\$//') CAUSES ERROR

stub:
def prefix = task.ext.prefix ?: "${meta.id}"
def base = "45022_3#2"
def chunkid = "1"
"""
touch ${prefix}_${base}_${chunkid}_mm.bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//' )
minimap2: \$(minimap2 --version | sed 's/minimap2 //g')
END_VERSIONS
"""
}