Pacbio align #63
…093b313e29ed68480d81d796cc1609536518ee5a and install all the related modules.
The profile
…file being combined.
- Some alignment files have `.fasta` in the name, like `GCA_947369205.1_OX376310.1_CANBKR010000003.1.fasta.pacbio.icCanRufa1_combined.cram`. I guess it's because `.baseName` only removes the `.gz` extension. Can you do something like `fai.baseName - '.fasta'` (as in `subworkflows/local/input_filter_split.nf`)?
- Not from this PR, but since we're talking about the naming convention, I would prefer not adding `_combined`
- Bump `maxRetries` to 5 in `conf/base.config`, until the resource requirements are backported from the readmapping pipeline
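To illustrate the naming issue in the first point, here is a small shell analogue (not the Nextflow code itself). It assumes the reference file name ends in `.fasta.gz`, which is only a guess based on the comment above; `strip_last_ext` and `strip_fasta` are hypothetical helper names.

```shell
# Mimic Nextflow's `.baseName`: drop the directory and the last extension only.
strip_last_ext() {
    name=${1##*/}
    echo "${name%.*}"
}

# Additionally drop a trailing `.fasta`, similar in spirit to `baseName - '.fasta'`.
strip_fasta() {
    base=$(strip_last_ext "$1")
    echo "${base%.fasta}"
}

# Assumed input name, for illustration only.
strip_last_ext GCA_947369205.1_OX376310.1_CANBKR010000003.1.fasta.gz
strip_fasta    GCA_947369205.1_OX376310.1_CANBKR010000003.1.fasta.gz
```

The first call still ends in `.fasta`, which is how the extension leaks into the CRAM name; the second does not. One difference from the Groovy version: `-` on a Groovy string removes the first occurrence of `.fasta` anywhere in the name, while `%` in the shell only strips a suffix.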
I've been thinking about the pipeline doing either the alignment sub-workflow or "samtools merge". There is actually a "samtools merge" in the alignment sub-workflow that does pretty much the same thing: merge multiple aligned files from the same sample. Could there be just one "samtools merge" that works in both cases? Also, in the alignment sub-workflow, "samtools merge" is followed by a bunch of samtools commands to extract some statistics and convert to CRAM. Here are my thoughts:
- For resource optimisation, we'll very likely have to run "samtools flagstat" on every input file to tune CPU/memory according to the read count
- It's maybe not a bad thing to extract all those stats from aligned files given as inputs. It may help with debugging and making sure the files are proper
- Is CRAM conversion happening twice? There seems to be one in `convert_stats.nf` and one in `input_filter_split.nf`. Maybe we can remove the first one?
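As a rough sketch of the first point (tuning resources from the read count), the total can be read from the "in total" line of `samtools flagstat` text output. The function names and the memory tiers below are hypothetical and only there for illustration; real thresholds would have to come from benchmarking.

```shell
# Read `samtools flagstat` text output on stdin and print the total read count
# (QC-passed + QC-failed), taken from the "... in total ..." line.
total_reads() {
    while read -r passed plus failed rest; do
        case "$rest" in
            "in total"*) echo $((passed + failed)); return 0 ;;
        esac
    done
    return 1
}

# Hypothetical tiers: map a read count to a memory request in GB.
memory_gb_for_reads() {
    reads=$1
    if   [ "$reads" -lt 1000000 ];  then echo 4
    elif [ "$reads" -lt 10000000 ]; then echo 8
    else                                 echo 16
    fi
}
```

With a real file this could be used as `samtools flagstat sample.bam | total_reads`, where `sample.bam` is a placeholder name.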
In summary, and I'm happy to discuss that over a call if that's easier, here is what I would propose:
- Stop the alignment sub-workflow after minimap.
- Run samtools merge at this stage in both alignment and pre-aligned modes.
- The "convert stats" sub-workflow could be just "stats" and run samtools stats/flagstat/idxstats on all aligned files before input_filter_split.
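The order proposed above (merge first, then collect statistics on the merged file) could look roughly like this outside of Nextflow. This is only a sketch: the output file names, the `run`/`run_to` helpers and the `DRY_RUN` switch are made up for illustration, and the inputs are assumed to be coordinate-sorted so that the merged file can be indexed.

```shell
# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Same, but send the command's stdout to a file (first argument).
run_to() {
    out=$1; shift
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$* > $out"; else "$@" > "$out"; fi
}

# Usage: merge_and_stats PREFIX IN1.bam IN2.bam [...]
merge_and_stats() {
    prefix=$1; shift
    merged="${prefix}.merged.bam"
    # One merge step, shared by the alignment and pre-aligned modes.
    run samtools merge -o "$merged" "$@"
    run samtools index "$merged"
    # Statistics on the merged file, before any filtering or CRAM conversion.
    run_to "${prefix}.merged.stats"    samtools stats    "$merged"
    run_to "${prefix}.merged.flagstat" samtools flagstat "$merged"
    run_to "${prefix}.merged.idxstats" samtools idxstats "$merged"
}
```

Setting `DRY_RUN=1` before calling `merge_and_stats sample a.bam b.bam` prints the five samtools commands instead of running them, which is handy for checking the order of the steps.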
wrong default value Co-authored-by: Matthieu Muffato <[email protected]>
All being updated now.
So far we have just imported 3 subworkflows from the readmapping pipeline without changing much.
I suppose the aligned input BAM/CRAM file will normally come with its stats file.
The first process only converts the aligned reads to CRAM, and the second one filters the reads as well.
The latest results here from
We discussed this this morning.
FYI. Priyanka had the same idea for the read-mapping pipeline: sanger-tol/readmapping#63
PR checklist
- `nf-core lint`
- `nextflow run . -profile test,docker --outdir <OUTDIR>`
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).