A minor discrepancy between v1.0.0 and v2.0.8 at 'fill_chain' step #67
Comments
Thanks for reporting this. I just checked, and we use a minScore of 25000 for all alignments that we generate these days, although we have used 5000 before. Chains with a score <25000 are typically small, and for most well-assembled genomes chains score 1+ million. If you want to maximize sensitivity, please set the threshold to 5000 or lower.
Thanks for sharing this. For "chain_run" step, we have been using the default The score I mentioned in the post above is the default value for the "fill_chain" step. MLC v1 used Still, in v2.0.8,
Yes, understood. For repeatFiller, this only means that gaps in chains below the score threshold would not be filled (but such short chains often have pretty much no substantial gaps).
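To illustrate the point above, here is a rough sketch of how a minimum-score threshold decides which chains are considered for gap filling. The `Chain` tuple and `select_chains_to_fill` are hypothetical helpers, the scores are made up for illustration, and treating a score equal to the threshold as passing is an assumption; this is not how repeatFiller is actually implemented.

```python
from typing import NamedTuple


class Chain(NamedTuple):
    """Hypothetical, minimal representation of a chain."""
    chain_id: int
    score: int


def select_chains_to_fill(chains, min_score):
    """Split chains into those at or above the threshold and those below it.

    Assumption: a chain whose score equals min_score is still filled.
    """
    to_fill = [c for c in chains if c.score >= min_score]
    skipped = [c for c in chains if c.score < min_score]
    return to_fill, skipped


# Made-up scores, for illustration only.
chains = [Chain(1, 1_500_000), Chain(2, 30_000), Chain(3, 8_000), Chain(4, 1_200)]

for threshold in (25000, 5000):
    to_fill, skipped = select_chains_to_fill(chains, threshold)
    print(threshold, [c.chain_id for c in to_fill], [c.chain_id for c in skipped])
```

Lowering the threshold only moves more of the small chains into the "fill" group; the high-scoring chains are handled the same way in both cases.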
Thanks for the clarification. We will then continue to use 1000 and 5000 for the chaining ("chain_run") step, and 25000 for the repeatFiller ("fill_chain") step.
Hi again,

Recently, we have been testing `make_lastz_chains` v2.0.8 (MLC v2) on our SGE grid. After adding some `clusterOptions` directives to the Nextflow template (`execute_joblist.nf`), the MLC v2 pipeline runs well.

However, the alignment results have always been slightly different from those produced by `make_lastz_chains` v1.0.0 (MLC v1), so we compared the temporary job scripts from each step.

One difference is that MLC v1 uses `--chainMinScore 25000` when running `chain_gap_filler.py` during the "fill_chain" step, while MLC v2 uses `--chainMinScore 1000`, or whatever other value is given with `--min_chain_score` when setting up the pipeline.

MLC v2 accepts `--fill_chain_min_score` separately, with a default value of 25000.

But the `fill_chain_step.py` code uses `param.chain_min_score` instead of `param.fill_chain_min_score` (#24) when building the job scripts that run `chain_gap_filler.py`.

I wonder if this is intended.
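The mismatch described above can be sketched as follows. This is a minimal illustration, not the actual code in `fill_chain_step.py`: `PipelineParams` and `build_fill_chain_command` are hypothetical names, and only the two attribute names, the `--chainMinScore` flag, and the default values 1000 and 25000 are taken from this report.

```python
from dataclasses import dataclass


@dataclass
class PipelineParams:
    """Hypothetical stand-in for the pipeline's parameter object."""
    chain_min_score: int = 1000        # value given with --min_chain_score
    fill_chain_min_score: int = 25000  # value given with --fill_chain_min_score


def build_fill_chain_command(params, use_fill_score):
    """Build an illustrative argument list for the gap-filling script.

    use_fill_score=False mimics the behaviour reported for v2.0.8;
    use_fill_score=True mimics the proposed replacement.
    """
    if use_fill_score:
        score = params.fill_chain_min_score
    else:
        score = params.chain_min_score
    return ["chain_gap_filler.py", "--chainMinScore", str(score)]


params = PipelineParams()
print(" ".join(build_fill_chain_command(params, use_fill_score=False)))
print(" ".join(build_fill_chain_command(params, use_fill_score=True)))
```

With the defaults assumed here, the first command carries the chaining threshold into the "fill_chain" step, while the second matches what MLC v1 passed to the gap filler.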
Replacing `param.chain_min_score` with `param.fill_chain_min_score` appears to slightly reduce the number of final alignments (after post-processing) without affecting the alignment coverage of CDS...
There are also differences in how the target and query sequences were chunked and how sequences smaller than the chunk size were treated during the `lastz` step, but for this, I think what v2 does makes more sense than v1. :)

Another difference is the handling of the `lastz_q` (or `BLASTZ_Q`) parameter during the "chain_run" step. I will write about this in another issue.