issue in the lastz step #36
Comments
Hi Yawen, I am sorry for the prolonged silence; I've been on a rather intensive business trip. It seems like the issue might be related to the cluster job time limit. I'll need some more time to look into it.
Hi @kirilenkobm, thanks for your reply. That's alright, I'm glad to hear from you. I also realized that this appears to be related to cluster runtime or compute resource limitations. The commands mentioned above were run by me on a large node of the cluster using
Got it. Some chromosomes containing many repeats may take much longer.
If I try to reduce the chunk size for reference and query, should I use the parameters
One more question: if I split the chromosomes into smaller chunks, will that change the TOGA results later on? I mean, could splitting into smaller chunks cause a gene to be classified as Missing if it happens to lie where a chromosome is split?
We typically use 175 Mb chunks for the reference and 50 Mb chunks for the query. The chunk size only affects the lastz (all vs all local alignment) step. The downstream chaining step should give the same results, and TOGA uses these chains, so TOGA results should not be affected by the chunk size.
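To make the effect of the chunk size concrete, here is a minimal Python sketch of how fixed-size chunking changes the number of reference-by-query lastz jobs. It is only an illustration under stated assumptions: the function names and the sequence lengths are hypothetical, chunks are cut without overlap, and small scaffolds are not bundled together, so this is not how the pipeline itself partitions the genomes.

```python
def split_into_chunks(seq_len, chunk_size):
    """Return half-open (start, end) intervals that cover [0, seq_len)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, seq_len))
            for start in range(0, seq_len, chunk_size)]


def count_job_pairs(ref_lens, query_lens, ref_chunk, query_chunk):
    """Count all-vs-all (reference chunk, query chunk) pairs."""
    n_ref = sum(len(split_into_chunks(n, ref_chunk)) for n in ref_lens)
    n_query = sum(len(split_into_chunks(n, query_chunk)) for n in query_lens)
    return n_ref * n_query


MB = 1_000_000

# Hypothetical sequence lengths, chosen only for illustration.
ref_lens = [248 * MB, 50 * MB]
query_lens = [120 * MB, 30 * MB]

# Chunk sizes quoted in the thread: 175 Mb (reference), 50 Mb (query).
default_pairs = count_job_pairs(ref_lens, query_lens, 175 * MB, 50 * MB)

# Smaller, hypothetical chunk sizes: more jobs, but each one is shorter.
smaller_pairs = count_job_pairs(ref_lens, query_lens, 50 * MB, 10 * MB)

print(default_pairs, smaller_pairs)
```

Smaller chunks trade a few long-running jobs for many shorter ones, which is why reducing the chunk size can help when individual jobs hit a time limit.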
Thank you for your reply. You pointed out that the problem might be caused by unmasked repeats. This is how I masked the genome: I used RepeatMasker to soft-mask it based on the library identified by RepeatModeler. To mask the genome further, I need to run WindowMasker on top of that soft-masked genome, right? My idea is to first use WindowMasker for additional masking and, if that doesn't work, to reduce the chunk size. What do you think? Or should I just reduce the chunk size?
Right, WindowMasker on top of the RepeatMasker soft-masked genome. You may add WM only for the scaffolds that cause problems. They likely have well-assembled centromeres, which are not correctly masked by RM. The quickest option would be to keep the genome as is and reduce the chunk size. Simultaneously, you can run WM and use the additional masking if a smaller chunk size doesn't do it.
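Conceptually, adding WindowMasker on top of the RepeatMasker soft-masking means taking the union of the two masks: a base ends up lowercase if either tool lowercased it. Below is a minimal Python sketch of that idea, assuming both inputs are soft-masked versions of the same sequence that have already been loaded as strings. `merge_softmasks` and `masked_fraction` are hypothetical helpers for illustration, not part of either tool or of the pipeline.

```python
def merge_softmasks(rm_seq, wm_seq):
    """Union of two soft masks: lowercase wherever either input is lowercase."""
    if len(rm_seq) != len(wm_seq):
        raise ValueError("sequences must have the same length")
    merged = []
    for a, b in zip(rm_seq, wm_seq):
        if a.upper() != b.upper():
            raise ValueError("inputs are not the same underlying sequence")
        masked = a.islower() or b.islower()
        merged.append(a.lower() if masked else a.upper())
    return "".join(merged)


def masked_fraction(seq):
    """Fraction of bases that are soft-masked (lowercase)."""
    return sum(c.islower() for c in seq) / len(seq) if seq else 0.0


# Toy example: the same 12 bases masked differently by the two tools.
rm_masked = "ACGTacgtACGT"  # hypothetical RepeatMasker output
wm_masked = "ACgtACGTacGT"  # hypothetical WindowMasker output

combined = merge_softmasks(rm_masked, wm_masked)
print(combined, masked_fraction(combined))
```

Comparing `masked_fraction` before and after merging, per scaffold, is one way to see whether the extra masking changes much on the scaffolds that cause problems.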
OK, thank you for your help!
Hi,
I am using this pipeline to create a genome alignment file in local mode. Here is my command:
make_chains.py human macAss hg38.2bit macAss.2bit --project_dir human_macAss --executor_queuesize 40
It ran fine at first. However, it now seems to be running into problems and reports errors as follows
This seems to happen in the lastz step, since the errors were reported after the
************ HgStepManager: executing step 'lastz' Sun Oct 15 09:08:53 2023.
hint. There are 4 errors so far.
What should I do to fix this?
Thank you in advance for your help.
Yawen