issue in the lastz step #36

Closed
lnyawen opened this issue Oct 16, 2023 · 8 comments
Labels
documentation Improvements or additions to documentation nextflow issues Nextflow issues performance Everything related to the pipeline performance

Comments

@lnyawen

lnyawen commented Oct 16, 2023

Hi,

I am using this pipeline in local mode to create a genome alignment file. Here is my command:
make_chains.py human macAss hg38.2bit macAss.2bit --project_dir human_macAss --executor_queuesize 40

It ran fine at first. However, some jobs have since started to fail, with errors reported as follows:

executor >  local (1158)
[cc/0029ca] process > execute_jobs (1225) [ 59%] 1118 of 1875

executor >  local (1159)
[3f/96e7bf] process > execute_jobs (1081) [ 59%] 1119 of 1875

executor >  local (1160)
[cf/920417] process > execute_jobs (1082) [ 59%] 1119 of 1875

executor >  local (1160)
[bb/e4ed9a] process > execute_jobs (26)   [ 59%] 1120 of 1876, failed: 1, ret...
[bb/e4ed9a] NOTE: Process `execute_jobs (26)` failed -- Execution is retried (1)

executor >  local (1161)
[91/7d94e9] process > execute_jobs (1083) [ 59%] 1121 of 1876, failed: 1, ret...
[bb/e4ed9a] NOTE: Process `execute_jobs (26)` failed -- Execution is retried (1)

executor >  local (1161)
[91/7d94e9] process > execute_jobs (1083) [ 59%] 1121 of 1876, failed: 1, ret...
[bb/e4ed9a] NOTE: Process `execute_jobs (26)` failed -- Execution is retried (1)
[78/143ab2] process > execute_jobs (1085) [ 59%] 1123 of 1876, failed: 1, ret...


executor >  local (1163)
[78/143ab2] process > execute_jobs (1085) [ 59%] 1123 of 1876, failed: 1, ret...

This seems to happen in the lastz step: the errors started appearing after the log message ************ HgStepManager: executing step 'lastz' Sun Oct 15 09:08:53 2023.

There are now 4 failed jobs:

[35/91e12f] process > execute_jobs (1394) [ 72%] 1370 of 1879, failed: 4, ret...

executor >  local (1411)
[64/039d77] process > execute_jobs (1395) [ 72%] 1371 of 1879, failed: 4, ret...

executor >  local (1411)
[64/039d77] process > execute_jobs (1395) [ 72%] 1371 of 1879, failed: 4, ret...

executor >  local (1412)
[c3/9f78eb] process > execute_jobs (1396) [ 73%] 1372 of 1879, failed: 4, ret...

What should I do to fix this?

Thank you in advance for your help.

Yawen
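Each bracketed prefix in the log above, such as [bb/e4ed9a], is the short form of a Nextflow task hash, and Nextflow keeps one working directory per task under its work directory, with files such as .exitcode and .command.err. The following is a minimal sketch for looking up a failed task, assuming the work directory location is known; the helper names and the "work" path in the usage note are illustrative and not part of the pipeline.

```python
from pathlib import Path


def find_task_dirs(work_root, task_hash):
    """Return Nextflow task directories matching a short hash such as 'bb/e4ed9a'."""
    prefix, rest = task_hash.split("/")
    # Task directories are named <2 chars>/<remaining hash>; the log shows only a prefix.
    return sorted(p for p in Path(work_root).glob(f"{prefix}/{rest}*") if p.is_dir())


def summarize_task(task_dir, tail_lines=5):
    """Collect the recorded exit code and the last stderr lines of one task directory."""
    task_dir = Path(task_dir)
    exit_file = task_dir / ".exitcode"
    err_file = task_dir / ".command.err"
    # Either file may be missing, for example when the task was killed early.
    exit_code = exit_file.read_text().strip() if exit_file.exists() else None
    stderr_tail = err_file.read_text().splitlines()[-tail_lines:] if err_file.exists() else []
    return {"dir": str(task_dir), "exit_code": exit_code, "stderr_tail": stderr_tail}
```

For example, `[summarize_task(d) for d in find_task_dirs("work", "bb/e4ed9a")]` would list the exit code and the end of the error log for the first failed job, provided "work" is where the run stores its task directories (an assumption here).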

@kirilenkobm kirilenkobm added the nextflow issues Nextflow issues label Oct 16, 2023
@kirilenkobm
Member

kirilenkobm commented Oct 26, 2023

Hi Yawen,

I am sorry for the prolonged silence, I've been on a rather intensive business trip. It seems like the issue might be related to the cluster jobs time limit; I'll need some more time to look into it.

@kirilenkobm kirilenkobm pinned this issue Oct 26, 2023
@lnyawen
Author

lnyawen commented Oct 27, 2023

Hi @kirilenkobm ,

Thanks for your reply. That's alright. I'm glad to hear from you now.

I had also come to suspect that this is related to cluster runtime or compute resource limits. I ran the commands above in local mode on a large node of the cluster. I also tried cluster mode (--executor pbs --executor_partition core28). There, all the other tasks finished very quickly, but one task keeps running: after two or three days it is terminated and automatically resubmitted, with a similar error:
[23/0e5576] NOTE: Process execute_jobs (2393) terminated with an error exit status (143) -- Execution is retried (1)
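By shell convention, an exit status above 128 means the process was ended by a signal whose number is the status minus 128. Status 143 therefore corresponds to signal 15 (SIGTERM), which would be consistent with a scheduler terminating a job that exceeded its time limit, although the log alone does not prove that. A small sketch of the decoding (the function name is illustrative):

```python
import signal


def describe_exit_status(status):
    """Describe a shell-style exit status; values above 128 encode a signal number."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return "success" if status == 0 else f"exited with error code {status}"


print(describe_exit_status(143))  # terminated by SIGTERM
```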

@kirilenkobm
Member

Got it, some chromosomes containing many repeats may take much longer.
What you can try is to reduce the chunk size for the reference or the query, for example making it 10 times smaller.
However, a better, long-term solution is to implement an argument that splits selected chromosomes into smaller chunks without touching the rest.
@MichaelHiller what do you think?

@lnyawen
Author

lnyawen commented Oct 27, 2023

If I try to reduce the chunk size for the reference and the query, should I use the parameters --seq1_limit and --seq2_limit? And what is the best value for these two parameters?

One more question: if I split the chromosomes into smaller chunks, will it change the TOGA results computed later? I mean, could splitting into smaller chunks cause a gene to be classified as Missing if it happens to be located where a chromosome is split?

@MichaelHiller
Collaborator

We typically use 175 Mb chunks for the reference and 50 Mb chunks for the query.
Overly long lastz jobs are typically the result of unmasked repeats. Sometimes adding WindowMasker masking helps. Reducing the chunk size to, say, 50 vs. 10 Mb or smaller may also help.

The chunk size only affects the lastz (all-vs-all local alignment) step. The downstream chaining step should give the same results. And TOGA uses these chains. So TOGA results should not be affected by the chunk size.
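To illustrate the trade-off between chunk size and job count, here is a simplified sketch, not the pipeline's actual partitioning code: it cuts each sequence into fixed-size pieces and assumes one lastz job per reference-chunk/query-chunk pair. It ignores chunk overlaps and the bundling of small scaffolds, and the sequence names and lengths are made up for the example.

```python
def chunk_sequence(length, chunk_size):
    """Split a sequence of `length` bases into consecutive (start, end) chunks."""
    return [(start, min(start + chunk_size, length)) for start in range(0, length, chunk_size)]


def count_jobs(ref_lengths, query_lengths, ref_chunk, query_chunk):
    """Count reference-chunk x query-chunk pairs, assuming one alignment job per pair."""
    n_ref = sum(len(chunk_sequence(n, ref_chunk)) for n in ref_lengths.values())
    n_query = sum(len(chunk_sequence(n, query_chunk)) for n in query_lengths.values())
    return n_ref * n_query


MB = 1_000_000
# Hypothetical sequence lengths, for illustration only.
ref = {"chrA": 200 * MB, "chrB": 100 * MB}
query = {"scaf1": 120 * MB}

print(count_jobs(ref, query, 175 * MB, 50 * MB))  # 9
print(count_jobs(ref, query, 50 * MB, 10 * MB))   # 72
```

In this toy case the smaller chunks give eight times as many jobs, but each job compares at most 50 Mb against 10 Mb instead of 175 Mb against 50 Mb, so a single repeat-rich region is less likely to keep one job running for days.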

@lnyawen
Author

lnyawen commented Oct 27, 2023

Thank you for your reply.

You pointed out that the problem might be caused by unmasked repeats. This is how I masked the genome: I used RepeatMasker to soft-mask it with the library identified by RepeatModeler.

Then, to mask the genome further, I need to run WindowMasker on top of that soft-masked genome, right?

My idea is to first use WindowMasker to mask the genome further and, if that doesn't work, then reduce the chunk size. What do you think? Or should I just reduce the chunk size?

@MichaelHiller
Collaborator

Right, WindowMasker on top of the RepeatMasker soft-masked genome. You can add WM only for the scaffolds that cause problems. They likely have well-assembled centromeres, which are not correctly masked by RM.

The quickest option would be to keep the genome as is and reduce the chunk size. Simultaneously, you can run WM and use the additional masking if a smaller chunk size doesn't do it.
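One way to read "WM on top of RM" is as the union of the two soft masks: a base is lowercase in the result if either tool masked it. Below is a minimal sketch under that assumption, operating on sequences already loaded as strings; the function names, the dictionary layout, and the idea of passing a set of problem scaffolds are illustrative and not taken from either tool.

```python
def merge_softmasks(rm_seq, wm_seq):
    """Lowercase every base that is lowercase in either input sequence."""
    if len(rm_seq) != len(wm_seq):
        raise ValueError("sequences must have the same length")
    return "".join(
        a.lower() if (a.islower() or b.islower()) else a
        for a, b in zip(rm_seq, wm_seq)
    )


def masked_fraction(seq):
    """Fraction of bases that are soft-masked (lowercase)."""
    return sum(c.islower() for c in seq) / len(seq) if seq else 0.0


def add_mask_to_selected(rm_genome, wm_genome, problem_scaffolds):
    """Merge the second mask only into the named scaffolds; leave the rest as is."""
    return {
        name: merge_softmasks(seq, wm_genome[name]) if name in problem_scaffolds else seq
        for name, seq in rm_genome.items()
    }
```

Comparing `masked_fraction` before and after the merge for each scaffold gives a quick check of how much extra sequence the second mask covers.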

@lnyawen
Author

lnyawen commented Oct 27, 2023

Ok, thank you for your help!

@lnyawen lnyawen closed this as completed Oct 27, 2023
@kirilenkobm kirilenkobm added documentation Improvements or additions to documentation performance Everything related to the pipeline performance labels Oct 28, 2023