The number of tasks submitted by SLURM exceeded the limit #64
Comments
100 jobs per user is very restrictive. I typically submit a few thousand jobs. To get the number down to less than 100, you likely have to increase the chunk size AND the seq limit parameter (the number of scaffolds that can be bundled into one job) further. Hope that helps.
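For concreteness, here is a hedged sketch of such a command. The `--seq1_chunk`/`--seq2_chunk` flags appear elsewhere in this thread; the scaffold-limit flag names used below are assumptions, so check `./make_chains.py --help` on your installation.

```bash
# Sketch only: larger chunks plus a higher per-chunk scaffold limit should bring
# the job count under 100. --seq1_limit/--seq2_limit are assumed flag names for
# the "seq limit" parameter mentioned above; verify them with --help.
./make_chains.py target query target.fasta query.fasta --pd out_dir \
    --executor slurm --cluster_queue pNormal \
    --seq1_chunk 500000000 --seq2_chunk 500000000 \
    --seq1_limit 5000 --seq2_limit 5000
```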
Pardon me for hitchhiking.
@MichaelHiller In the legacy version (v1.0.0), there was a parameter EXECUTOR_QUEUESIZE, which I believe could limit the number of jobs submitted at once:
I realize that v2.0.8 does not have this parameter. Was there a reason to remove it? It would be convenient to have a parameter that limits the number of jobs submitted at once. We could create several thousand jobs and let them run 100 at a time. It will take time for sure, but we won't need to worry about the number of jobs, etc.
One thing we could try is to add a generic Nextflow queueSize setting. Perhaps the parameter can be added to the Nextflow config that the pipeline uses.
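For illustration, a minimal sketch of that idea, assuming Nextflow's user-level config at `~/.nextflow/config` (which Nextflow reads by default); whether make_chains.py v2.0.8 honours a user config without a dedicated command-line option is an assumption to verify:

```bash
# Sketch only: cap how many jobs Nextflow keeps in the SLURM queue at once.
mkdir -p ~/.nextflow
cat >> ~/.nextflow/config <<'EOF'
process.executor = 'slurm'
executor {
    queueSize = 100   // submit at most 100 jobs to SLURM at a time
}
EOF
```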
Hello, @ohdongha
However, I got the same error:
Sorry, I am not so familiar with Nextflow, but the queueSize parameter could be a good idea.
I attempted to run the older version (v1.0.0) to solve the problem with …
@aaannaw For this, one workaround that worked for me was to include this (as a global parameter on the node where you run the pipeline): …
Note: I am not sure if this worked for me because I installed an older version of …
Note 2: maybe it will work as long as the node can download …
@ohdongha
Now I run the pipeline with the command:
It displays "[SLURM] queue (pNormal) status cannot be fetched" but the pNormal partition is corrected:
The log file is attached. Could you give me any suggestions?
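As an aside, the partition state and the queue can be checked directly from the node where make_chains.py/Nextflow runs; the "status cannot be fetched" message suggests Nextflow could not poll the queue from there. A small sketch (pNormal as used above):

```bash
# Check that SLURM client commands work from the submitting node.
sinfo --partition=pNormal
squeue -u "$USER" --partition=pNormal
```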
This is likely an issue with your cluster. Can you test submitting any other jobs via Nextflow?
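One way to do that is a minimal smoke test that is independent of make_chains.py; the sketch below assumes Nextflow is on PATH and uses the pNormal queue from above (the file name is arbitrary):

```bash
cat > hello.nf <<'EOF'
nextflow.enable.dsl=2

// A one-process pipeline that only prints the execution host,
// submitted through the SLURM executor.
process sayHello {
    executor 'slurm'
    queue 'pNormal'
    output:
    stdout
    script:
    """
    hostname
    """
}

workflow {
    sayHello()
    sayHello.out.view()
}
EOF
nextflow run hello.nf
```

If this fails with the same message, the problem is between Nextflow and SLURM rather than in make_chains.py.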
@MichaelHiller
@aaannaw I also plan to try running v2.0.8 on multiple computing nodes on our HPC system, submitting jobs from the login (head) node, which has permission to do so. I will see how it goes.
@ohdongha However, it seems that the run did not proceed in parallel using all CPUs. After running for 39 hours, only 14% of the process has finished, as shown in make_chains.log.
For the parallel run, you may need to check the wall time and CPU time if your system reports them after the job is done. In my case, a recent alignment of human vs. Chinese hamster, for example, took 21.7 hours in wall time and 506.0 hours in CPU time, which means (506/21.7 =) 23.3 CPU cores were used on average. I asked for a node with 32 CPUs for this run. I guess the ratio was not closer to 32 because, after the first …

You may want to check this ratio first, perhaps using a smaller genome pair that creates fewer …

If the run is slow, you may also want to check whether the two genomes have been masked enough. Michael always emphasizes using …
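If the system does not report these numbers at the end of the job, SLURM accounting (when enabled) provides them; a sketch, with a placeholder job ID:

```bash
# Wall time vs. CPU time for a finished job: Elapsed * AllocCPUS is the ceiling,
# TotalCPU is what was actually used.
sacct -j 1234567 --format=JobID,AllocCPUS,Elapsed,TotalCPU
# If the seff helper is installed, it reports the same as a CPU-efficiency percentage.
seff 1234567
```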
@ohdongha
@ohdongha
Soft-masked fasta files should be fine (and perhaps needed for the …).

I checked the UCSC mm10 (fasta), and it has ~43.9% of all nucleotides soft-masked. That is close to what I have previously used for mouse GRCm38 (~44.5% masked by …).

It is hard to know if the slow progress is due to repeats or to the SLURM node not firing up all gears. I guess some tests, e.g., aligning a smaller genome pair, may help.
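If useful, the soft-masked fraction of an assembly can be checked quickly by counting lower-case bases; a sketch (mm10.fasta is a placeholder, and `zcat -f` also accepts uncompressed files on typical Linux systems):

```bash
# Fraction of soft-masked (lower-case) bases, skipping FASTA header lines.
zcat -f mm10.fasta | awk '
    !/^>/ {
        total += length($0)
        lower += gsub(/[acgtn]/, "&")   # gsub returns how many characters matched
    }
    END { printf "soft-masked: %.1f%% (%d of %d bases)\n", 100 * lower / total, lower, total }'
```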
Hello, professor
I was running the pipeline to align my genome assembly with the mm10 genome via SLURM:

```bash
./make_chains.py target query mm10.fasta Bsu.softmask.fasta --pd mm-Bsu -f --chaining_memory 30 --cluster_queue pNormal --executor slurm --nextflow_executable /data/01/user157/software/bin/nextflow
```
I encountered an error after running this command for several minutes. The error occurs because our server limits the maximum number of submitted tasks per person to 100, and I found that the default chunk size generates 1955 jobs, which is well over the 100-job limit.
Thus, I attempted to increase the chunk size like this:
```bash
./make_chains.py target query mm10.fasta Bsu.softmask.fasta --pd mm-Bsu -f --chaining_memory 30 --cluster_queue pNormal --executor slurm --nextflow_executable /data/01/user157/software/bin/nextflow --seq1_chunk 500000000 --seq2_chunk 500000000
```

However, this still generated 270 jobs, which I find hard to believe. I checked and found that when there are too many scaffolds, up to 100 scaffolds are put into one chunk for comparison, even though they do not add up to the chunk size. I don't know what is going on here.
Anyway, I think there should be a way, without increasing the chunk size further (as I understand that increasing the chunk size increases the runtime), to submit multiple command lines per task, which would let me complete the 1955 commands with fewer than 100 tasks submitted!
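For what it's worth, a generic illustration of that idea outside make_chains.py (which drives its own submissions through Nextflow, so this is only the pattern, not a drop-in fix): given a file with one command per line, split it into at most 100 batches and submit each batch as one SLURM job that runs its commands sequentially. File and partition names are placeholders.

```bash
# Split 1955 commands into <=100 batches (GNU split, line-based) and submit each
# batch as a single job.
split --number=l/100 all_commands.txt batch_
for b in batch_*; do
    sbatch -p pNormal --wrap="bash $b"
done
```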
Looking forward to your suggestions!
Best wishes!
Na Wan