Manager doesn't detect when pipeline errors crashed snakemake #14

Open
rabdill opened this issue Jan 17, 2023 · 0 comments

rabdill commented Jan 17, 2023

If a job in the Snakemake pipeline fails, the manager correctly detects that the run didn't complete. BUT, if Snakemake itself throws an error, the running.txt file never gets deleted, so the manager keeps treating the pipeline as running forever. Example from project PRJNA530790:

[Fri Jan 13 19:11:12 2023]
Finished job 396.
269 of 446 steps (60%) done
Select jobs to execute...

[Fri Jan 13 19:11:12 2023]
rule sra_to_fastq:
    input: SRR8849058/SRR8849058.sra
    output: fastq/SRR8849058.fastq
    jobid: 433
    reason: Missing output files: fastq/SRR8849058.fastq; Input files updated by another job: SRR8849058/SRR8849058.sra
    wildcards: sample=SRR8849058
    threads: 4
    resources: mem_mb=2000, mem_mib=1908, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, slurm_account=blekhman, slurm_partition=blekhman, runtime=480

WorkflowError:
SLURM job submission failed. The error message was sbatch: error: Batch job submission failed: Socket timed out on send/recv operation
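
One possible fix, sketched here under stated assumptions: have whatever launches Snakemake remove running.txt in a `finally` block, so the file goes away however Snakemake exits (a failed job, or an error from Snakemake itself such as the `WorkflowError` above), and record the exit code first so the manager can tell "finished" from "failed". Only running.txt comes from this issue; `run_pipeline()`, `pipeline_status()` and the `exit_code.txt` file are hypothetical names, not part of the project's current code, and `cmd` stands in for whatever snakemake command line the manager already uses.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# running.txt is the marker described in the issue; exit_code.txt is a
# hypothetical extra file introduced for this sketch.
RUNNING = "running.txt"
EXIT_CODE = "exit_code.txt"


def run_pipeline(cmd: list[str], workdir: Path) -> int:
    """Run cmd in workdir and always clear running.txt when it ends.

    The exit status is written before running.txt is removed, so the
    manager never sees "not running" without a recorded result.
    """
    workdir = Path(workdir)
    running = workdir / RUNNING
    running.write_text("")
    returncode = 1  # pessimistic default in case subprocess.run itself raises
    try:
        result = subprocess.run(cmd, cwd=workdir)
        returncode = result.returncode
    finally:
        # Runs whether the command succeeded, exited non-zero, or could not start
        (workdir / EXIT_CODE).write_text(f"{returncode}\n")
        running.unlink(missing_ok=True)
    return returncode


def pipeline_status(workdir: Path) -> str:
    """Classify a project directory as running, succeeded, failed or unknown."""
    workdir = Path(workdir)
    if (workdir / RUNNING).exists():
        return "running"
    exit_file = workdir / EXIT_CODE
    if not exit_file.exists():
        return "unknown"
    return "succeeded" if exit_file.read_text().strip() == "0" else "failed"


if __name__ == "__main__":
    # A command that exits non-zero stands in for a crashed snakemake run
    with tempfile.TemporaryDirectory() as d:
        run_pipeline([sys.executable, "-c", "raise SystemExit(1)"], Path(d))
        print(pipeline_status(Path(d)))  # prints "failed"
```

This only covers cases where the launcher process survives long enough to run its `finally` block; if the launcher itself is killed outright (for example by SIGKILL or a node failure), running.txt would still be left behind, and catching that would need a separate liveness check.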

rabdill added this to the 0.2 In-progress improvements milestone Jan 17, 2023