
contribution calculation step timing out on cluster.... #218

Open
sid5427 opened this issue Dec 12, 2024 · 2 comments

Comments

sid5427 (Collaborator) commented Dec 12, 2024

Hi Anushri and other authors.

I am running ChromBPNet on a large multiome dataset. We have cluster-wise BAMs, from which we called MACS2 narrowPeaks and then built a consensus peak set of 357,200 peaks with intersectBed. We were able to run ChromBPNet successfully up to the main modeling step. The bias model was trained using the largest cluster's BAM.

However, for many of the clusters the contribution step times out, even when running on A100 GPUs with a 7-day run time limit. I don't see any obvious pattern: some large clusters finish while others do not, and some very small clusters never finish while a few similarly sized small ones finish easily. Is there any other way to speed up this step?
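One possible workaround, sketched under assumptions: if the contribution step can be run on an arbitrary subset of peaks (an assumption; check how your ChromBPNet version's CLI takes its regions), the consensus peak file could be split into chunks submitted as separate jobs, each small enough to finish within the time limit. The helpers `chunk_lines` and `split_peak_file` are hypothetical, and the per-chunk outputs would still need to be combined afterwards.

```python
from pathlib import Path


def chunk_lines(lines, n_chunks):
    """Split a list into n_chunks contiguous, nearly equal parts (order preserved)."""
    n = len(lines)
    return [lines[i * n // n_chunks:(i + 1) * n // n_chunks] for i in range(n_chunks)]


def split_peak_file(peak_path, n_chunks, out_dir):
    """Write each chunk of a BED/narrowPeak file to its own file and return the paths.

    Hypothetical helper: skips blank lines and '#', 'track', 'browser' header lines.
    """
    records = [
        line for line in Path(peak_path).read_text().splitlines()
        if line.strip() and not line.startswith(("#", "track", "browser"))
    ]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunk_lines(records, n_chunks)):
        if not chunk:
            continue  # more chunks than peaks: don't write empty files
        path = out / f"peaks.chunk{i:03d}.narrowPeak"
        path.write_text("\n".join(chunk) + "\n")
        paths.append(path)
    return paths
```

For example, splitting the 357,200-peak consensus set into 8 chunks gives 44,650 peaks per job.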

panushri25 (Collaborator) commented

Ah, that's strange. Did you verify it's using the GPU whenever it fails?

Can you post a screenshot of your error here?

sid5427 (Collaborator, Author) commented Dec 18, 2024

Hi Anushri,

I tried a run using one of our BAM files and a smaller set of correlated peaks: 126,666 peaks out of the original 357,200. It processed only 56,700 of the 126,666 peaks before the job was killed at the 4-day time limit.
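A rough extrapolation from these numbers, assuming the throughput stayed constant and that the full 4 days (96 hours) were spent computing contributions (both are assumptions; startup time and per-peak variability would shift the result). The function name `projected_hours` is illustrative only.

```python
def projected_hours(peaks_done, hours_elapsed, peaks_total):
    """Linearly extrapolate total runtime from the observed throughput."""
    peaks_per_hour = peaks_done / hours_elapsed
    return peaks_total / peaks_per_hour


# Observed: 56,700 peaks in a 96-hour (4-day) job.
for total in (126_666, 357_200):
    hours = projected_hours(56_700, 96, total)
    print(f"{total:>7,} peaks: {hours:6.1f} h ({hours / 24:4.1f} days)")
```

At roughly 590 peaks per hour, the 126,666-peak run would need about 214 hours (about 9 days) and the full 357,200-peak set about 605 hours (about 25 days), beyond both the 4-day and the 7-day limits.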

Here are the logs from that run; they look similar to the ones we had before:

err.log - https://1drv.ms/u/s!Alr23pNlJf37lJkDsmDJDEk7jHieXw?e=wgiEXX

out.log - https://1drv.ms/u/s!Alr23pNlJf37lJkCURy1F-c5rWu7-A?e=nJBb6w

The BAM file is 23 GB.

I'm not sure how to confirm whether it's using the GPU; it's certainly running on our GPU node.
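One way to check from inside the job, assuming ChromBPNet's TensorFlow backend is importable in the same environment: `tf.config.list_physical_devices("GPU")` lists the GPUs TensorFlow can see, and an empty list means it would run on CPU. The helper name `tensorflow_gpus` is hypothetical. Running `nvidia-smi` on the node while the job is active would also show whether the GPU is actually being utilized.

```python
import os


def tensorflow_gpus():
    """Return GPU device names visible to TensorFlow, or None if TF isn't installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [d.name for d in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    # Schedulers such as Slurm often set this when GPUs are allocated (it may be unset).
    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
    gpus = tensorflow_gpus()
    if gpus is None:
        print("TensorFlow is not importable in this environment")
    elif not gpus:
        print("TensorFlow sees no GPU; computation would fall back to CPU")
    else:
        print("TensorFlow GPUs:", gpus)
```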
