Silent OOM bug with large Hi-C Datasets #186

Open
ignacio3437 opened this issue Dec 11, 2024 · 1 comment
Labels
bug Something isn't working


@ignacio3437

Description of the bug

The FQ2HIC-RUNASSEMBLYVIZ module fails with a Java OOM error, but the pipeline continues. The resulting .hic file is unusable and far too small (200 KB). The Hi-C FASTQ read files were 8.5 GB each.

Command used and terminal output

MMC commands here:
https://pfr-powerplant.s3.ap-southeast-2.amazonaws.com/output/genomic/plant/Actinidia/chinensis/KBC-pangenome/Red9/curated-scaffolds/assemblyqc_params/

Relevant files

FQ2HIC-RUNASSEMBLYVIZ stdout.autosave:

```
Picked up _JAVA_OPTIONS: -Djava.util.prefs.userRoot=user_prefs -Duser.home=user_home -Xms4g -Xmx4g
Dec 11, 2024 7:14:46 AM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
	at java.base/java.util.ArrayList.grow(ArrayList.java:238)
	at java.base/java.util.ArrayList.grow(ArrayList.java:243)
	at java.base/java.util.ArrayList.add(ArrayList.java:486)
	at java.base/java.util.ArrayList.add(ArrayList.java:499)
	at com.google.common.base.Splitter.splitToList(Splitter.java:422)
	at juicebox.tools.utils.original.AsciiPairIterator.advance(AsciiPairIterator.java:93)
	at juicebox.tools.utils.original.AsciiPairIterator.next(AsciiPairIterator.java:194)
	at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:387)
	at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:283)
	at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:108)
	at juicebox.tools.HiCTools.main(HiCTools.java:86)
```

System information

This was run on AWS using MMC.
Outputs here:
https://ap-southeast-2.console.aws.amazon.com/s3/buckets/pfr-powerplant?prefix=output/genomic/plant/Actinidia/chinensis/KBC-pangenome/Red9/assemblyqc/&region=ap-southeast-2&bucketType=general

ignacio3437 added the bug (Something isn't working) label on Dec 11, 2024
GallVp (Member) commented Dec 11, 2024

Hi @ignacio3437

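Thank you for the bug report. This is not a system-level OOM (the task was not killed for exceeding its memory allocation); rather, the Java virtual machine ran out of heap space. By default, RUNASSEMBLYVISUALIZER is labelled process_single, which gives it 6 GB of memory. The Java heap size is set to 80% of the task memory, truncated to a whole number of gigabytes, so 6 GB yields the -Xms4g -Xmx4g seen in the log above. The calculation happens here: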

```groovy
if ( !task.memory ) { error '[RUNASSEMBLYVISUALIZER] Available memory not known. Specify process memory requirements to fix this.' }
// Java heap = 80% of the task memory, truncated to whole GB (6 GB -> 4 GB)
def avail_mem = (task.memory.giga*0.8).intValue()
"""
assembly_tag=\$(echo $sample_id_on_tag | sed 's/.*\\.on\\.//g')
file_name="${agp_assembly_file}"
mkdir user_home
export _JAVA_OPTIONS="-Djava.util.prefs.userRoot=user_prefs -Duser.home=user_home -Xms${avail_mem}g -Xmx${avail_mem}g"
```
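The avail_mem rule can be checked with a short Python sketch (the function name java_heap_gb is hypothetical and not part of the pipeline). It uses exact integer arithmetic, since 0.8 = 4/5, so the truncation matches what intValue() does on the product without any floating-point rounding surprises.

```python
def java_heap_gb(task_memory_gb: int) -> int:
    """Mirror (task.memory.giga * 0.8).intValue(): 80% of task memory, truncated to whole GB."""
    # 0.8 == 4/5; integer floor division gives the same truncation as intValue()
    # for non-negative inputs, without float artefacts such as 6 * 0.8 == 4.800000000000001
    return (task_memory_gb * 4) // 5


# process_single default of 6 GB gives a 4 GB heap, i.e. -Xms4g -Xmx4g
print(java_heap_gb(6))  # → 4
```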

In short, the task memory for RUNASSEMBLYVISUALIZER needs to be increased. A more robust fix may be to allocate memory based on the size of the sorted_links_txt_file input:

```groovy
tuple val(sample_id_on_tag), path(agp_assembly_file), path(sorted_links_txt_file)
```

This approach is also being explored elsewhere: https://github.com/nf-core/modules/pull/6628/files
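A minimal sketch of the size-based idea, written in Python rather than Nextflow for brevity. The baseline (6 GB, the process_single default mentioned above), the scaling factor of 2 GB of memory per GiB of links file, and the 64 GB cap are all hypothetical placeholders, not values measured for this pipeline or taken from the linked PR; they would need calibrating against real runs. In the module itself, a rule like this would presumably sit in a dynamic memory directive.

```python
GIB = 1024 ** 3

# Hypothetical tuning constants: placeholders, not measured or pipeline values
BASE_GB = 6           # process_single default, used as a floor
GB_PER_INPUT_GB = 2   # extra task memory per GiB of sorted links text
MAX_GB = 64           # cap so one very large input cannot request unlimited memory


def task_memory_gb(links_file_bytes: int, attempt: int = 1) -> int:
    """Scale task memory with the size of sorted_links_txt_file, growing with each retry."""
    # Round the input size up to whole GiB using exact integer arithmetic
    input_gb = (links_file_bytes + GIB - 1) // GIB
    requested = (BASE_GB + GB_PER_INPUT_GB * input_gb) * attempt
    return min(requested, MAX_GB)


# A 10 GiB links file: 26 GB on attempt 1, 52 GB on attempt 2, capped at 64 GB on attempt 3
for attempt in (1, 2, 3):
    print(attempt, task_memory_gb(10 * GIB, attempt))
```

Multiplying by the attempt number follows the same retry-scaling idea as task.attempt in Nextflow resource directives, while the cap keeps a retry from requesting more than the executor can provide.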

For now, the easiest solution for large datasets is to add the following lines to your custom config (mmc.config):

```groovy
process {
    withName: RUNASSEMBLYVISUALIZER {
        // 16 GB on the first attempt, scaled by the attempt number on retries
        memory = { 16.GB * task.attempt }
    }
}
```

Thus, the problem can be resolved without changing the pipeline codebase.
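To see what the override above requests, the Python sketch below tabulates 16.GB * task.attempt for three attempts together with the heap size produced by the module's 80% rule (the function names are hypothetical). Note that task.attempt only increases when Nextflow retries the task, for example under a retry errorStrategy with maxRetries set. If the Java failure does not make the task fail, as the "pipeline continues" behaviour in this report suggests, no retry would be triggered and only the first-attempt value would apply. The three-attempt range is an assumption for illustration.

```python
def override_memory_gb(attempt: int) -> int:
    """memory = { 16.GB * task.attempt } from the custom config."""
    return 16 * attempt


def java_heap_gb(task_memory_gb: int) -> int:
    """80% of task memory truncated to whole GB, as in the module's avail_mem line."""
    return (task_memory_gb * 4) // 5


for attempt in range(1, 4):
    mem = override_memory_gb(attempt)
    print(f"attempt {attempt}: memory {mem} GB, heap -Xmx{java_heap_gb(mem)}g")
# attempt 1: memory 16 GB, heap -Xmx12g
# attempt 2: memory 32 GB, heap -Xmx25g
# attempt 3: memory 48 GB, heap -Xmx38g
```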

GallVp self-assigned this on Dec 11, 2024