
Commit
Optimisations for executing on clusters using dask.distributed - documented remaining warnings.
bramvds committed Apr 28, 2018
1 parent 8614fb1 commit 7f1cd40
Showing 1 changed file with 12 additions and 0 deletions.
12 changes: 12 additions & 0 deletions src/pyscenic/prune.py
@@ -261,6 +261,18 @@ def wrap(data):
# of recovery curves - 20K features (max. enriched) * rank_threshold * 8 bytes (float) * num_cores),
# this might not be a sound idea to do.

# NOTE ON REMAINING WARNINGS:
# >> distributed.worker - WARNING - Memory use is high but worker has no data to store to disk.
# >> Perhaps some other process is leaking memory? Process memory: 1.51 GB -- Worker memory limit: 2.15 GB
# My current idea is that this cannot be avoided: processing a single module can sometimes require
# a substantial amount of memory because of the pre-allocation of recovery curves (see code notes
# on how to mitigate this problem). Setting module_chunksize=1 also limits this problem.
#
# >> distributed.utils_perf - WARNING - full garbage collections took 10% CPU time recently (threshold: 10%)
# The current implementation of module2df releases substantial amounts of memory (i.e. the RCCs),
# so this might again be unavoidable. TBI + See the following Stack Overflow question:
# https://stackoverflow.com/questions/47776936/why-is-a-computation-much-slower-within-a-dask-distributed-worker

    return aggregate_func(
        (delayed(transform_func)
         (db, gs_chunk, delayed_or_future_annotations)
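The note on `module_chunksize` in the diff above can be illustrated with a minimal sketch. The names (`chunked`, `modules`) are hypothetical and not pyscenic's actual implementation; the point is only that splitting the module list into smaller chunks before submitting each chunk as a separate task bounds how many recovery curves any single worker must pre-allocate at once.

```python
# Hypothetical sketch (not pyscenic's actual code): chunking work items
# before dispatch. Smaller chunks mean each task pre-allocates recovery
# curves for fewer modules at once, bounding peak worker memory at the
# cost of more scheduling overhead.

def chunked(items, chunksize):
    """Yield successive chunks of at most `chunksize` items."""
    for i in range(0, len(items), chunksize):
        yield items[i:i + chunksize]

modules = ["module_{}".format(i) for i in range(10)]

# module_chunksize=1: one module per task, minimal peak memory per worker.
single = list(chunked(modules, 1))

# A larger chunk size yields fewer tasks but higher peak memory per task.
quads = list(chunked(modules, 4))  # chunks of sizes 4, 4, 2
```

This is the usual memory/overhead trade-off: a chunk size of 1 minimizes the per-task allocation that triggers the "Memory use is high" warning, while larger chunks reduce scheduler pressure.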
