Parallelization on linux leads to crash #55

Open
wenzmo opened this issue Apr 12, 2024 · 1 comment

Comments

@wenzmo

wenzmo commented Apr 12, 2024

One of my colleagues works mainly with two functions, ctmm.guess and ctmm.fit.
Because he has many individuals, he tried to parallelize the whole process like this:

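library(ctmm)
library(foreach)

# `data` is a list of telemetry objects, one per individual
cl <- parallel::makeCluster(10)   # intended limit: 10 worker processes
doParallel::registerDoParallel(cl)

model_fit <- function(i)
{
  GUESS <-
    ctmm.guess(data[[i]],
               CTMM = ctmm(error = TRUE),
               interactive = FALSE)
  ctmm.select(data[[i]], GUESS) # this function has a built-in parallelization argument
}

# one model fit per individual, distributed over the cluster workers
FITS <- foreach(i = seq_along(data), .packages = "ctmm") %dopar% model_fit(i)
parallel::stopCluster(cl)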

After some time, our Linux server with 196 threads crashes because of the heavy core usage.
We found out that makeCluster(10) does not limit the processes to 10 cores.
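One way to check what `makeCluster(10)` itself provides is to ask every worker for its process ID. The sketch below uses base `parallel` only; the helper name `worker_pids` is hypothetical. It shows that the cluster holds exactly the requested number of R worker processes, which suggests that any extra CPU load comes from what runs inside those workers rather than from the cluster size.

```r
library(parallel)

# Hypothetical helper: return the process ID reported by every worker in `cl`.
worker_pids <- function(cl) {
  unlist(clusterEvalQ(cl, Sys.getpid()))
}

cl <- makeCluster(10)
pids <- worker_pids(cl)
length(unique(pids))  # one distinct R process per requested worker
stopCluster(cl)
```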

The problem seems to be hidden in the ctmm.guess function. Inside it is the variogram function, which has the argument 'fast=TRUE'.
I found 'fast=TRUE' in the parallel.R script, which seems to manage parallelization for the whole package.
This argument says that if the operating system is Linux, it should use all logical cores (on Windows, the number of cores is set to 1).

There are several problems with this:

  1. I learned never to use all cores by default! (Maybe this is outdated.)
  2. It's not mentioned in the vignettes/README/DESCRIPTION. You have to search for it.
  3. How fast your code runs depends heavily on the operating system.
  4. It is impossible to parallelize over the ctmm.guess function, because the system can be overloaded.

I suggest mentioning this somewhere in the vignettes/README/DESCRIPTION, or adding an argument that lets users choose how many cores to use. Either way, the default should definitely be changed to 'detectCores() - 1' (or even fewer).

@chfleming
Contributor

@wenzmo, the fast argument in variogram() controls FFT usage and is unrelated to the fast argument in the parallelization code, which chooses between low-overhead fork and high-overhead socket parallelization. I don't believe the former can be passed through to the latter functions.
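To illustrate the fork-versus-socket distinction, here is a minimal sketch of such a switch using base `parallel`. The helper `par_lapply` and its `fast` argument are hypothetical and are not ctmm's internal code. Forking (`mclapply`) is not available on Windows, so the sketch falls back to a socket cluster there.

```r
library(parallel)

# Hypothetical helper (not ctmm's internal code): apply FUN over X with `cores` workers.
# fast = TRUE  -> fork-based mclapply (low overhead, Unix-alikes only)
# fast = FALSE -> socket cluster via parLapply (higher overhead, works everywhere)
par_lapply <- function(X, FUN, cores = 1L, fast = TRUE) {
  if (cores <= 1L) return(lapply(X, FUN))  # cores = 1 means no parallelization
  if (fast && .Platform$OS.type != "windows") {
    mclapply(X, FUN, mc.cores = cores)
  } else {
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl), add = TRUE)  # always release the socket workers
    parLapply(cl, X, FUN)
  }
}

squares_fork   <- par_lapply(1:4, function(x) x^2, cores = 2L, fast = TRUE)
squares_socket <- par_lapply(1:4, function(x) x^2, cores = 2L, fast = FALSE)
```

In both branches the number of worker processes is bounded by `cores`; only the startup and data-transfer overhead differs.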

detectCores() is used to determine the maximum possible number of cores, both to limit the user's choice and to interpret negative arguments such as cores=-1, which means "all cores but 1".
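The convention described here can be sketched as a small resolver. `resolve_cores` is a hypothetical name, not a ctmm export, and treating `cores = 0` as "all cores" is an assumption of this sketch. Positive values are capped at the detected total, and negative values count back from it.

```r
# Hypothetical helper illustrating the convention described above (not ctmm's code).
# cores >= 1 : use that many, capped at the number of detected cores
# cores <= 0 : count back from the total, e.g. -1 means "all cores but 1"
#              (treating 0 as "all cores" is an assumption of this sketch)
resolve_cores <- function(cores = 1L, total = parallel::detectCores()) {
  if (is.na(total)) total <- 1L   # detectCores() can return NA on some platforms
  n <- if (cores >= 1) min(cores, total) else total + cores
  as.integer(max(1L, n))          # never go below a single core
}

resolve_cores(1L,   total = 196L)  # 1 (no parallelization)
resolve_cores(-1L,  total = 196L)  # 195 (all cores but 1)
resolve_cores(500L, total = 196L)  # 196 (capped at the total)
```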

In all ctmm functions, the number of cores should be something the user selects. In parallelized ctmm functions, the default is cores=1, which means no parallelization. The code you quoted doesn't activate any parallelization in ctmm functions: ctmm.guess is not parallelized, and ctmm.select is somewhat parallelized IIRC, but you have to set its cores argument to something other than 1 to activate that. If you have multiple datasets to run, it's better for you, the user, to parallelize at the level of datasets and not within ctmm functions (which is the default usage, as in the code you quoted).
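A minimal sketch of that recommendation: parallelize over individuals and keep each ctmm call serial. `fit_individuals` is a hypothetical helper. The fitting function is passed in so the parallel part can run without ctmm; the commented usage shows how the quoted ctmm calls could slot in, assuming ctmm.select's cores argument (mentioned in the reply) is left at 1.

```r
library(parallel)

# Hypothetical helper: fit each dataset in its own worker process.
# Exactly `workers` R processes are started; fit_one should run serially
# (e.g. ctmm functions left at their default cores = 1).
fit_individuals <- function(datasets, fit_one, workers = 10L) {
  cl <- makeCluster(workers)
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, datasets, fit_one)
}

# Usage with ctmm (requires the ctmm package and a list of telemetry objects `data`):
# fits <- fit_individuals(data, function(tel) {
#   GUESS <- ctmm::ctmm.guess(tel, CTMM = ctmm::ctmm(error = TRUE), interactive = FALSE)
#   ctmm::ctmm.select(tel, GUESS, cores = 1)
# }, workers = 10L)

# Stand-in check without ctmm:
toy <- fit_individuals(list(a = 1, b = 2, c = 3), function(x) x * 10, workers = 2L)
```

With this layout the total load is bounded by `workers`, because nothing inside `fit_one` spawns further processes.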
