futuremice scales poorly on 64/128-core machines
#570
Replies: 12 comments 4 replies
-
This sounds more like a resource problem than a bug.
-
@stefvanbuuren How is that so?
-
It would be useful if we could have a reprex somehow; otherwise it is very hard for us to chase. Did you try setting m = 200 with a small problem?
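That suggestion could be sketched roughly as follows (hedged: this uses the built-in `nhanes` data as a stand-in "small problem", and the `m` values are illustrative, not from the thread):

```r
library(mice)

# Time futuremice on the small built-in nhanes data while only m,
# the number of imputations, grows (values are illustrative).
ms <- c(10, 50, 200)
times <- sapply(ms, function(m) {
  system.time(
    futuremice(nhanes, m = m, n.core = 2, maxit = 1, parallelseed = 123)
  )["elapsed"]
})
print(data.frame(m = ms, seconds = round(times, 1)))
```

If the runtime grows roughly linearly with `m`, the cost is per-imputation overhead rather than a scaling pathology.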
-
I can give it a go, but that might take some time.
-
@stefvanbuuren I tried to come up with an example that closely matches my data, but that's cumbersome. Instead, I used the code below:

```r
library(mice)
version()
set.seed(123)

n_features <- 20
small_covmat <- diag(n_features)
small_covmat[small_covmat == 0] <- 0.5
small_data <- MASS::mvrnorm(10000,
                            mu = c(1:n_features) * 0,
                            Sigma = small_covmat)
small_data_with_missings <- ampute(small_data, prop = 0.8, mech = "MCAR")$amp

n_streams <- 5
start_time <- Sys.time()
imp <- futuremice(small_data_with_missings,
                  parallelseed = 123,
                  n.core = n_streams,
                  m = n_streams,
                  maxit = 1,
                  method = "rf",
                  ntrees = 10)
end_time <- Sys.time()
end_time - start_time
```

```
$ Rscript main.R

Attaching package: ‘mice’

The following object is masked from ‘package:stats’:

    filter

The following objects are masked from ‘package:base’:

    cbind, rbind

[1] "mice 3.16.0 2023-05-24 /home/software/.local/easybuild/software/R/4.2.0-foss-2021b/lib/R/library"
Time difference of 24.37422 secs
```

This is what I get when …:

```
$ Rscript main.R
…
Time difference of 2.550763 mins
```

and when …:

```
$ Rscript main.R
…
Time difference of 2.003762 mins
```

Real-life numbers are much, much worse since I work mostly with categorical variables, and their number is significantly higher. As this number goes up, literally every process starts spawning threads like there is no tomorrow. I understand that there is some overhead, but that's a bit too much. This automatically results in 100% load even when …. Just to show what I have to deal with: the same code with the actual data and ….
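One thing worth ruling out here (an assumption, not something established in this thread) is BLAS/OpenMP thread oversubscription: if each parallel worker also spawns one linear-algebra thread per core, a 128-core machine ends up running workers × cores threads, which matches the "every process spawning threads" symptom. Capping per-process thread counts before the call is a cheap check; `nhanes` stands in for the reprex data:

```r
# Hypothetical check: cap per-process threading before futuremice forks
# its workers, so each worker runs single-threaded linear algebra.
Sys.setenv(
  OMP_NUM_THREADS      = "1",  # OpenMP
  OPENBLAS_NUM_THREADS = "1",  # OpenBLAS
  MKL_NUM_THREADS      = "1"   # Intel MKL (harmless on AMD)
)

library(mice)
imp <- futuremice(nhanes, m = 5, n.core = 5, maxit = 1, parallelseed = 123)
```

If the timings stop degrading with core count after this, the problem sits in the numerical libraries' threading defaults rather than in `futuremice` itself.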
-
@stefvanbuuren Got it done, at least something:

…

and with …

I'd blame Intel MKL or something like futureverse/future#405, but those are AMD machines.
-
Perhaps the problem is not with …. What happens if you specify …?
-
The number of categories is high, that's true. However, this should not affect parallel and independent imputations. Anyway, real data with … fails with:

```
Error in (function (.x, .f, ..., .progress = FALSE)  :
ℹ In index: 1.
Caused by error in `chol.default()`:
! the leading minor of order 1 is not positive
Calls: futuremice ... resolve.list -> signalConditionsASAP -> signalConditions
Execution halted
```

The …
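For what it's worth, a base-R illustration of how `chol.default()` produces this class of error. The assumption (not confirmed in the thread) is that some covariance matrix built during imputation is rank-deficient or otherwise not positive definite; LAPACK then reports the first leading principal minor that fails:

```r
# chol() requires a positive-definite matrix.
ok  <- matrix(c(2, 1, 1, 2), 2, 2)    # positive definite
bad <- matrix(c(1, 1, 1, 1), 2, 2)    # rank 1: second leading minor is 0

r <- chol(ok)                         # succeeds
res <- try(chol(bad), silent = TRUE)  # errors: "the leading minor of
                                      # order 2 is not positive ..."
inherits(res, "try-error")            # TRUE
```

With many sparse categorical dummies, perfectly collinear (or empty) columns inside a worker's bootstrap sample are a plausible way to hit this.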
-
Thanks. On my desktop with 9 free cores, I found your reprex executes in 15 seconds (n = 5), 14 seconds (n = 9), 21 seconds (n = 18), 47 seconds (n = 50) and 1.37 minutes (n = 100). I think this is as it should be. I am not sure what causes …. Random forests (…) …
-
It could also help to simplify your model, e.g., by using ….
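One concrete way to simplify the model (my example, not necessarily what was meant above) is `mice::quickpred()`, which builds a predictor matrix keeping only usefully correlated predictors, shrinking every univariate imputation model:

```r
library(mice)

# Keep only predictors correlated with each target at |r| >= 0.3;
# the threshold is illustrative, not a recommendation.
pred <- quickpred(nhanes, mincor = 0.3)
imp  <- mice(nhanes, predictorMatrix = pred, m = 5, maxit = 5,
             printFlag = FALSE, seed = 123)
```

With many categorical variables, trimming the predictor set cuts both the fitting cost per variable and the chance of singular design matrices.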
-
Converting to discussion because I do not believe this is caused by a bug.
-
Thanks. I ran your simulation script on my desktop and found that all behaved well. The results were similar to your cluster simulations, apart from being about 2 times faster.

I have no idea why your laptop simulations show problematic behaviour. As the timing depends on the configuration, I do not think there is something in ….
-
**Describe the bug**

`futuremice` runs just fine when I do 10 or 20 imputations at the same time. When I increase the number to, say, 50 or 100 while keeping all other parameters the same, it just sits there indefinitely.

**To Reproduce**

Not sure that any code would be suitable here.

**Expected behavior**

Running 10, 20, 50, or 100 imputations at once should take roughly the same amount of time.