Replies: 1 comment 1 reply
-
I think there are a couple of things that are obviously impacting the number of random effects you're estimating -- the mesh dimension and the number of years in the dataset (which is a little larger than many fisheries datasets). It sounds like you were already debugging the mesh dimension (though it's unclear how large it is). For the smoother, sdmTMB implements all these smooths as p-splines, similar to [...].
Similarly, with the "cc" smooth on day, I'd try swapping in a polynomial instead. I think it's probably worth trying to debug the smooths in mgcv before fitting the full spatial model in sdmTMB.
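Something like the following is what I mean by debugging in mgcv first -- an untested sketch that reuses the column names from your model below, drops the spatial/delta parts entirely, and treats the polynomial degree as a placeholder:

library(mgcv)

# Untested sketch: fit only the fixed-effect/smooth part in mgcv (no mesh,
# no spatiotemporal fields, plain Gaussian family) just to see how many
# coefficients the smooths generate and how heavy they are on their own.
fit_gam <- bam(
  val_cube ~ 0 +
    prey_group_f +
    s(p_length_sc, year_sc, by = prey_group_f) +
    poly(d_sc, 2) +          # polynomial swapped in for s(d_sc, bs = "cc")
    end_sc + ens_sc + ent_sc + eno_sc,
  data = df2,
  discrete = TRUE            # bam() discretization keeps memory use down
)
length(coef(fit_gam))        # how many coefficients the smooth setup implies

If that is already slow or memory-hungry, the smooth structure is worth simplifying before the spatial pieces are added back in.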
-
@maxlindmark and I are modelling spatiotemporal patterns of cod feeding using a large dataset on stomach contents (~80 000 observations across 60 years). We're having some issues with memory use causing R to crash when predicting from the model (and sometimes during fitting). The 80K observations turn into 400K rows because we model stomach content by prey group (5 groups times 80K observations).
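For context, the long format looks roughly like the sketch below; the source object and prey-weight column names are hypothetical, and the response used in the model further down (val_cube) would be computed from the stacked values.

library(dplyr)
library(tidyr)

# Sketch only (hypothetical column names): ~80K stomachs x 5 prey groups
# stacked into ~400K rows, one row per stomach-by-prey-group combination.
df2 <- stomachs |>
  pivot_longer(
    cols = c(prey_herring, prey_sprat, prey_saduria, prey_benthos, prey_other),
    names_to = "prey_group_f",
    values_to = "prey_weight"
  ) |>
  mutate(prey_group_f = factor(prey_group_f))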
When the model fit is successful (it takes approx. 10 h on an M3 MacBook with 8 GB RAM), R crashes during predict() regardless of:
- Predicting on the original data or using newdata.
- Chopping up the predictions using map() over years and factor predictors, as suggested in https://github.com/pbs-assess/sdmTMB/issues/334#issuecomment-2192550448 (sketched below, after this list).
- Predicting on only one year.
- Predicting our delta model components separately.
- Reducing the number of mesh vertices (a larger cutoff).
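The chunked prediction we tried looks roughly like this (a sketch only; newdata stands for our prediction grid, year for our actual time column, and the grouping mirrors the model's factor predictors):

library(dplyr)
library(purrr)

# Sketch of the chunked approach: predict one year-by-prey-group slice of
# newdata at a time and bind the pieces back together, so that only one
# slice of the prediction matrices has to be built in memory at once.
pred_all <- newdata |>
  group_split(year, prey_group_f) |>
  map(\(nd) predict(Mod_smooth_s, newdata = nd)) |>
  bind_rows()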
I can get around this problem by using roughly half of the observations or by simplifying the model, i.e. excluding processes that we're interested in.
I've profiled memory use during sdmTMB fitting, and what's mainly soaking up memory is an interaction smoother (year × cod body size) by a five-level factor variable. I understand I'm asking a lot in terms of computation for this smooth effect, but it is really relevant for our aim. I'm using a Poisson-link delta-lognormal error distribution, so the two model components of course use more memory, but I get the feeling this is not the issue, and a Tweedie trades off for a poorer fit in my case.
My first question is: do you have any ideas for how to reduce memory use, e.g. at the cost of time, that could help deal with the smoother and the large dataset in sdmTMB?
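For example, would simply lowering the basis dimension of the by-factor smooth be the first thing to try? A sketch of what I mean (the k values are arbitrary illustration values, and I'm assuming sdmTMB passes k through to mgcv as usual):

# Sketch only: same fixed-effect formula but with smaller basis dimensions,
# to shrink the number of smooth coefficients (5 prey groups times the
# per-group basis size adds up quickly). k = 15 and k = 8 are arbitrary.
formula_small_k <- val_cube ~ 0 +
  prey_group_f +
  s(p_length_sc, year_sc, by = prey_group_f, k = 15) +
  s(d_sc, bs = "cc", k = 8) +
  end_sc + ens_sc + ent_sc + eno_sc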
My second question also concerns the smoother. The two interacting terms of my smooth, s(year_sc, p_length_sc, by = prey_group_f), are not naturally on the same scale (an anisotropic interaction), and from what I can gather, tensor product smooths (via te() or t2()) are preferred in such cases (https://stat.ethz.ch/R-manual/R-devel/library/mgcv/html/smooth.terms.html). However, predictions with t2() smoothers are not yet supported in sdmTMB (error: "There are unresolved issues with predicting on newdata when the formula includes t2() terms. Either predict with newdata = NULL or use s(). Post an issue if you'd like us to prioritize fixing this."). I'm not sure whether this needs to be prioritized; I'd first like to rule out alternative solutions that I'm not aware of. Any suggestions? Here's the model I'm working with:
Mod_smooth_s <- sdmTMB(
  data = df2,
  mesh = mesh_long,
  formula = val_cube ~ 0 +
    prey_group_f +
    s(p_length_sc, year_sc, by = prey_group_f) +
    s(d_sc, bs = "cc") +
    end_sc +
    ens_sc +
    ent_sc +
    eno_sc,
  extra_time = missing_years,
  spatiotemporal = "iid",
  spatial_varying = ~ 0 + prey_group_f,
  family = delta_lognormal(type = "poisson-link")
)
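For reference, the tensor-product variant of the interaction that question two is about would only change the smooth term; fitting with t2() appears to work, but predicting on newdata then triggers the error quoted above (sketch only, same placeholders as the model):

# Sketch: t2() tensor-product smooth allowing different wiggliness along
# the body-length and year directions; fitting works, but as noted above
# predict(..., newdata = ...) currently errors with t2() in the formula.
formula_t2 <- val_cube ~ 0 +
  prey_group_f +
  t2(p_length_sc, year_sc, by = prey_group_f) +
  s(d_sc, bs = "cc") +
  end_sc + ens_sc + ent_sc + eno_sc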
The model fit and summary (as html), the corresponding script (qmd), and some data can be found here:
https://github.com/VThunell/Lammska_cod-fr/tree/main/QsdmTMB_dforum
Thank you,
/Viktor