Grouping data before pooled analysis #465

riesmeijersa · 2022-02-03T14:41:38Z

Hi,

I have imputed missing data in a multilevel dataset containing multiple fingers (cases) per patient (group), using the 2l.pmm method from miceadds. Now I want to perform the analysis and pooling at the patient level. Section 5.1 of the e-book (https://stefvanbuuren.name/fimd/workflow.html) gives a recommended workflow for the analysis of imputed data:

      # The long format can be processed by the dplyr::do() function into a list-column and pooled, as follows:
      # long workflow using a dplyr list-column
      library(dplyr)
      est7 <- nhanes %>%
        mice(seed = 123, print = FALSE) %>%
        mice::complete("long") %>%
        group_by(.imp) %>%
        do(model = lm(formula = chl ~ age + bmi + hyp, data = .)) %>%
        as.list() %>%
        .[[-1]] %>%
        pool()

I would also like to group my dataset at the case level, something like this:

        library(dplyr)
        impdat_rec30 <- imp6.1 %>%
          mice::complete("long") %>%
          group_by(.imp, case_number) %>%
          do(model = glm(formula = recurrence_total_30 ~ prs_complete, family = binomial, data = .)) %>%
          as.list() %>%
          .[[-1]] %>%
          pool()

However, I get the error:
Error in .[[-1]] : invalid negative subscript in get1index

How do I incorporate a second grouping level in this code?

Many thanks,
Sophie

stefvanbuuren · 2022-02-03T15:49:22Z

Maintainer

I have never applied a double group variable.

Do you really want to apply the model to each case separately?
Perhaps stop after as.list() and see what you need to drop before doing pool()
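
As a rough illustration of why the error appears, the base-R sketch below mimics what as.list() returns in the two cases. The lists `pair` and `triple` are hypothetical stand-ins (character strings instead of fitted models), not the actual objects from the question. With one grouping variable the list has two elements, so `[[-1]]` happens to return the remaining one; with two grouping variables it has three, and `[[` cannot return a single element after dropping only one.

```r
# Hypothetical stand-ins for the output of as.list() after do()
pair   <- list(.imp = 1:2, model = c("fit1", "fit2"))                    # group_by(.imp)
triple <- list(.imp = 1:2, case_number = 1:2, model = c("fit1", "fit2")) # group_by(.imp, case_number)

# Two elements: dropping the first leaves exactly one, so [[ succeeds
pair[[-1]]

# Three elements: dropping the first leaves two, so [[ raises an error
err <- tryCatch(triple[[-1]], error = function(e) conditionMessage(e))
err

# Selecting the list-column by name works regardless of the number of groups
triple[["model"]]
```

Selecting by name only removes the indexing error. Whether passing every per-case model to pool() at once is statistically meaningful is a separate question, since pool() is meant to combine fits across imputations.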

thomvolker · 2022-02-03T16:24:33Z

It is a bit ambiguous whether you would like to run your model on each finger of all patients (say, 25 times a thumb, 25 times an index finger, and so on), or on all fingers of each patient.

Another complicating factor here is that pool() works for pooling analyses over imputations, but would not yield correct inferences if you want to pool the separate analyses over the fingers/patients. I have added an example of how you could go about this below, but I am not sure whether this is exactly what you want.

library(mice)
#> 
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)


set.seed(123)

# Sample each observation in the nhanes dataset 10 times, to reflect 10 timepoints
dat <- map_dfr(1:10, 
               ~nhanes[sample(1:nrow(nhanes)), ], 
               .id = "Time") %>%
  mutate(ID = rep(1:{nrow(.)/10}, 10))

# Note: cur_data() is deprecated as of dplyr 1.1.0; pick(everything()) is its replacement.
dat %>%
  mice(m = 10, print = FALSE) %>% # impute the data (ignoring the levels for simplicity)
  complete("long") %>%            # create 10 completed datasets containing all 10 timepoints
  group_by(.imp, ID) %>%          # group by imputation and person ID
  summarise(model = list(lm(bmi ~ age + hyp, cur_data())), # fit one model per person, per
            .groups = "drop") %>%       # imputation (timepoints as observations within models)
  group_by(ID) %>%                      # now group by ID only, to pool the models over all
  summarise(model = list(pool(model)))  # imputations, resulting in a single pooled model per person.
#> Warning: Number of logged events: 1
#> # A tibble: 25 × 2
#>       ID model         
#>    <int> <list>        
#>  1     1 <mipo [0 × 4]>
#>  2     2 <mipo [0 × 4]>
#>  3     3 <mipo [0 × 4]>
#>  4     4 <mipo [0 × 4]>
#>  5     5 <mipo [0 × 4]>
#>  6     6 <mipo [0 × 4]>
#>  7     7 <mipo [0 × 4]>
#>  8     8 <mipo [0 × 4]>
#>  9     9 <mipo [0 × 4]>
#> 10    10 <mipo [0 × 4]>
#> # … with 15 more rows

# Which yields 25 models (one for each patient, regarding the timepoints as independent 
# observations, and pooled over the imputations).

Created on 2022-02-03 by the reprex package (v2.0.0)
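
For comparison, if a single model over all rows is what is needed rather than one model per person, a minimal sketch of the usual with()/pool() route could look like the following. It uses the built-in nhanes data and a linear model purely for illustration, and it ignores any clustering of fingers within patients, which a real multilevel analysis would presumably need to account for.

```r
library(mice)

# Impute the built-in nhanes data (5 imputations, fixed seed for reproducibility)
imp <- mice(nhanes, m = 5, seed = 123, print = FALSE)

# Fit the same model once per imputed dataset, using all rows
fit <- with(imp, lm(chl ~ age + bmi))

# Combine the 5 fits with Rubin's rules
pooled <- pool(fit)
summary(pooled)
```

Here pool() receives exactly one fit per imputation, which is the situation it is designed for.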

1 reply

riesmeijersa Feb 9, 2022
Author

Thank you for your helpful answers :)
