Multivariate PMM #429

prockenschaub · 2021-09-15T14:22:42Z

Background

I work a lot with routinely collected hospital data. Among other things, this type of data contains laboratory measurements that are often taken as panels (i.e., they are present or absent together). A good example is the full blood count (platelets, white blood cells, red blood cells, haemoglobin, ...). If a full blood count was performed, these parameters are usually all measured. If no blood count was performed, none of those values are available.

Problem statement

If I want to impute a full blood count using predictive mean matching (PMM), I currently need to do so univariately. This works in principle but needs some tweaking of the predictorMatrix, because many components of the blood count are strongly correlated, which can lead to non-convergence. Furthermore, imputing the values univariately may fail to preserve any (hypothetical) joint distribution of those values.

Potential solution

In chapter 4.7.2 of Van Buuren (2018), @stefvanbuuren suggests a multivariate generalisation of the PMM algorithm that may be used within blocks. This method isn't currently implemented in mice. As part of a project, I have implemented a prototype of multivariate PMM following the guidance in Little (1988).

Questions

Is there an appetite to make this algorithm available within mice?
If yes, does the approach I have taken seem sensible? Could the design of the function (or the handling of blocks in mice in general) be further improved? For example, it currently only works with formulas, for a reason similar to the one behind #379 (Error in mitml::jomoImpute: Target variables do not contain any missing data).

References

Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC. Boca Raton, FL.

Little, R. J. A. (1988). Missing-Data Adjustments in Large Surveys (with Discussion). Journal of Business & Economic Statistics, 6(3), 287–301.

gerkovink · 2021-09-15T15:16:46Z

I think this is useful. I am not yet sure exactly how you do the matching, but @Mingyang-Cai has developed a methodology for multivariate imputation by means of canonical regression analysis. That seems like a solution for your motivating example, too.

prockenschaub · 2021-09-15T15:56:10Z

My preliminary solution for matching the mean vectors is a k-nearest-neighbour search via the RANN package. Little (1988) suggests scaling the predicted means by their standard deviation; I have made this the default, and it can be deactivated via scale=FALSE.
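
For readers who want to see the matching step spelled out, here is a minimal sketch in plain Python. It is not the prototype discussed in this thread and not the code in mice. The function names (`column_sds`, `mpmm_match`), the brute-force neighbour search (instead of the kd-tree search that RANN provides), and the choice to compute the standard deviations over the pooled donor and recipient predictions are all assumptions made for illustration. The predicted mean vectors are taken as given; fitting the model that produces them is out of scope here.

```python
import math
import random
import statistics


def column_sds(rows):
    """Sample standard deviation of each column; 1.0 where a column is constant."""
    sds = []
    for col in zip(*rows):
        sd = statistics.stdev(col) if len(col) > 1 else 0.0
        sds.append(sd if sd > 0 else 1.0)
    return sds


def mpmm_match(pred_donor, pred_recipient, obs_donor, k=5, scale=True, rng=None):
    """For each recipient, draw one of its k nearest donors in predicted-mean
    space and return that donor's whole observed vector (hypothetical helper)."""
    rng = rng or random.Random()
    n_dim = len(pred_donor[0])
    # Assumption: scale by the SD of the pooled predicted means, per dimension.
    sds = column_sds(pred_donor + pred_recipient) if scale else [1.0] * n_dim

    def scaled(row):
        return [value / sd for value, sd in zip(row, sds)]

    donors = [scaled(row) for row in pred_donor]
    imputed = []
    for row in pred_recipient:
        target = scaled(row)
        # Brute-force search: rank all donors by Euclidean distance.
        order = sorted(range(len(donors)), key=lambda i: math.dist(donors[i], target))
        pick = rng.choice(order[:min(k, len(order))])
        # Copy the donor's observed values for the whole block at once,
        # so the imputed values stay jointly consistent.
        imputed.append(list(obs_donor[pick]))
    return imputed


# Toy data: two variables in the block, three donors, two recipients.
pred_donor = [[1.0, 10.0], [2.0, 20.0], [9.0, 90.0]]
obs_donor = [[1.2, 11.0], [1.9, 19.5], [8.7, 93.0]]
pred_recipient = [[1.1, 11.0], [8.5, 88.0]]

imputed = mpmm_match(pred_donor, pred_recipient, obs_donor, k=1, rng=random.Random(1))
```

With `k=1` each recipient simply receives the observed values of its single closest donor. With a larger `k`, one of the `k` closest donors is drawn at random, which is what gives PMM its between-imputation variability.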

One aspect I am currently struggling with is how to exhaustively evaluate my implementation to make sure it returns sensible results. If someone has suggestions on how to do this, I would be all ears!

Very interested also to see the canonical regression approach and compare the results.

gerkovink · 2022-02-05T18:16:18Z

See #460

stefvanbuuren · 2022-03-31T21:18:42Z

Closing because there is now mice.impute.mpmm(). Feel free to reopen for other ideas on implementation.

stefvanbuuren closed this as completed Mar 31, 2022
