Multivariate PMM #429

Closed
prockenschaub opened this issue Sep 15, 2021 · 4 comments

Comments

@prockenschaub
Contributor

Background

I work a lot with routinely collected hospital data. Among other things, this type of data contains laboratory measurements that are often measured as panels (i.e., they are present or absent together). A good example is the full blood count (platelets, white blood cells, red blood cells, haemoglobin, ...). If a full blood count was performed, these parameters are usually all measured. If no blood count was performed, none of those values are available.

Problem statement

If I want to impute a full blood count using predictive mean matching (PMM), I currently need to do so univariately, one parameter at a time. This works in principle but needs some tweaking of the predictorMatrix, because many of the blood count parameters are strongly correlated, which can lead to non-convergence. Furthermore, imputing the values univariately may fail to preserve any (hypothetical) joint distribution of those values.

Potential solution

In Section 4.7.2 of Van Buuren (2018), @stefvanbuuren suggests a multivariate generalisation of the PMM algorithm that may be used within blocks. This method isn't currently implemented in mice. As part of a project, I have implemented a prototype of multivariate PMM following the guidance in Little (1988).

Questions

  1. Is there an appetite to make this algorithm available within mice?
  2. If yes, does the approach I have taken seem sensible? Could the design of the function (or the handling of blocks in mice in general) be further improved? For example, it currently only works with formulas, for a reason similar to the one behind #379 ("Error in mitml::jomoImpute: Target variables do not contain any missing data").

References

Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC. Boca Raton, FL.

Little, R. J. A. (1988). Missing-Data Adjustments in Large Surveys (with Discussion). Journal of Business & Economic Statistics, 6(3), 287–301.

@gerkovink
Member

gerkovink commented Sep 15, 2021

I think this is useful. I'm not exactly sure yet how you do the matching, but @Mingyang-Cai has developed methodology to do multivariate imputation by means of canonical regression analysis. That seems like a solution for your motivating example, too.

@prockenschaub
Contributor Author

prockenschaub commented Sep 15, 2021

My preliminary solution for matching the mean vectors is a k-nearest neighbour approach via the RANN package. Little (1988) suggests scaling the predicted means by their standard deviation. I have made this the default, and it can be deactivated via scale=FALSE.
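
To make the matching step concrete, here is a minimal sketch of the idea, not the prototype itself. It is written in Python with the standard library only, for illustration; the prototype discussed here is in R and uses RANN. The function name `match_donors`, its arguments, and the brute-force neighbour search (in place of a kd-tree) are all assumptions made for this example. It also assumes that "scaling by the standard deviation" means dividing each column of predicted means by that column's standard deviation among the donors.

```python
import math
import random
import statistics


def match_donors(pred_obs, pred_mis, y_obs, k=5, scale=True, rng=None):
    """Hypothetical multivariate PMM matching step.

    pred_obs: predicted mean vectors for rows where the block is observed (donors)
    pred_mis: predicted mean vectors for rows where the block is missing (recipients)
    y_obs:    observed values of the block for the donor rows
    Returns one imputed row per recipient, copied whole from a sampled donor.
    """
    rng = rng or random.Random()
    n_cols = len(pred_obs[0])

    # Assumed scaling: divide each column by the SD of the donors' predicted means.
    if scale:
        sds = []
        for j in range(n_cols):
            sd = statistics.pstdev([row[j] for row in pred_obs])
            sds.append(sd if sd > 0 else 1.0)  # guard against constant columns
    else:
        sds = [1.0] * n_cols

    def distance(a, b):
        # Euclidean distance between two (scaled) predicted mean vectors.
        return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, sds)))

    imputed = []
    for target in pred_mis:
        # Brute-force ranking of donors; a kd-tree would replace this for large data.
        ranked = sorted(range(len(pred_obs)),
                        key=lambda i: distance(pred_obs[i], target))
        donor = rng.choice(ranked[:k])       # draw one of the k nearest donors
        imputed.append(list(y_obs[donor]))   # copy the donor's whole observed row
    return imputed
```

Because each recipient receives a complete observed row from a single donor, the imputed values within a block always come from one real measurement panel, which is what preserves the joint distribution.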

One aspect I am currently struggling with is how to exhaustively evaluate my implementation to make sure it returns sensible results. If someone has suggestions on how to do this, I would be all ears!

Very interested also to see the canonical regression approach and compare the results.

@gerkovink
Member

See #460

@stefvanbuuren
Member

Closing because there is now mice.impute.mpmm(). Feel free to reopen for other ideas on implementation.
