Multivariate PMM #429
Comments
I think this is useful. Not exactly sure how you do the matching yet, but @Mingyang-Cai has developed methodology to do multivariate imputation by means of canonical regression analysis. Seems like a solution for your motivating example, too.
My preliminary solution to matching the mean vectors has been a k-nearest-neighbour approach via the RANN package. Little (1988) suggests scaling the predicted means by their standard deviation, which I have chosen as the default but which can be deactivated. One aspect I am currently struggling with is how to evaluate my implementation exhaustively to make sure it returns sensible results. If someone has suggestions on how to do this, I would be all ears! I would also be very interested to see the canonical regression approach and compare the results.
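The matching step described above can be sketched roughly as follows. This is a minimal illustration in base R, not the prototype discussed in this issue: the function name `match_block_pmm`, its arguments, and the simulated data are all hypothetical, the nearest-neighbour search is brute force rather than RANN, and the scaling uses the standard deviation of the donors' predicted means as one possible reading of Little (1988). It also omits the parameter draw that a proper PMM imputation would include.

```r
# Hypothetical helper: joint matching step of a multivariate PMM.
# yhat_obs: n_obs x p predicted means for donors (panel fully observed)
# yhat_mis: n_mis x p predicted means for recipients (panel missing)
# y_obs:    n_obs x p observed values of the donors
match_block_pmm <- function(yhat_obs, yhat_mis, y_obs, k = 5, scale = TRUE) {
  yhat_obs <- as.matrix(yhat_obs)
  yhat_mis <- as.matrix(yhat_mis)
  y_obs <- as.matrix(y_obs)
  if (scale) {
    # Assumption: scale each column by the SD of the donors' predicted means
    s <- apply(yhat_obs, 2, sd)
    s[!is.finite(s) | s == 0] <- 1
    yhat_obs <- sweep(yhat_obs, 2, s, "/")
    yhat_mis <- sweep(yhat_mis, 2, s, "/")
  }
  k <- min(k, nrow(yhat_obs))
  donor <- vapply(seq_len(nrow(yhat_mis)), function(i) {
    # Squared Euclidean distance from recipient i to every donor
    d2 <- rowSums(sweep(yhat_obs, 2, yhat_mis[i, ], "-")^2)
    nearest <- order(d2)[seq_len(k)]
    # Pick one of the k nearest donors at random
    nearest[sample.int(k, 1)]
  }, integer(1))
  # Copy the donor's whole observed vector so the panel stays jointly consistent
  y_obs[donor, , drop = FALSE]
}

# Simulated example: two correlated "blood count" values missing as a panel
set.seed(1)
n <- 200
x <- rnorm(n)
dat <- data.frame(
  x = x,
  hb = 13 + 1.0 * x + rnorm(n, sd = 0.5),
  rbc = 4.5 + 0.4 * x + rnorm(n, sd = 0.2)
)
dat[sample(n, 50), c("hb", "rbc")] <- NA

obs <- !is.na(dat$hb)
fit <- lm(cbind(hb, rbc) ~ x, data = dat[obs, ])   # multivariate linear model
yhat_obs <- predict(fit, newdata = dat[obs, ])
yhat_mis <- predict(fit, newdata = dat[!obs, ])
y_obs <- as.matrix(dat[obs, c("hb", "rbc")])

imp <- match_block_pmm(yhat_obs, yhat_mis, y_obs, k = 5)
dat[!obs, c("hb", "rbc")] <- imp
```

Because each recipient receives a complete row from a single donor, every imputed pair is a combination that was actually observed. One rough sanity check along these lines is to confirm that all imputed rows occur among the donor rows.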
See #460
Closing because there is now
Background
I work a lot with routinely collected hospital data. Among other things, this type of data contains laboratory measurements that are often taken as panels (i.e., they are present or absent together). A good example is the full blood count (platelets, white blood cells, red blood cells, haemoglobin, etc.). If a full blood count was performed, these parameters are usually all measured. If no blood count was performed, none of those values are available.
Problem statement
If I want to impute a full blood count using predictive mean matching (PMM), I currently need to do so univariately. This works in principle but needs some tweaking of the `predictorMatrix`, as many of the panel's components are strongly correlated, which can lead to non-convergence. Furthermore, imputing values univariately may fail to preserve any (hypothetical) joint distribution of those values.
Potential solution
In Section 4.7.2 of Van Buuren (2018), @stefvanbuuren suggests a multivariate generalisation of the PMM algorithm that may be used within blocks. This method isn't currently implemented in `mice`. As part of a project, I have implemented a prototype of multivariate PMM following the guidance in Little (1988).
Questions
mice
?mice
in general) be further improved? For example, it currently only works with formulas (for a reason similar to the one behind #379, "Error in mitml::jomoImpute: Target variables do not contain any missing data").
References
Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC. Boca Raton, FL.
Little, R. J. A. (1988). Missing-Data Adjustments in Large Surveys (with Discussion). Journal of Business & Economic Statistics, 6(3), 287–301.