Does 2l.pmm bias correlations downwards? #464
Replies: 4 comments 2 replies
-
Hi @skramer1958, would it be possible to post a reproducible example for this issue? See e.g. https://reprex.tidyverse.org/articles/articles/learn-reprex.html. FYI, another useful resource is the mice vignette about imputing multilevel data: https://www.gerkovink.com/miceVignettes/Multi_level/Multi_level_data.html.
-
Hanne,
This may be a foolish question, but the tutorial at https://reprex.tidyverse.org/articles/articles/learn-reprex.html uses something called RStudio, which is not the same as R. Do I need to do tutorials on RStudio and download and start using it before I can begin making a reproducible example?
Steve
-
Hi Steve,
-
Thanks for your question. Some thoughts:
-
I'm trying to impute data on a multilevel data set (students in classrooms in schools). I'm using 2l.pmm to account for school-level correlations, since school was the unit of assignment in my experimental study.
However, I notice that the 2l.pmm method tends to impute variables with much lower correlations than were in the original data set. The problem gets worse the more variables I include in the multiple imputation.
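For reference, my setup is roughly the following. This is a sketch rather than my exact code: the variable names (DL, WW, EH, school) are the ones described below, and 2l.pmm comes from the miceadds package.

```r
library(mice)
library(miceadds)  # provides mice.impute.2l.pmm

# Flag the cluster variable with -2 in the predictor matrix
# (mice's convention for the class/grouping variable).
pred <- make.predictorMatrix(dat)
pred[, "school"] <- -2
pred["school", ] <- 0  # the cluster id itself is not imputed

meth <- make.method(dat)
meth[c("DL", "WW", "EH")] <- "2l.pmm"

imp <- mice(dat, method = meth, predictorMatrix = pred, m = 5, seed = 1)
```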
For example, students took three science unit tests over two years, two in sixth grade (units called DL and WW) and one in seventh grade (unit called EH). There is lots of missing data for each test. Here are the original correlations for a subset of the data (control group in cohort 1 in one particular state):
Correlation between DL and WW: 0.555
Correlation between DL and EH: 0.534
Correlation between WW and EH: 0.520
Using a large data set and a number of other relevant variables, many of which had missing data (race, gender, disadvantaged status, school means on minority and disadvantaged status, fourth- and fifth-grade math and reading scores, and classroom averages on most of these variables), I imputed the data using pmm. The first imputation gives a good idea of the results. Here were the correlations on the first imputation:
Correlation between DL and WW: 0.547
Correlation between DL and EH: 0.500
Correlation between WW and EH: 0.542
But when I used 2l.pmm to impute these three and other variables with missing data (with school as the cluster variable), the correlations were much lower:
Correlation between DL and WW: 0.421
Correlation between DL and EH: 0.331
Correlation between WW and EH: 0.250
Now it is reasonable that correlations might be somewhat attenuated in the full data set if, for example, students with missing data tend to be lower scorers with lower inter-correlations. But the results above don't pass the "sniff test": the attenuation is too large. Plus, these are the results I obtained after dropping a number of variables from the imputation model, because the more variables I add, the lower the correlations get.
Note that these results are for a "subset of the data", but I get the same thing on all of the subsets. If I don't subset and instead include interactions in the model, the correlations get even lower for 2l.pmm. Meanwhile, the pmm approach continues to reproduce correlations very near those in the original data set.
How can I know whether 2l.pmm is worth using? That is, how can I diagnose whether the imputed data sets are reasonable, and whether the two-level imputation is more biased than single-level imputation, or perhaps even more biased than listwise deletion of the missing data?
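For what it's worth, I assume the usual checks would be something like the following (a sketch using mice's standard diagnostic plots, with the same illustrative variable names as above). Is this enough to judge whether the two-level imputations are reasonable?

```r
# Compare distributions of imputed vs. observed values.
densityplot(imp, ~ DL + WW + EH)
stripplot(imp, DL ~ .imp)

# Check convergence of the chain means and variances across iterations.
plot(imp)
```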