fix missing data issue #119

Open
wants to merge 2 commits into
base: master
@mattansb
Contributor

mattansb commented Mar 26, 2023

This is a WIP

@mattansb
Contributor Author

mattansb commented Mar 27, 2023

Hi @rvlenth,
Thanks for your help on the issue of dealing with missing data. I have taken your advice and solved this by allowing the user to pass a data= argument with non-missing data, which I deal with in recover_data.lavaan().

For ref_grid() and emmeans() this seems to work fine.

However, for emtrends() I am getting estimation problems. After some debugging, I found that when emtrends() is called, it calls recover_data() twice, but only passes the user's data= argument the first time. I'm assuming this is not intentional?

Thanks!

# remotes::install_github("mattansb/semTools") # install this PR
library(semTools)
library(emmeans)

data("mtcars")
raw_mtcars <- mtcars
mtcars$hp[1] <- NA

model <- " mpg ~ hp + drat + hp:drat "

fit <- sem(model, mtcars, missing = "fiml.x")

(rg <- ref_grid(fit, 
               lavaan.DV = "mpg",
               data = raw_mtcars))
#> 'emmGrid' object with variables:
#>     hp = 146.69
#>     drat = 3.5966

rg@linfct
#>   (Intercept)       hp     drat  hp:drat
#> 1           1 146.6875 3.596563 527.5708

(emM <- emmeans(fit, ~ drat, var = "hp",
                lavaan.DV = "mpg",
                data = raw_mtcars))
#>  drat emmean    SE  df asymp.LCL asymp.UCL
#>   3.6     20 0.614 Inf      18.8      21.2
#> 
#> Confidence level used: 0.95

emM@linfct
#>      (Intercept)       hp     drat  hp:drat
#> [1,]           1 146.6875 3.596563 527.5708

(emT <- emtrends(fit, ~ drat, var = "hp",
                 lavaan.DV = "mpg",
                 data = raw_mtcars))
#>  drat hp.trend SE df asymp.LCL asymp.UCL
#>   3.6   nonEst NA NA        NA        NA
#> 
#> Confidence level used: 0.95

emT@linfct
#>      (Intercept) hp drat hp:drat
#> [1,]           0 NA    0      NA

@rvlenth

rvlenth commented Mar 27, 2023

I'm not at all sure that it isn't intentional. The first call to ref_grid() includes a hook to return the data, so that we can set up the difference quotients. The second time we call it, we put another hook that bypasses some stuff already done in the first call. I'll have to look at it to see if we need the data the second time.
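
For readers following along, the idea of a difference quotient can be sketched in base R as below. This is only an illustration, not the emmeans internals: the step size `h` (taken here as 1/1000 of the range of `hp`) and the use of `model.matrix()` on a one-row grid are assumptions made for the example.

```r
# Illustration only: approximate the slope of the linear predictor with
# respect to hp by differencing two model-matrix rows.
data("mtcars")

# One-row grid at the predictor means (the same point shown by ref_grid() above)
grid <- data.frame(hp = mean(mtcars$hp), drat = mean(mtcars$drat))

# Assumed step size: a small fraction of the observed range of hp
h <- 0.001 * diff(range(mtcars$hp))

rhs <- ~ hp + drat + hp:drat
X0 <- model.matrix(rhs, data = grid)
X1 <- model.matrix(rhs, data = transform(grid, hp = hp + h))

# Difference quotient: the coefficients that multiply the trend in hp
trend_linfct <- (X1 - X0) / h
trend_linfct

# With a missing value in hp, the same step-size computation yields NA
hp_with_na <- replace(mtcars$hp, 1, NA)
h_na <- 0.001 * diff(range(hp_with_na))
h_na
```

Here `trend_linfct` is approximately (0, 1, 0, mean of `drat`), and `h_na` is `NA`, which is one way a missing value in the recovered data could propagate into `NA` entries of a linear-function matrix.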

@rvlenth

rvlenth commented Mar 27, 2023

I think it is right the way it is. The setup for the first call to ref_grid() includes this code:

    rgargs = list(object = object, ...)
   . . .
    data = do.call("ref_grid", c(rgargs))

So if data is included in the ... in the emtrends() call, it gets passed to ref_grid(). As you can see, the purpose of that first call is to retrieve the data (via a special hook included in rgargs).

The second call to ref_grid() is

bigRG = do.call("ref_grid", c(rgargs, data = data))

where data is the data already retrieved in the first call.

So actually I'm confused by your statement that data is passed the first time and not the second, because what we actually have is data being explicitly passed the second time, and only implicitly passed the first time.

@rvlenth

rvlenth commented Mar 27, 2023

OK, my bad! It turns out that if rgargs is a list and data is a data frame with variables x and y, then c(rgargs, data = data) is a list with additional elements data.x and data.y. So I put in an additional line of code to add data itself to the list, and confirmed in debug mode that the right stuff is being passed. You can install from GitHub and see if it works right now.
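
The splicing behavior described above can be reproduced in base R. The sketch below uses a stand-in `rgargs` list with hypothetical contents (not the real arguments built inside emtrends()), and the two alternatives shown are generic ways to keep a data frame as a single list element, not necessarily the exact line added to emmeans.

```r
# Stand-in argument list; the element values are placeholders
rgargs <- list(object = "fit", specs = "drat")
data <- data.frame(x = 1:3, y = 4:6)

# c() treats the data frame as a list, so its columns are spliced in
spliced <- c(rgargs, data = data)
names(spliced)
#> [1] "object" "specs"  "data.x" "data.y"

# Wrapping the data frame in list() keeps it as one element named "data"
kept <- c(rgargs, list(data = data))
names(kept)
#> [1] "object" "specs"  "data"

# Assigning into the list has the same effect
rgargs2 <- rgargs
rgargs2$data <- data
identical(kept, rgargs2)
#> [1] TRUE
```

With the spliced version, a function called via do.call() never receives an argument named `data`, which matches the symptom reported earlier in the thread.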

@mattansb
Contributor Author

Hey, this almost fixes the issue.
I now get a new error:

(emT <- emtrends(fit, ~ drat, var = "hp",
                 lavaan.DV = "mpg",
                 data = raw_mtcars))
#> Error in lav_data_full(data = data, group = group, cluster = cluster,  : 
#>   lavaan ERROR: some (observed) variables specified in the model are not found in the dataset: mpg

This is because the data being passed to recover_data() the second time only has the data for the predictors (from the first pass of recover_data()), but lavaan needs the full multivariate/multivariable dataset.

Can we not simply pass the original data= argument the second time as well?

@rvlenth

rvlenth commented Mar 28, 2023

You can use the addl.vars argument, e.g., addl.vars = "mpg"

@rvlenth

rvlenth commented Mar 28, 2023

By the way, in your emmeans support code for lavaan, since you need the response variable, I recommend you retrieve its name from the response part of the model formula, and include that as addl.vars in the call to recover_data(). Then you won't have to rely on the user providing that in their call. See the help page for emmeans::recover_data.
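
A rough sketch of that suggestion, assuming the model can be read as an ordinary R formula (true for the single regression used in this thread, not for lavaan syntax in general). The variable names are hypothetical, and how the result is handed to recover_data() inside the lavaan support code is deliberately left out.

```r
# Model string from the example earlier in this thread
model <- " mpg ~ hp + drat + hp:drat "

# Assumption: a single-regression lavaan string parses as an R formula
model_formula <- as.formula(model)

# Left-hand side: the response variable needed by lavaan
response_name <- all.vars(model_formula[[2]])

# Right-hand side: the predictors that define the reference grid
predictor_names <- all.vars(model_formula[[3]])

response_name    # candidate value for addl.vars
predictor_names
```

The support code could then supply `response_name` as `addl.vars` itself, so the user would not need to add `addl.vars = "mpg"` by hand.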

@patc3

patc3 commented Jun 19, 2023

Any update on this issue? Has this been added to simsem?

@rvlenth

rvlenth commented Jun 19, 2023

@patc3 No additional updates from me (emmeans) since my last comment. My repairs to recover_data are in the latest CRAN version and AFAIK, the additional notes (e.g., using addl.vars) will provide access to all the needed variables.

@mattansb
Contributor Author

Sorry @patc3 - I haven't found the time to get back to this just yet.

3 participants