Solved – How to simulate new observations to a fitted model object while specifying the covariates in R

rsimulation

I have a model that is fitted with glm and I want to be able to calculate the response of this model when I specify the independent variables, while simulating the error terms. For example, if my formula is y~x1*x2, then this corresponds to:

$y_{i} = \mu + \beta_{1} x_{1i} + \beta_{2} x_{2i} + \beta_{3} x_{1i} x_{2i} + \epsilon_{i}$

where $\epsilon_{i}$ is an error term.

I want to be able to simulate

$y^{\star}_{i} = \hat{\mu} + \hat{\beta}_{1} x^{\star}_{1i} + \hat{\beta}_{2} x^{\star}_{2i}+ \hat{\beta}_{3} x^{\star}_{1i}x^{\star}_{2i} + \epsilon_{i}$,

with $\hat{\mu}$, $\hat{\beta}_{1}$, $\hat{\beta}_{2}$, and $\hat{\beta}_{3}$ being the values estimated by glm and $x^{\star}_{1 i}$ and $x^{\star}_{2 i}$ being the specified covariates / independent variables.

This is similar to what predict() does, however, it sets $\epsilon_{i}=0$.

It looks like what I want to do is similar to the functionality provided by simulate() and predict(). Although it doesn't look like simulate() allows you to specify the covariates.

What is the best way to do this?

Best Answer

Use the predict() command's newdata argument to obtain mean responses at given covariate value, then generate data using the rnorm(), rbinom(), rpois() function appropriate for your GLM. For Gaussian data you'll also need to specify a dispersion parameter, but this is part of the glm() output, or its summary().