Solved – Simulate data based on negative binomial regression coefficient

generalized linear modelrregression

I'm trying to simulate a dataframe with columns x and y based on a real-world dataset. Fitting a negative binomial regression model onto the real world dataset produced a coefficient of 0.1922 with a standard error of 0.0268.

Now I want to create an artificial dataframe based on this finding to run some follow up analyses. So far I've done the following to generate hypothetical y values:

y <- rlnorm(1000000) # generate 1,000,000 numbers from log normal distribution
y <- y*10 # multiply by ten to get more realistic numbers
y <- round(y, digits = 0) # round to whole number
hist(y) # take a look

How can I use the results of the negative binomial regression to map my simulated y values onto simulated x values?


FYI: If you need it, you can find a good overview of the negative binomial regression here

Best Answer

If a response $Y_i \in \left\{0, 1, \dotsc \right\}$ given a vector of covariates $X_i \in \mathbb{R}^p$ follows a negative binomial regression model (with a $\log$ link) and coefficients $\beta$ and an (inverse) dispersion parameter $\theta$ then $E(Y_i|X_i) = \exp( X_i \beta) = \mu_i$, $\text{Var} (Y_i | X_i) = \mu_i + \mu_i^2/ \theta$ and

$$ P(Y_i = y | X_i) = \binom{y+\theta - 1}{y} \left(\frac{\mu_i}{\mu_i + \theta} \right)^y \left(\frac{\theta}{\mu_i + \theta} \right)^{\theta} $$

In R, to simulate a vector of responses, given a design matrix X you could write

eta = X %*% beta
y = rnbinom(length(eta), mean = exp(eta), size = theta)