A standard way to generate overdispersed count data is to draw from a Poisson distribution with a random mean: $Y_i\sim Poisson(\lambda_i)$, $\lambda_i \sim F$. For example, if $\lambda_i$ has a Gamma distribution, the marginal distribution of $Y$ is negative binomial.
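As a quick check of the Gamma-Poisson claim, here is a minimal sketch (the shape/rate parametrization is an assumption; it matches R's rnbinom with size = r and mu = mu):

```r
# Sketch: Gamma-mixed Poisson draws match a negative binomial.
# lambda_i ~ Gamma(shape = r, rate = r/mu) gives Y ~ NegBin(size = r, mu = mu).
set.seed(1)
n  <- 1e5
r  <- 2     # Gamma shape = NB size parameter
mu <- 5     # marginal mean
lambda <- rgamma(n, shape = r, rate = r / mu)
y      <- rpois(n, lambda)
mean(y)   # approximately mu
var(y)    # approximately mu + mu^2 / r, i.e. overdispersed
```

The overdispersion is visible directly: the sample variance is well above the sample mean, whereas a plain Poisson would have them equal.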
You can impose serial correlation by correlating the $\lambda_i$'s; for example, let $\log\lambda_i$ follow an AR(1) process. Implemented in R:
N <- 100
rho <- 0.6
log.lambda <- 1 + arima.sim(model=list(ar=rho), n=N)
y <- rpois(N, lambda=exp(log.lambda))
> cor(head(y,-1), tail(y,-1))
[1] 0.4132512
> mean(y)
[1] 4.35
> var(y)
[1] 33.4015
Here the $\log\lambda_i$'s are normal (so the $\lambda_i$'s are lognormal), and the resulting marginal distribution of $y$ is not a classic one, but you could get more creative. Also note that the correlation of the $y$'s does not equal rho, but is some function of it.
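To see how the induced correlation depends on rho, a minimal sketch that sweeps a few values of the AR coefficient (everything else as in the code above):

```r
# Sketch: lag-1 correlation of y for several AR(1) coefficients.
set.seed(1)
N <- 1e4
rhos <- c(0.2, 0.5, 0.8)
cors <- sapply(rhos, function(rho) {
  log.lambda <- 1 + arima.sim(model = list(ar = rho), n = N)
  y <- rpois(N, lambda = exp(log.lambda))
  cor(head(y, -1), tail(y, -1))   # lag-1 sample correlation of the counts
})
round(cbind(rho = rhos, cor.y = cors), 3)
```

The lag-1 correlation of the counts increases with rho but is not equal to it, since the AR(1) structure sits on the log-mean scale and the Poisson sampling adds noise on top.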
Take a look at how to simulate realizations of a Gaussian process (GP). The smoothness of the realizations depends on the analytical properties of the GP's covariance function. This online book has a lot of information: http://uncertainty.stat.cmu.edu/
This video gives a nice introduction to GPs: http://videolectures.net/gpip06_mackay_gpb/
P.S. Regarding your comment, this code may give you a start.
library(MASS)
C <- function(x, y) 0.01 * exp(-10000 * (x - y)^2) # covariance function
M <- function(x) sin(x) # mean function
t <- seq(0, 1, by = 0.01) # will sample the GP at these points
k <- length(t)
m <- M(t)
S <- matrix(nrow = k, ncol = k)
for (i in 1:k) for (j in 1:k) S[i, j] <- C(t[i], t[j])
z <- mvrnorm(1, m, S)
plot(t, z)
Best Answer
It is really simple to generate multinomial logit regression data. All you need to keep in mind are the normalizing assumptions.
Here mChoices and dfM$y encode the same information differently.
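The code for this answer appears to have been lost; the following is a minimal sketch of what it might look like. The names mChoices and dfM are assumptions taken from the sentence above, and the coefficient values are arbitrary. The normalizing assumption is that the first category's coefficients are fixed at zero.

```r
# Sketch: simulate multinomial logit data (names mChoices, dfM assumed).
set.seed(1)
n <- 500
K <- 3                                    # number of choice categories
x <- rnorm(n)                             # a single covariate
beta0 <- c(0, 0.5, -0.5)                  # intercepts; category 1 is the baseline
beta1 <- c(0, 1.0, -1.0)                  # slopes; category 1 is the baseline
eta <- outer(rep(1, n), beta0) + outer(x, beta1)   # n x K linear predictors
p   <- exp(eta) / rowSums(exp(eta))       # softmax probabilities, rows sum to 1
# one-hot indicator matrix: one multinomial draw of size 1 per row
mChoices <- t(apply(p, 1, function(pr) rmultinom(1, size = 1, prob = pr)))
# the same choices encoded as a single factor column
dfM <- data.frame(x = x, y = factor(apply(mChoices == 1, 1, which)))
```

mChoices holds the choices as an n x K matrix of 0/1 indicators, while dfM$y holds them as a factor; either can be handed to a multinomial regression routine depending on the interface it expects.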