A standard way to generate overdispersed count data is to draw from a Poisson distribution with a random mean: $Y_i\sim Poisson(\lambda_i)$, $\lambda_i \sim F$. For example, if $\lambda_i$ has a Gamma distribution, the marginal distribution of $Y$ is negative binomial.
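As a quick check of the Gamma-Poisson claim, here is a minimal sketch (the shape/rate parametrization is an assumption; it matches R's rnbinom with size = r and mu = mu):

```r
# Sketch: Gamma-mixed Poisson draws match a negative binomial.
# lambda_i ~ Gamma(shape = r, rate = r/mu) gives Y ~ NegBin(size = r, mu = mu).
set.seed(1)
n  <- 1e5
r  <- 2     # Gamma shape = NB size parameter
mu <- 5     # marginal mean
lambda <- rgamma(n, shape = r, rate = r / mu)
y      <- rpois(n, lambda)
mean(y)   # approximately mu
var(y)    # approximately mu + mu^2 / r, i.e. overdispersed
```

The overdispersion is visible directly: the sample variance is well above the sample mean, whereas a plain Poisson would have them equal.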
You can impose serial correlation by correlating the $\lambda_i$'s; for example, let $\log\lambda_i$ follow an AR(1) process. Implemented in R:
N <- 100
rho <- 0.6
log.lambda <- 1 + arima.sim(model=list(ar=rho), n=N)
y <- rpois(N, lambda=exp(log.lambda))
> cor(head(y,-1), tail(y,-1))
[1] 0.4132512
> mean(y)
[1] 4.35
> var(y)
[1] 33.4015
Here the $\log\lambda_i$'s are normal (so the $\lambda_i$'s are lognormal), and the resulting marginal distribution of $y$ is not a classic one, but you could get more creative. Also note that the correlation of the $y$'s does not equal rho, but is some function of it.
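To see how the induced correlation depends on rho, a minimal sketch that sweeps a few values of the AR coefficient (everything else as in the code above):

```r
# Sketch: lag-1 correlation of y for several AR(1) coefficients.
set.seed(1)
N <- 1e4
rhos <- c(0.2, 0.5, 0.8)
cors <- sapply(rhos, function(rho) {
  log.lambda <- 1 + arima.sim(model = list(ar = rho), n = N)
  y <- rpois(N, lambda = exp(log.lambda))
  cor(head(y, -1), tail(y, -1))   # lag-1 sample correlation of the counts
})
round(cbind(rho = rhos, cor.y = cors), 3)
```

The lag-1 correlation of the counts increases with rho but is not equal to it, since the AR(1) structure sits on the log-mean scale and the Poisson sampling adds noise on top.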
Take a look at how to simulate realizations of a Gaussian process (GP). The smoothness of the realizations depends on the analytical properties of the GP's covariance function. This online book has a lot of information: http://uncertainty.stat.cmu.edu/
This video gives a nice introduction to GPs: http://videolectures.net/gpip06_mackay_gpb/
P.S. Regarding your comment, this code may give you a start.
library(MASS)
C <- function(x, y) 0.01 * exp(-10000 * (x - y)^2) # covariance function
M <- function(x) sin(x) # mean function
t <- seq(0, 1, by = 0.01) # will sample the GP at these points
k <- length(t)
m <- M(t)
S <- matrix(nrow = k, ncol = k)
for (i in 1:k) for (j in 1:k) S[i, j] <- C(t[i], t[j])
z <- mvrnorm(1, m, S)
plot(t, z)
Best Answer
It is really simple to generate multinomial logit regression data. All you need to keep in mind are the normalizing assumptions.
Here mChoices and dfM$y encode the same information differently.
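The code for this answer appears to have been lost; the following is a minimal sketch of what it might look like. The names mChoices and dfM are assumptions taken from the sentence above, and the coefficient values are arbitrary. The normalizing assumption is that the first category's coefficients are fixed at zero.

```r
# Sketch: simulate multinomial logit data (names mChoices, dfM assumed).
set.seed(1)
n <- 500
K <- 3                                    # number of choice categories
x <- rnorm(n)                             # a single covariate
beta0 <- c(0, 0.5, -0.5)                  # intercepts; category 1 is the baseline
beta1 <- c(0, 1.0, -1.0)                  # slopes; category 1 is the baseline
eta <- outer(rep(1, n), beta0) + outer(x, beta1)   # n x K linear predictors
p   <- exp(eta) / rowSums(exp(eta))       # softmax probabilities, rows sum to 1
# one-hot indicator matrix: one multinomial draw of size 1 per row
mChoices <- t(apply(p, 1, function(pr) rmultinom(1, size = 1, prob = pr)))
# the same choices encoded as a single factor column
dfM <- data.frame(x = x, y = factor(apply(mChoices == 1, 1, which)))
```

mChoices holds the choices as an n x K matrix of 0/1 indicators, while dfM$y holds them as a factor; either can be handed to a multinomial regression routine depending on the interface it expects.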