Solved – R: random sampling for multivariate normal and log-normal distributions

lognormal distributionmultivariate analysisnormal distributionrrandom-generation

I want to generate random monthly (m) temperature (T) and Precipitation (P) data considering that both variables are intercorrelated (rTP[m])
The tricky thing is that my random variables that have specific quantitative properties: temperatures are normally distributed, while precipitations follow a log-normal distribution and should be log-transformed

mvrnorm of the package MASS could be used.

mT=c(1,2,4,7,10,15,17,18,17,10,5,1)
mP=c(3.9,3.7,3.9,4.1,4.5,4.7,4.8,4.8,4.4,4.1,4.2,3.9) #log-transformed
sdT=c(1,1,1,1,1,1,1,1,1,1,1,1)
sdP=c(0.7,0.8,0.7,0.6,0.4,0.4,0.4,0.5,0.6,1,0.8,0.6)  #log-transformed
rTP=c(0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4)
covTP=rTP*(sdT*sdP)

simP=NULL
for (m in 1:12)
{
out=mvrnorm(500, mu = c(mT[m],mP[m]), Sigma = matrix(c(sdT[m]*sdT[m],covTP[m],covTP[m],sdP[m]*sdP[m]), ncol = 2), empirical = TRUE)
simP[m]=mean(exp(out[,2])-1)
}

In this case I generate two random time-series that are inter-correlated which is great.

However, simulated precipitations (simP) are on average higher than the observed one (mP)

plot(exp(mP)-1, type="l", lwd=2, ylim=c(0,250)); points(simP, type="l", lwd=2, lty=2)

I could use rlnorm or rlnorm.rplus to consider than precipitations are log-transformed, but then I have troubles with temperatures that are normally distributed.

My question is: How can I create random sampling for variables that have specific quantitative properties (log-normal and normal distributions)?

Best Answer

The the problem is in calculating the mean of the original distribution in the plot statement. The correct way is

plot(exp(mP + sdP^2/2), type="l", lwd=2, ylim=c(0,250)); points(simP, type="l", lwd=2, lty=2)

Mean and standard deviation do not commute with logarithm or exponent. In general

log(mean(mydata)) != mean(log(mydata))

and similarly also for sd, exp in any combination of the above You should also make sure that mP and sdP are the mean and sd of the transformed data and not the logarithms of the mean and sd of the raw data. ie you want

mP <- mean(log(P))
stP <- sd(log(P))

You can check the formula for the mean and other statistics of the log normal on the wikipedia