Solved – Simulating a bimodal distribution in the range of [1;5] in R

Tags: bimodal, mixture-distribution, normal distribution, r, simulation

I want to simulate a continuous data set/variable with lower/upper bounds of [1;5], while at the same time ensuring that the drawn distribution can be considered bimodal.

Searching for my problem, I found this source, which helps to simulate a bimodal distribution; however, it doesn't apply the lower/upper bounds: https://stats.stackexchange.com/search?q=bimodal+truncated+distribution

In contrast, the rtruncnorm function in R (from package truncnorm) helps me to simulate a normal (but not bimodal) distribution with lower/upper bounds.

The question now is: how can I combine both? Theoretically, I could just use the approach from the first link, i.e. generate a bimodal distribution from two underlying normal distributions and then rescale the drawn data with this approach (https://stats.stackexchange.com/a/25897/66544) to get my bounds.

Or I could generate two truncated normal distributions with the rtruncnorm function and then combine them into a bimodal distribution following the approach from the first link.

But I'm not sure if either of these approaches is mathematically justified.

NOTE: why do I want a range of [1;5] anyway? The real data would come from a survey where respondents will answer on a 5 point scale from 1-5 (continuously, not discrete), hence I need to simulate this finiteness.

Best Answer

The easiest approach would be to draw $\frac{n}{2}$ samples from a truncated normal distribution with one mean and another $\frac{n}{2}$ samples from a truncated normal distribution with a different mean. This is a mixture, specifically one with equal weights; you could also use different weights by varying the proportions by which you draw from both distributions.

library(truncnorm)

nn <- 1e4
set.seed(1)
# Equal-weight mixture: half the draws from each truncated normal on [1, 5]
sims <- c(rtruncnorm(nn/2, a=1, b=5, mean=2, sd=.5),
          rtruncnorm(nn/2, a=1, b=5, mean=4, sd=.5))

hist(sims)
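
The unequal-weight variant mentioned above could be sketched as follows. The weight of 0.3 for the lower component is an arbitrary illustrative value, and `w1`, `n1` and `sims_w` are names introduced here. Drawing the component count with `rbinom` makes each observation's component random instead of fixing the proportions exactly.

```r
library(truncnorm)

nn <- 1e4
w1 <- 0.3                      # assumed weight of the lower component (illustrative)
set.seed(1)
n1 <- rbinom(1, nn, w1)        # random number of draws from the lower component

sims_w <- c(rtruncnorm(n1,      a = 1, b = 5, mean = 2, sd = 0.5),
            rtruncnorm(nn - n1, a = 1, b = 5, mean = 4, sd = 0.5))
sims_w <- sample(sims_w)       # shuffle so position carries no component information

hist(sims_w, breaks = 50)
```

With these settings the histogram should show a smaller mode near 2 and a larger one near 4, with all values inside [1, 5].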

Related Solutions

R – Simulating Constrained Normal Distribution on Bound

This is called a truncated normal distribution:

http://en.wikipedia.org/wiki/Truncated_normal_distribution

Christian Robert wrote about an approach to doing it for a variety of situations (using different methods depending on where the truncation points are) here:

Robert, C.P. (1995) "Simulation of truncated normal variables",
Statistics and Computing, Volume 5, Issue 2, June, pp 121-125

Paper available at http://arxiv.org/abs/0907.4010

This discusses a number of different ideas for different truncation points. It's not the only way of approaching these by any means, but it typically performs quite well. If you want to do a lot of different truncated normals with various truncation points, it would be a reasonable approach. As you noted, msm::tnorm is based on Robert's approach, while truncnorm::truncnorm implements Geweke's (1991) accept-reject sampler; this is related to the approach in Robert's paper. Note that msm::tnorm includes density, cdf, and quantile (inverse cdf) functions in the usual R fashion.

An older reference with an approach is Luc Devroye's book; since it went out of print he's got back the copyright and made it available as a download.

Your particular example is the same as sampling a standard normal truncated at 1 (if $t$ is the truncation point, $(t-\mu)/\sigma = (5-3)/2 = 1$), and then scaling the result (multiply by $\sigma$ and add $\mu$).

In that specific case, Robert suggests that your idea (in the second or third incarnation) is quite reasonable. You get an acceptable value about 84% of the time and so generate about $1.19 n$ normals on average (you can work out bounds so that you generate enough values using a vectorized algorithm say 99.5% of the time, and then once in a while generate the last few less efficiently - even one at a time).
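
As a rough sketch of that oversampling idea, assuming the example is a normal with mean 3 and standard deviation 2 truncated above at 5 (consistent with the 84% acceptance rate quoted): the function name `rnorm_upper_reject` is hypothetical, and the negative-binomial quantile is just one way to pick a batch size that suffices about 99.5% of the time.

```r
# Draw n values from N(mu, sigma^2) truncated above at `upper`
# by generating "too many" normals and discarding those beyond the bound.
rnorm_upper_reject <- function(n, mu, sigma, upper) {
  p_accept <- pnorm(upper, mu, sigma)   # P(X <= upper), about 0.84 in this example
  # n acceptances plus the 99.5% quantile of the number of rejections
  # seen before the n-th acceptance
  m <- n + qnbinom(0.995, size = n, prob = p_accept)
  draws <- rnorm(m, mu, sigma)
  kept <- draws[draws <= upper]
  # Rare fallback: top up one value at a time if the batch fell short
  while (length(kept) < n) {
    z <- rnorm(1, mu, sigma)
    if (z <= upper) kept <- c(kept, z)
  }
  kept[seq_len(n)]
}

set.seed(1)
x <- rnorm_upper_reject(1e4, mu = 3, sigma = 2, upper = 5)

pnorm(1)       # acceptance probability
1 / pnorm(1)   # expected number of normals generated per accepted value
```

The last two lines reproduce the "about 84%" and "about $1.19 n$" figures quoted above.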

There's also discussion of an implementation in R code here (and in Rcpp in another answer to the same question, but the R code there is actually faster). The plain R code there generates 50000 truncated normals in 6 milliseconds, though that particular truncated normal only cuts off the extreme tails, so a more substantive truncation would mean the results were slower. It implements the idea of generating "too many" by calculating how many it should generate to be almost certain to get enough.

If I needed just one particular kind of truncated normal a lot of times, I'd probably look at adapting a version of the ziggurat method, or something similar, to the problem.

In fact it looks like Nicolas Chopin did just that already, so I'm not the only person that has occurred to:

http://arxiv.org/abs/1201.6140

He discusses several other algorithms and compares the time for 3 versions of his algorithm with other algorithms to generate $10^8$ random normals for various truncation points.

Perhaps unsurprisingly, his algorithm turns out to be relatively fast.

From the graph in the paper, even the slowest of the algorithms he compares with at the (for them) worst truncation points are generating $10^8$ values in about 3 seconds - which suggests that any of the algorithms discussed there may be acceptable if reasonably well implemented.

Edit: One that I am not certain is mentioned here (but perhaps it's in one of the links) is to transform (via inverse normal cdf) a truncated uniform -- but the uniform can be truncated by simply generating a uniform within the truncation bounds. If the inverse normal cdf is fast this is both fast and easy and works well for a wide range of truncation points.

Solved – Simulate from a truncated mixture normal distribution

Simulation from a truncated normal is easily done if you have access to a proper normal quantile function. For instance, simulating $$ \mathcal{N}_a^b(\mu,\sigma^2)$$where $a$ and $b$ denote the lower and upper bounds can be done by inverting the cdf $$\dfrac{\Phi(\sigma^{-1}\{x-\mu\})-\Phi(\sigma^{-1}\{a-\mu\})}{\Phi(\sigma^{-1}\{b-\mu\})-\Phi(\sigma^{-1}\{a-\mu\})} $$ e.g., in R

x = mu + sigma * qnorm( pnorm(a,mu,sigma) + 
     runif(1)*(pnorm(b,mu,sigma) - pnorm(a,mu,sigma)) )
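
A vectorised version of the same inversion might look like the sketch below, with a check of the simulated mean against the analytic mean of a truncated normal. The name `rtnorm_inv` and the parameter values are illustrative assumptions. Note that this approach can lose accuracy when the interval lies far out in one tail, where `pnorm` rounds to 0 or 1.

```r
# Vectorised inverse-cdf sampler for N(mu, sigma^2) truncated to [a, b]
rtnorm_inv <- function(n, mu, sigma, a, b) {
  Fa <- pnorm(a, mu, sigma)             # cdf at the lower bound
  Fb <- pnorm(b, mu, sigma)             # cdf at the upper bound
  mu + sigma * qnorm(runif(n, Fa, Fb))  # uniform on [Fa, Fb], mapped back
}

set.seed(1)
x <- rtnorm_inv(1e5, mu = 2, sigma = 0.5, a = 1, b = 5)

# Analytic mean of the truncated normal, for comparison
alpha <- (1 - 2) / 0.5
beta  <- (5 - 2) / 0.5
m_theory <- 2 + 0.5 * (dnorm(alpha) - dnorm(beta)) / (pnorm(beta) - pnorm(alpha))

c(simulated = mean(x), theoretical = m_theory)
```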

Otherwise, I developed a truncated normal accept-reject algorithm twenty years ago.

If we consider the truncated mixture problem, with density
$$ f(x;\theta) \propto \left\{p\varphi(x;\mu_1,\sigma_1)+(1-p)\varphi(x;\mu_2,\sigma_2)\right\}\mathbb{I}_{[a,b]}(x) $$
it is a mixture of truncated normal distributions but with different weights:
$$ f(x;\theta) \propto p\left\{\Phi(\sigma_1^{-1}\{b-\mu_1\})-\Phi(\sigma_1^{-1}\{a-\mu_1\}) \right\}\dfrac{\sigma_1^{-1}\phi(\sigma_1^{-1}\{x-\mu_1\})}{\Phi(\sigma_1^{-1}\{b-\mu_1\})-\Phi(\sigma_1^{-1}\{a-\mu_1\})} \\[15pt] +(1-p)\left\{\Phi(\sigma_2^{-1}\{b-\mu_2\})-\Phi(\sigma_2^{-1}\{a-\mu_2\}) \right\}\dfrac{\sigma_2^{-1}\phi(\sigma_2^{-1}\{x-\mu_2\})}{\Phi(\sigma_2^{-1}\{b-\mu_2\})-\Phi(\sigma_2^{-1}\{a-\mu_2\})} $$
Therefore, to simulate from a truncated normal mixture, it is sufficient to take
$$x=\begin{cases} x_1\sim\mathcal{N}_a^b(\mu_1,\sigma_1^2) &\text{with probability }\\ &\qquad p\left\{\Phi(\sigma_1^{-1}\{b-\mu_1\})-\Phi(\sigma_1^{-1}\{a-\mu_1\}) \right\}\big/\mathfrak{s}\\ x_2\sim\mathcal{N}_a^b(\mu_2,\sigma_2^2) &\text{with probability }\\ &\qquad(1-p)\left\{\Phi(\sigma_2^{-1}\{b-\mu_2\})-\Phi(\sigma_2^{-1}\{a-\mu_2\}) \right\}\big/\mathfrak{s} \end{cases} $$
where
\begin{align} \mathfrak{s}=&p\left\{\Phi(\sigma_1^{-1}\{b-\mu_1\})-\Phi(\sigma_1^{-1}\{a-\mu_1\}) \right\}+ \\ &(1-p)\left\{\Phi(\sigma_2^{-1}\{b-\mu_2\})-\Phi(\sigma_2^{-1}\{a-\mu_2\}) \right\} \end{align}
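
The reweighting above could be implemented roughly as in the following sketch. The function `r_trunc_mix`, its argument names and the parameter values are illustrative assumptions, and each truncated component is drawn by inverting the cdf.

```r
# Draw n values from a two-component normal mixture truncated to [a, b].
# mu and sigma are length-2 vectors; p is the untruncated weight of component 1.
r_trunc_mix <- function(n, p, mu, sigma, a, b) {
  # Mass that each untruncated component places on [a, b]
  mass <- pnorm(b, mu, sigma) - pnorm(a, mu, sigma)
  # Reweighted component probabilities: p_k * mass_k / s
  w <- c(p, 1 - p) * mass
  w <- w / sum(w)
  # Pick a component for every draw using the reweighted probabilities
  k <- sample(1:2, n, replace = TRUE, prob = w)
  # Inverse-cdf draw from the chosen truncated component
  lo <- pnorm(a, mu[k], sigma[k])
  hi <- pnorm(b, mu[k], sigma[k])
  x <- mu[k] + sigma[k] * qnorm(runif(n, lo, hi))
  list(x = x, weights = w)
}

set.seed(1)
out <- r_trunc_mix(1e4, p = 0.5, mu = c(1.5, 4), sigma = c(1, 0.5), a = 1, b = 5)
out$weights
hist(out$x, breaks = 50)
```

In this example the first component loses more mass to the truncation at 1 than the second loses at either bound, so its reweighted probability falls below the nominal 0.5. Mixing two truncated normals with the original weights $p$ and $1-p$ is also a valid bounded bimodal distribution, but it is a different distribution from the truncated mixture.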
