Solved – Sample from distribution given by histogram

Tags: density function, histogram, r, sampling

Given a histogram built from a set of data points, how do I randomly sample from the distribution implied by the histogram?

Any conceptual comment / R code would be welcome.

Best Answer

Since sampling from a kernel density estimate has been covered once or twice already, I'll focus on sampling from a histogram-as-population-pdf.

The idea is simply:

For each observation in the new sample

  1. choose a histogram bin according to the proportions of 
     the original sample (treated as a discrete pmf)

  2. sample uniformly from that bin-interval

For example in R:

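# create an original histogram
x <- rgamma(200, 4)
xhist <- hist(x, freq = FALSE)

# sample from it
samplesize <- 400
# choose a bin; weighting by counts stays correct even if bin widths differ
bins <- with(xhist, sample(length(mids), samplesize, prob = counts, replace = TRUE))
# sample a uniform within the chosen bin
result <- runif(length(bins), xhist$breaks[bins], xhist$breaks[bins + 1])
# overlay the histogram of the new sample on the original
hist(result, freq = FALSE, add = TRUE, border = 3)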
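
The same two steps can be wrapped in a small reusable function. This is only a sketch, not part of the original answer: the name `sample_from_hist` is hypothetical, and it assumes `h` is an object returned by `hist()`. Bins are weighted by `counts` rather than `density`, so it should also behave sensibly when the breaks are unequally spaced.

```r
# Hypothetical helper (not from the original answer): draw n values from the
# piecewise-uniform density defined by a histogram object returned by hist().
sample_from_hist <- function(h, n) {
  nbins <- length(h$counts)
  # Step 1: pick bins with probability proportional to their counts
  bins <- sample.int(nbins, n, replace = TRUE, prob = h$counts)
  # Step 2: draw uniformly between the lower and upper break of each chosen bin
  runif(n, min = h$breaks[bins], max = h$breaks[bins + 1])
}

set.seed(1)
x <- rgamma(200, 4)
# Unequal bin widths, chosen only for illustration
h <- hist(x, breaks = c(0, 1, 2, 3, 4, 6, ceiling(max(x))), plot = FALSE)
draws <- sample_from_hist(h, 1000)
range(draws)  # every draw lies between the first and last break
```

Using `density` as the weight would over-weight narrow bins when widths differ, because density is count divided by (total count times bin width); with equal-width bins the two choices give the same sample.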

Just for completeness (since sampling from the kernel density estimate* is very simple):

repeat nsim times:
  sample (with replacement) a random observation from the data
  sample from the kernel (scaled by the bandwidth), and add the previously sampled random observation

* Note that some kernels, such as fourth-order kernels, take negative values and so are not densities; the procedure above assumes that the kernel is a density.
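
The reason this two-step scheme works is that a kernel density estimate is an equally weighted mixture of n copies of the kernel, one centred at each observation. The notation below (K for the kernel, h for the bandwidth, x_i for the data) is standard but is an addition here, not something spelled out in the original answer:

```latex
\hat f(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} K\!\left(\frac{x - x_i}{h}\right),
\qquad
X^{*} = x_I + h\,\varepsilon, \quad
I \sim \mathrm{Uniform}\{1,\dots,n\}, \quad
\varepsilon \sim K .
```

Drawing I uniformly selects one mixture component, and x_I + h ε is a draw from that component, so X* has density \hat f, provided K is itself a density.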

In R, for a Gaussian kernel and bandwidth h, with data in x:

 rnorm(nsim, mean = sample(x, nsim, replace = TRUE), sd = h)
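
A fuller sketch of the same idea follows. It assumes the rule-of-thumb bandwidth `bw.nrd0`, which is the default used by `density()`; any other bandwidth could be substituted, and the variable names are illustrative only.

```r
set.seed(1)
x <- rgamma(200, 4)   # example data, same kind as in the histogram case
nsim <- 1000
h <- bw.nrd0(x)       # assumption: rule-of-thumb bandwidth (the density() default)

# Resample observations by index; sample.int avoids the sample() surprise
# that occurs when x has length 1
centres <- x[sample.int(length(x), nsim, replace = TRUE)]
# Add Gaussian kernel noise with standard deviation h
kde_draws <- rnorm(nsim, mean = centres, sd = h)

# Visual check: histogram of the draws against the kernel density estimate
hist(kde_draws, freq = FALSE, breaks = 30)
lines(density(x, bw = h), col = 2)
```

Note that the draws can fall below zero even though gamma data are positive, because the Gaussian kernel spreads mass on both sides of each observation.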