Beta Distribution – Finding Estimates for Data Using the Cumulative Beta Distribution in R

beta distributioncurve fittingdistributionsr

I'm trying to find the estimates for $alpha$ and $beta$ of the beta distribution (red curve) that will fit the black curve (empirical) better.

trying to find parameters to make the red curve fit the black curve better

I have a function that finds "non-parametric" inverse cumulative function of my data (boot.mean). I would like to find a distribution that would fit. So far, I think cumulative beta distribution (alpha = beta = 2) could somehow be used. I would love to see a distribution that fits the bill better, though…

# raw data produced by a function (inverse cumulative distribution)
boot.mean <- c(37.021, 35.051, 29.091, 27.094, 22.058, 18.994, 16.944, 12.897, 7.903, 4.926, 3.939, 1.94, 1.94, 0.968)

#"fidge" (not sensu Climategate) boot.mean for comparison to qbeta
scaled <- 1 - (boot.mean/boot.mean[1])
scaled[1] <- 0.01 #dreader zero be gone
range(scaled)
 [1] 0.0100000 0.9738527

# this is the theoretical curve
x <- seq(0, 1, length = 100)
y <- qbeta(x, shape1 = 2, shape2 = 2)

# all along the x axis
x.axis <- seq(from = 0, to = 1, length = length(scaled))

# plot empirical and the theoretical values
plot(x.axis, scaled, type = "l")
lines(y, x, col = "red")

# I'm just an x-con trying to fit a distribution to my data
(beta.fitted <- MASS::fitdistr(x = scaled, densfun = qbeta, start = list(shape1 = 2, shape2 = 2)))
 Error in optim(x = c(0.01, 0.0532130412468599, 0.214202749790659, 0.268145106831258,  : 
   non-finite finite-difference value [2]
 In addition: There were 50 or more warnings (use warnings() to see the first 50)

Best Answer

First remark : your data is nowhere near a distribution, and definitely not the beta function. As I see it, you see your boot.mean as 'density' and your x-axis (the index?) as value. The beta function is limited between 0 and 1, and as the area under the curve of any density function should equal 1, your data doesn't come close. Good point of @whuber: fit a scaled version. Alternatively: Scale to the sum of the data, as @iterator said. Now as the beta function requires scaling twice (both on the X-axis, so the indices and on the Y-axis, being the actual data)

Now you talk about the beta function and you talked about the inverse of the cumulative normal distribution somewhere else. I suppose you mean 'when that distribution is looking in the mirror, it sees what I want to see...' ;-)

So an ad-hoc way of doing this (without any theoretical background, as that background is not the one you need here), is given below. Apart from what everybody else said here, I just want to point out the optim() function, which is doing basically what you're looking for. Regardless whether you fit a scaled and mirrored beta distribution or something that looks close to an inverse normal cumulative distribution for some value of close...

customFit <- function(x, data) {
    d.data <- rev(cumsum(dnorm(1:length(data), x[1], x[2]))) * max(data)
    SS <- sum((d.data - data)^2)
    return(SS)
}

fit.optim <- optim(c(5, 8), customFit, data = boot.mean)

plot(boot.mean)
lines(rev(cumsum(dnorm(1:length(boot.mean), 
         fit.optim$par[1], fit.optim$par[2]))) * max(boot.mean), 
         col = "red")

Note of warning: apart from having a defined function that fits your data, there's little you can do with this result...

Related Question