Solved – R: How to estimate the parameters of a Gamma distribution, knowing the empirical cumulative distribution

fittingr

I have been given an empirical cumulative distribution, and I need to fit a Gamma distribution to it using R.

By trial-and-error (and using Excel's Solver), I know that's possible, but I need to automate this process (I need to update this distribution each week, so automation is needed); I'd like to do it in R, so I can use the results in some other reports (using sweave for example).

Data sample:

           F
0  0.0000000
1  0.4761905
2  0.5665025
3  0.6284307
4  0.6851982
5  0.7404266
6  0.7994206
7  0.7994206
8  0.8595944
9  0.8595944
10 0.8595944

Best estimate (so far): Gamma distribution with the following parameters: alpha = 0.357779 and theta = 12.019887

Best Answer

If the observations only take on integer values, then while a gamma distribution might be a reasonable summary function, you don't really have an underlying gamma distribution.

However, if the integers were just used to bin the data, then you can determine the maximum likelihood estimates of the two gamma distribution parameters in the following manner. I've made the assumption that you have the sample size available and for this example I used $n=235$.

# Available sample cumulative distribution function
  x <- c(0:10,Inf)
  f <- c(0.0000000,0.4761905,0.5665025,0.6284307,0.6851982,0.7404266,0.7994206, 
    0.7994206,0.8595944,0.8595944,0.8595944,1)

# Determine counts in each bin assuming that the total sample size is 235
  counts <- diff(floor(235*f+0.5))
  counts

# Define log likelihood function for the binned data
logL <- function(p) {
  s <- 0
  for (i in 1:(length(counts)-1)) {
      s <- s + counts[i]*log(pgamma(x[i+1],p[1],1/p[2])-pgamma(x[i],p[1],1/p[2]))
  }
  s <- s + counts[length(counts)]*log(1-pgamma(x[length(counts)-1],p[1],1/p[2]))
  s
}

# Find maximum likelihood solution (with starting values from other method)
  solution <- optim(c(0.357779,12.019887), logL, method="L-BFGS-B",
    lower=c(0.001,0.001), upper=c(Inf,Inf), control=list(fnscale=-1))
  solution
#  $par
#[1]  0.3370922 12.3350379
#
#$value
#[1] -412.9522
# 
#$counts
#function gradient 
#     17       17 
#
#$convergence
#[1] 0
#
#$message
#[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

# Plot results
  plot(c(0:10), f[1:11], type="s", xlab="x", ylab="Probability")
  lines(c(0:100)/10, pgamma(c(0:100)/10,solution$par[1],1/solution$par[2]), col="red")

cumulative distribution function fit