[Math] How to efficiently estimate quantile function of Gamma distribution

estimationgamma functionprobability distributionsstatistics

I have an application that analyzes datasets comprised mostly of samples from a Gamma distribution. Mixed in with the data are an unknown number ($>= 0$) of outlier samples (which are actually taken from a scaled noncentral chi-squared distribution with unknown centrality parameter) that are significantly larger in magnitude than the samples from the Gamma distribution. I would like to locate these values with as high of a detection probability as possible, under the constraint that the probability of falsely choosing one of the Gamma-distributed samples as an outlier is bounded by a specified probability $P_{f}$ (where $P_f$ is on the order of $10^{-8}$ to $10^{-12}$ or so).

The shape parameter $k$ of the Gamma distribution is known. I've developed the following method of approaching the problem:

  • Fit a Gamma distribution to the data using a method that attempts to be robust to outliers:

    • Since the shape parameter is known, I just need the scale parameter $\theta$ in order to describe the distribution.

    • The mode of the distribution is equal to $(k-1)\theta$, so estimating the distribution's mode will provide a way to estimate $\theta$. I'm using the half-sample range method to estimate the distribution's mode (i.e. the PDF's peak).

    • Given the mode estimate $m$, estimate the scale paramter as $\tilde \theta = \frac{m}{k-1}$.

  • Given the desired false-detection probability $P_f$, calculate the threshold $x$ where $F_x(x) = 1 – P_f$, where $F_x(x)$ is the CDF of the fitted Gamma distribution.

  • Declare all values in the sample greater than $x$ as outliers.

I'm looking for a computationally efficient way to estimate the appropriate threshold $x$. This is intended for a real-time processing application, and the method of directly inverting the Gamma distribution's CDF requires me to perform root-finding on special functions that are relatively expensive to compute. If there is a more efficient way to estimate (even approximately) the desired threshold in terms of the distribution parameters, that would help a lot.

Best Answer

Perhaps unsurprisingly the quantile function of the gamma distribution does not have a nice closed form. I suspect that a lot of thought has been put into R's implementation of its approximation.