In case anyone is still interested, I have managed to implement Aristizabal's formulae in Java. This is more proof-of-concept than the requested "robust" code, but it is a starting point.
/**
* Computes the point estimate of the shift offset (gamma) from the given sample. The sample array will be sorted by this method.<p>
* Cf. Aristizabal section 2.2 ff.
* @param sample {@code double[]}, will be sorted
* @return gamma point estimate
*/
public static double pointEstimateOfGammaFromSample(double[] sample) {
Arrays.sort(sample);
DoubleUnaryOperator func = x->calculatePivotalOfSortedSample(sample, x)-1.0;
double upperLimit = sample[0];
double lowerLimit = 0;
double gamma = bisect(func, lowerLimit, upperLimit);
return gamma;
}
/**
* Cf. Aristizabal's equation (2.3.1)
* @param sample {@code double[]}, should be sorted in ascending order
* @param gamma shift offset
* @return pivotal value of sample
*/
private static double calculatePivotalOfSortedSample(final double[] sample, double gamma) {
final int n=sample.length;
final int n3=n/3;
final double mid = avg(sample, gamma, n3+1, n-n3);
final double low = avg(sample, gamma, 1, n3);
final double upp = avg(sample, gamma, n-n3+1, n);
final double result = (mid-low)/(upp-mid);
return result;
}
/**
* Computes average of sample values from {@code sample[l-1]} to {@code sample[u-1]}.
* @param sample {@code double[]}, should be sorted in ascending order
* @param gamma shift offset
* @param l lower limit
* @param u upper limit
* @return average
*/
private static double avg(double[] sample, double gamma, int l, int u) {
double sum = 0.0;
for (int i=l-1;i<u;sum+=Math.log(sample[i++]-gamma));
final int n = u-l+1;
return sum/n;
}
/**
* Naive bisection implementation. Should always complete if the given values actually straddles the root.
* Will call {@link #secant(DoubleUnaryOperator, double, double)} if they do not, in which case the
* call may not complete.
* @param func Function solve for root value
* @param lowerLimit Some value for which the given function evaluates < 0
* @param upperLimit Some value for which the given function evaluates > 0
* @return x value, somewhere between the lower and upper limits, which evaluates close enough to zero
*/
private static double bisect(DoubleUnaryOperator func, double lowerLimit, double upperLimit) {
final double eps = 0.000001;
double low=lowerLimit;
double valAtLow = func.applyAsDouble(low);
double upp=upperLimit;
double valAtUpp = func.applyAsDouble(upp);
if (valAtLow*valAtLow>0) {
// Switch to secant method
return secant(func, lowerLimit, upperLimit);
}
System.out.printf("bisect %f@%f -- %f@%f%n", valAtLow, low, valAtUpp, upp);
double mid;
while(true) {
mid = (upp+low)/2;
if (Math.abs(upp-low)/low<eps)
break;
double val = func.applyAsDouble(mid);
if (Math.abs(val)<eps)
break;
if (val<0)
low=mid;
else
upp=mid;
}
return mid;
}
/**
* Naive secant root solver implementation. May not complete if root not found.
* @param f Function solve for root value
* @param a Some value for which the given function evaluates
* @param b Some value for which the given function evaluates
* @return x value which evaluates close enough to zero
*/
static double secant(final DoubleUnaryOperator f, double a, double b) {
double fa = f.applyAsDouble(a);
if (fa==0)
return a;
double fb = f.applyAsDouble(b);
if (fb==0)
return b;
System.out.printf("secant %f@%f -- %f@%f%n", fa, a, fb, b);
if (fa*fb<0) {
return bisect(f, a, b);
}
while ( abs(b-a) > abs(0.00001*a) ) {
final double m = (a+b)/2;
final double k = (fb-fa)/(b-a);
final double fm = f.applyAsDouble(m);
final double x = m-fm/k;
if (Math.abs(fa)<Math.abs(fb)) {
// f(a)<f(b); Choose x and a
b=x;
fb=f.applyAsDouble(b);
} else {
// f(a)>=f(b); Choose x and b
a=x;
fa=f.applyAsDouble(a);
}
if (fa==0)
return a;
if (fb==0)
return b;
if (fa*fb<0) {
// Straddling root; switch to bisect method
return bisect(f, a, b);
}
}
return (a+b)/2;
}
To calculate the density of any conjugate prior see here.
However, you don't need to evaluate the conjugate prior of the Dirichlet in order to perform Bayesian estimation of its parameters. Just average the sufficient statistics of all the samples, which are the vectors of log-probabilities of the components of your observed categorical distribution parameters. This average sufficient statistic are the expectation parameters of the maximum likelihood Dirichlet fitting the data $(\chi_i)_{i=1}^n$. To go from expectation parameters to source parameters, say $(\alpha_i)_{i=1}^n$, you need to solve using numerical methods:
\begin{align}
\chi_i = \psi(\alpha_i) - \psi\left(\sum_j\alpha_j\right) \qquad \forall i
\end{align}
where $\psi$ is the digamma function.
To answer your first question, a mixture of Dirichlets is not Dirichlet because, for one thing, it can be multimodal.
Best Answer
Unless you have some good reason to prefer the likelihood estimates to moment estimates, you may want to use the latter, as they are much easier to compute for this case. The wikipedia article on the beta distribution helpfully has the moment estimators written out in closed form as well as a note that the MLE does not have a closed form solution. If you want the MLE, you will have to program an algorithm to find it (not hard, but probably unnecessary effort).
Method of Moments Estimators from http://en.wikipedia.org/wiki/Beta_distribution#Parameter_estimation
Where,
and,
The method of moments estimators for the two parameters are: