Solved – Bayesian estimation of Dirichlet distribution parameters

bayesiandirichlet distributionestimationgibbs

I want to estimate parameters of Dirichlet mixture models using Gibbs sampling and I have some questions about that:

Is a mixture of Dirichlet distributions equivalent to a Dirichlet process? What is their main differences if is not?
Also, if I want to estimate a single Dirichlet distribution's parameters, which distribution for parameters should be selected as priors in Bayesian framework?

In all of the papers I found an estimation of a multinomial distribution using Dirichlet priors. I need estimation of a Dirichlet distribution using multinomial priors, perhaps.

Is the posterior function also in the form of DIRICHLET(α+N) similar to the case “estimation of multinomial distribution using Dirichlet priors”? as the multiplication of probability density function for iid samples are not considered in the definition of the likelihood function. I again cannot understand why.

e.g. as expressed in this paper:
http://www.stat.ufl.edu/~aa/cda/bayes.pdf
or
http://research.microsoft.com/en-us/um/people/minka/papers/minka-multinomial.pdf

so thanks for your attention

my data is Hyperion (a kind of hyperspectral remote sensing imagery) and i want to perform hyperspectral unmixing using mixture of Dirichlet sources and i will apply Gibbs sampling method for parameter estimation. my data is in dimension (614*512*224) which is commonly available AVIRIS sensor data for Cuprite Nevada district and is almost 200MB. also this data is available via (http://aviris.jpl.nasa.gov/data/free_data.html). unfortunately i don't know how can i sent my data.

i just ask you to help me in statistical modelling tasks for my PHD thesis.
i will be so grateful if you help me to solve my confusions in modelling.

all the best
solmaz

Best Answer

To calculate the density of any conjugate prior see here.

However, you don't need to evaluate the conjugate prior of the Dirichlet in order to perform Bayesian estimation of its parameters. Just average the sufficient statistics of all the samples, which are the vectors of log-probabilities of the components of your observed categorical distribution parameters. This average sufficient statistic are the expectation parameters of the maximum likelihood Dirichlet fitting the data $(\chi_i)_{i=1}^n$. To go from expectation parameters to source parameters, say $(\alpha_i)_{i=1}^n$, you need to solve using numerical methods: \begin{align} \chi_i = \psi(\alpha_i) - \psi\left(\sum_j\alpha_j\right) \qquad \forall i \end{align} where $\psi$ is the digamma function.

To answer your first question, a mixture of Dirichlets is not Dirichlet because, for one thing, it can be multimodal.

Related Solutions

Solved – Calculating Log Prob. of Dirichlet distribution in High Dimensions

The p.d.f of the Dirichlet distribution is defined as $$ f(\theta; \alpha) = B^{-1} \prod_{i=1}^K \theta_i^{\alpha_i - 1} $$ where $B(\alpha)$ is the generalized Beta function. Notice that if any $\theta_i$ is 0, then the whole product is zero. In other words, the support of a Dirichlet distribution is over vectors $\theta$ where each $\theta_i \in (0, 1)$ and $\sum_{i=1}^K \theta_i = 1$. I'm not familiar with Minka's toolkit, but it is bound to have problems with data that includes 0's.

As for the uniform column, I believe those values are correct. Here my python code I used to test:

import math

def lbeta(alpha):
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))

def ldirichlet_pdf(alpha, theta):
    kernel = sum((a - 1) * math.log(t) for a, t in zip(alpha, theta))
    return kernel - lbeta(alpha)

for k in [4, 10, 50, 100, 500, 1000]:
    print ldirichlet_pdf([.01] * k, [1.0 / k] * k)

Running this script yields the output:

4 -9.71111566837
10 -20.946493708
50 -35.7564901905
100 -4.03613939138
500 779.669123528
1000 2251.99967563

Now lets generate some more likely vectors from our Dirichlet distribution with K=1000. The code for this is quite simple:

def sample_dirichlet(alpha):
    gammas = [random.gammavariate(a, 1) for a in alpha]
    norm = sum(gammas)
    return [g / norm for g in gammas]

Now if we use this function in combination with our previous ldirichlet_pdf function, we'll see that for K=1000, 2e+3 is a relatively small density. For example, the results of the following code:

alpha = [.01] * 1000
ldirichlet_pdf(alpha, sample_dirichlet(alpha))

yield values between 9.4e+4 and 1e+5.

The key insight here is to realize that you do not need to have values less than 1 in order for the integral to evaluate to 1. A simple example is $\int_0^1 1 dx = 1$. It just so happens that for the symmetric Dirichlet with K=1000 and concentration .01, the p.d.f. is greater than 1 everywhere, and yet the integral over the entire support is still 1. In higher dimensions, you'll need to have a much smaller concentration to get the uniform to have a negative log p.d.f. For example, with the concentration at .0001, and K=1000, the uniform vector has a log p.d.f. around -2.3e+3.

Solved – PyMC for nonparametric clustering: Dirichlet process to estimate Gaussian mixture’s parameters fails to cluster

I'm not sure if anyone is looking at this question any more but I put your question in to rjags to test Tom's Gibbs sampling suggestion while incorporating insight from Guy about the flat prior for standard deviation.

This toy problem might be difficult because 10 and even 40 data points are not enough to estimate variance without an informative prior. The current prior σzi∼Uniform(0,100) is not informative. This might explain why nearly all draws of μzi are the expected mean of the two distributions. If it does not alter your question too much I will use 100 and 400 data points respectively.

I also did not use the stick breaking process directly in my code. The wikipedia page for the dirichlet process made me think p ~ Dir(a/k) would be ok.

Finally it is only a semi-parametric implementation since it still takes a number of clusters k. I don't know how to make an infinite mixture model in rjags.

markov chain mu cluster 1

markov chain mu cluster 2

library("rjags")

set1 <- rnorm(100, 0, 1)
set2 <- rnorm(400, 4, 1)
data <- c(set1, set2)

plot(data, type='l', col='blue', lwd=3,
     main='gaussian mixture model data',
     xlab='data sample #', ylab='data value')
points(data, col='blue')

cpd.model.str <- 'model {
  a ~ dunif(0.3, 100)
  for (i in 1:k){
    alpha[i] <- a/k
    mu[i] ~ dnorm(0.0, 0.001)
    sigma[i] ~ dunif(0, 100)
  }
  p[1:k] ~ ddirich(alpha[1:k])
  for (i in 1:n){
    z[i] ~ dcat(p)
    y[i] ~ dnorm(mu[z[i]], pow(sigma[z[i]], -2))
  }
}' 


cpd.model <- jags.model(textConnection(cpd.model.str),
                        data=list(y=data,
                                  n=length(data),
                                  k=5))
update(cpd.model, 1000)
chain <- coda.samples(model = cpd.model, n.iter = 1000,
                      variable.names = c('p', 'mu', 'sigma'))
rchain <- as.matrix(chain)
apply(rchain, 2, mean)

Best Answer

Related Solutions

Solved – Calculating Log Prob. of Dirichlet distribution in High Dimensions

Solved – PyMC for nonparametric clustering: Dirichlet process to estimate Gaussian mixture’s parameters fails to cluster

Related Question