I am studying the exponential family and read that, for $p(x\mid\eta)=h(x)\exp\left(\eta^T t(x)-a(\eta)\right)$, the term $a(\eta)$ is the log normalizer, which ensures that the probability distribution integrates to one; as a consequence, $a(\eta)=\log \int h(x) \exp\left(\eta^T t(x)\right) dx$. I have tried to set the integral of $p(x\mid\eta)$ to 1 and derive this implication, but have failed so far. Can anyone help me with how to approach this?
[Math] log normalizer – exponential family
calculus, probability, self-learning
Related Solutions
Your question can be rephrased somewhat as: 'does the expected value of the derivative of the log-likelihood always point towards the correct value?' (If it doesn't, you can turn it into a counterexample to your hypothesis by optionally flipping the sign of $\theta$.)
This won't be true in general; you could, for instance, come up with a distribution like:
$$ \sin(x + \theta)^2 / (1+x^2) $$
which is periodic in $\theta$. Clearly the derivative at $\theta + 2\pi$ must equal the one at $\theta$, and clearly $E_\theta[S(\theta_1, X)] = E_{\theta+2\pi}[S(\theta_1, X)]$, so both cannot point to the 'correct' value at the same time.
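As a sanity check (this computation is my own, not part of the original answer): the density as written is unnormalized, but its normalizing constant is itself $2\pi$-periodic in $\theta$. Using the standard integral $\int_{-\infty}^{\infty}\frac{\cos(2x)}{1+x^2}\,dx=\pi e^{-2}$,
$$ \int_{-\infty}^{\infty} \frac{\sin^2(x+\theta)}{1+x^2}\,dx = \frac{\pi}{2}\left(1 - e^{-2}\cos 2\theta\right), $$
so the normalized density, and with it every expectation under it, is unchanged by $\theta \mapsto \theta + 2\pi$.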
However, a probability distribution where several values of $\theta$ are equivalent is clearly not the usual case, so we need to require some kind of 'unimodality'. To see what kind we need, it is instructive to pull the derivative outside of the expectation:
$$ \begin{align} \int f(z;\theta) \frac{\partial \log f(z;\theta_1)}{\partial \theta_1} \,\mathrm{d}z &=\frac{\partial}{\partial \theta_1} \int f(z;\theta) \log f(z;\theta_1) \,\mathrm{d}z \end{align} $$
so now we are looking at the derivative of the negative cross entropy (equivalently, the negative of the derivative of the Kullback-Leibler divergence), which is a measure of how close the distribution $f(z;\theta_1)$ is to the 'true' distribution $f(z;\theta)$. It is now clear why this derivative usually points the right way: we would generally expect the model to get better as the parameters get closer to their actual values.
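As a quick numerical check of this interchange, here is a minimal sketch; the choice $f(z;t) = N(t,1)$ and all variable names are my own, not from the answer:

```python
import numpy as np
from scipy.integrate import quad

# Sanity check for f(z; t) = Normal(t, 1): the expected score
# E_theta[ d/dt1 log f(Z; t1) ] should equal the derivative w.r.t. t1
# of the negative cross entropy  ∫ f(z; theta) log f(z; t1) dz.
theta, t1, eps = 1.0, 0.3, 1e-4

def f(z, t):
    """N(t, 1) density."""
    return np.exp(-0.5 * (z - t) ** 2) / np.sqrt(2 * np.pi)

def score(z, t):
    """d/dt log f(z; t), which is z - t for the Gaussian mean."""
    return z - t

# Left side: expectation of the score under the 'true' parameter theta.
lhs, _ = quad(lambda z: f(z, theta) * score(z, t1), -np.inf, np.inf)

# Right side: central finite difference of the negative cross entropy.
def neg_cross_entropy(t):
    val, _ = quad(lambda z: f(z, theta) * np.log(f(z, t)), -np.inf, np.inf)
    return val

rhs = (neg_cross_entropy(t1 + eps) - neg_cross_entropy(t1 - eps)) / (2 * eps)
print(lhs, rhs)  # both ≈ theta - t1 = 0.7
```

For this Gaussian family the common value is exactly $\theta - \theta_1$, which already hints at the 'points towards $\theta$' behaviour discussed below.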
Anyway, from this we can extract a sufficient, but maybe not necessary, condition: that the probability distribution is log-concave (i.e., $\log f(z;\theta_1)$ is concave w.r.t. $\theta_1$). In that case its expected value
$$ \int f(z;\theta) \log f(z;\theta_1) \,\mathrm{d}z $$
is also concave, which in particular means that its derivative is monotonically non-increasing and is $0$ at $\theta_1 = \theta$; this is enough to conclude that $E_{\theta}[S(\theta_1, X)]$ points towards $\theta$.
The exponential distribution and the normal distribution are both log-concave w.r.t. all their parameters, but keep in mind that most distributions are called log-concave when they are log-concave w.r.t. the value (here $z$), not the parameters (here $\theta_1$).
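For a concrete instance of the log-concave case (my own worked example, taking $f(z;\theta_1)$ to be the $N(\theta_1, 1)$ density, which is concave in $\theta_1$ on the log scale):
$$ \int f(z;\theta)\log f(z;\theta_1)\,dz = -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\left(1 + (\theta-\theta_1)^2\right), $$
which is concave in $\theta_1$ with derivative $\theta - \theta_1$: positive for $\theta_1 < \theta$, negative for $\theta_1 > \theta$, and zero exactly at $\theta_1 = \theta$, so the expected score does point towards $\theta$.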
I hate to break it to you, but it isn't part of the exponential family.
Typo in the question: the PDF should have $(1-y^\alpha)^{\beta-1}$.
$\ln f(y; \alpha, \beta) = \ln \alpha + \ln \beta + (\alpha-1)\ln y - (\beta-1)\ln(1-y^{\alpha}) $
The final term involves $y$, $\alpha$, and $\beta$ and is not a linear function of $y$, so it cannot be put in the form $\frac{\theta y}{a(\phi)}$.
Best Answer
If
$$1 = \int h(x)\exp\left(\eta^T t(x)-a(\eta)\right) dx,$$
then, using $\exp(y-z)=\exp(y)\exp(-z)$,
$$1 = \int h(x)\exp\left(\eta^T t(x)\right)\exp(-a(\eta)) \, dx.$$
Multiplying both sides by $\exp(a(\eta))$, which is a constant with respect to $x$,
$$\exp(a(\eta)) = \int h(x)\exp\left(\eta^T t(x)\right) dx,$$
and taking logarithms,
$$a(\eta)= \log\left(\int h(x)\exp\left(\eta^T t(x)\right) dx \right).$$
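To see the identity in action, here is a minimal numerical sketch; the choice of the exponential distribution $p(x\mid\lambda)=\lambda e^{-\lambda x}$ (with $h(x)=1$, $t(x)=x$, $\eta=-\lambda$, and hence $a(\eta)=-\log(-\eta)$) is my own, not part of the answer:

```python
import numpy as np
from scipy.integrate import quad

# Exponential distribution p(x | lam) = lam * exp(-lam * x) on x >= 0,
# written in exponential-family form with h(x) = 1, t(x) = x, eta = -lam,
# so the log normalizer is a(eta) = -log(-eta) = -log(lam).
lam = 2.5
eta = -lam

# Numerically evaluate log ∫ h(x) exp(eta * t(x)) dx over the support.
integral, _ = quad(lambda x: np.exp(eta * x), 0, np.inf)
a_numeric = np.log(integral)

# Closed form from the algebra above: a(eta) = -log(-eta).
a_closed_form = -np.log(-eta)
print(a_numeric, a_closed_form)  # both ≈ -0.9163
```

Any other member of the family (Poisson, Gaussian, ...) works the same way; only $h$, $t$, and the support change.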