Solved – What’s the problem with model identifiability?

Tags: bayesian, identifiability, inference

I understand that, from a decision perspective, identifiability of a model is needed to ensure that the parameter estimates converge (as the number of observations grows) to a single value. But if the non-identifiability of a given model is not a modeling artifact but clearly characterises some "inaccessible knowledge" about the system under study, is it valid to perform Bayesian inference on a non-identifiable model?

Here is a simple example.
$$
x_i = t \, a \, y_i + \epsilon_i
$$
with $(\epsilon_i)$ iid
$$
\epsilon_i \sim N(0,1)
$$
and an informative prior for $t$:
$$
t\sim N(1,0.1)
$$
and a non-informative prior for $a$ (say one chooses a uniform…)
$$
a \sim U(0,1000)
$$
One observes the $(x_i)$, the $(y_i)$ are exogenous covariates, and one wants to compute:
$$
p(a | (x_i); (y_i))
$$
As I understand it, the model is not identifiable, since all the densities $p((x_i) \mid a, t; (y_i))$ described by pairs $(a,t)$ such that $a \cdot t = k$ ($k \in \mathbb{R}$) are identical. Obviously, in such a case the choice of $p(t)$ has strong implications, but if that prior is physically supported, I see no reason to invalidate the meaning of an HPD interval obtained from such a non-identifiable model. On the other hand, I cannot find any reference on this point… so thanks for your expertise.
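To make the ridge structure concrete, here is a minimal grid-approximation sketch in Python (NumPy only). The simulated data, the grid ranges, and the reading of $N(1, 0.1)$ as having standard deviation $0.1$ are illustrative assumptions, not part of the question. It exploits the fact that the likelihood depends on $(a,t)$ only through the product $k = a\,t$, so only the informative prior on $t$ pins down the marginal posterior of $a$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data for x_i = t * a * y_i + eps_i, eps_i ~ N(0, 1).
# The "true" product t * a = 2 is an arbitrary choice for illustration.
n = 50
y = rng.uniform(0.5, 2.0, size=n)
x = 2.0 * y + rng.normal(0.0, 1.0, size=n)

# Sufficient statistics: the likelihood depends on (a, t) only through
# k = a * t, since sum_i (x_i - k*y_i)^2 = Sxx - 2*k*Sxy + k^2*Syy.
Sxx, Sxy, Syy = np.sum(x * x), np.sum(x * y), np.sum(y * y)

# Grid over (a, t). The U(0, 1000) prior on a is flat, so restricting
# the grid to a plausible range only truncates negligible mass.
a_grid = np.linspace(0.01, 10.0, 400)
t_grid = np.linspace(0.5, 1.5, 400)
A, T = np.meshgrid(a_grid, t_grid)   # shape (len(t_grid), len(a_grid))
K = A * T

log_lik = -0.5 * (Sxx - 2.0 * K * Sxy + K**2 * Syy)
# Prior: t ~ N(1, 0.1), read here as sd = 0.1 (assumption); flat in a.
log_prior = -0.5 * ((T - 1.0) / 0.1) ** 2

log_post = log_lik + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum()

# Marginal posterior of a: without the informative p(t) the posterior
# would be flat along the ridge a*t = const; with it, a is pinned down.
marg_a = post.sum(axis=0)
print("posterior mean of a:", np.sum(a_grid * marg_a))
```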

Best Answer

I recommend you read Andrew Gelman's blog post on identifiability and Bayesian inference.

Right off the bat, I can tell you that identifiability is not a property of a model by itself (as in "an unidentifiable model"), but of the combination of that model with some data. That is to say, it also depends on the data: the same model may be identifiable with some data and unidentifiable with other data.

In a Bayesian context, it is not entirely clear what identifiability means. As the post I linked to argues, it is not a black-or-white matter. Rather, it has to do with the amount of information learned from the data, or the "distance" of the posterior from the prior.

A suitable measure of information might be the information entropy, and the "distance" between two probability distributions (here, the prior and the posterior) may be quantified by the Kullback-Leibler divergence; both can be found in the Wikipedia page on information theory.

So you could say that, for a given model and data, if the posterior carries the same amount of information as the prior, then nothing was learned about the model from this data, and the case is unidentifiable.

If, on the other hand, the data are informative about the model parameters, then the posterior will be more informative than the prior (lower information entropy than the prior, and positive KL divergence from it), and the case is identifiable.

All the intermediate cases can then be ranked by how much information was gained: the larger the information gain from prior to posterior, the more identifiable the case, and the smaller the gain, the less identifiable it is.
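As a minimal sketch of this gauge (the grid, the flat prior, and the stand-in Gaussian posterior are assumptions for illustration; `scipy.stats.entropy` returns the Shannon entropy with one argument and, with two, the KL divergence of discrete distributions):

```python
import numpy as np
from scipy.stats import entropy  # entropy(p) = Shannon entropy; entropy(p, q) = KL(p || q)

# Discretized prior and posterior for one parameter on a common grid.
grid = np.linspace(0.0, 10.0, 1000)

prior = np.ones_like(grid)  # flat prior over the grid
prior /= prior.sum()

# Stand-in for a posterior obtained from some inference (assumption:
# a Gaussian bump at 2.0 plays the role of an informative posterior).
post = np.exp(-0.5 * ((grid - 2.0) / 0.3) ** 2)
post /= post.sum()

kl_gain = entropy(post, prior)            # KL(posterior || prior), in nats
h_drop = entropy(prior) - entropy(post)   # drop in Shannon entropy

print(f"KL(posterior || prior) = {kl_gain:.3f} nats")
print(f"entropy drop           = {h_drop:.3f} nats")
# Values near zero signal an unidentifiable case (posterior ~ prior);
# clearly positive values signal that the data were informative.
```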
