Solved – How to combine multiple prior components and a likelihood

decision-theory, likelihood, prior

Let's imagine I am comparing two groups of animals (treatment/control). Previous data from cell cultures indicate the treatment should have a positive effect; this gives me "prior component 1". There are also two previous studies very similar to my own: one had an effect of 5 +/- 1 (prior component 2), the other of 1 +/- 2 (prior component 3). I find the cell culture data highly convincing, and I consider prior component 3 to come from a less reliable study. So I choose weights of 3, 1, and 0.5 and multiply each component by its weight.

1) To calculate the "overall prior" do I simply add these together as shown in the lower right panel?

2) Am I supposed to normalize these components before adding them?

[Figure: four panels showing Prior Component 1, Prior Component 2, Prior Component 3, and the Overall Prior]

I then calculate a likelihood function for my current data as shown in the upper panel.

[Figure: two panels showing the Likelihood (top) and the Posterior Probability (bottom), with dashed lines at effects of -1 and 1]

3) How do I combine this information with the prior information shown in the first figure? For the lower panel I simply multiplied overall prior*likelihood.

4) I then want to make a decision based on this outcome: if I believe the effect is between -1 and 1, I will stop studying the drug; if the effect is < -1, I will perform new study A; and if the effect is > 1, I will perform new study B.

5) Obviously there are a number of ways of choosing a decision (% density between -1 and 1, etc.). Is there a best choice? One option is sketched after the R code below.

6) I feel I am doing something incorrectly, but maybe not. Is there a name for what I am trying to accomplish?

Edit:

If it helps I am trying to use the framework proposed by Richard Royall:

1) The likelihood function tells me "how to interpret this body of observations as evidence"

2) The likelihood function + priors tells me "what I should believe"

3) The likelihood function + priors + cost/benefit determines "what I should do."

Royall R (1997) Statistical evidence: a likelihood paradigm (Chapman & Hall/CRC)

While the priors used here are subjective/nebulous, they are built out of simple building blocks (uniform and normal distributions) that mathematically unsophisticated researchers can understand quickly. I think they convey my thought processes as a researcher well. Others may of course have different background information; they should be able to build their own "compound prior", which may lead to a different decision from mine, but we should always agree on the likelihood function.

This approach (if implemented correctly, which I am not sure of here) appears to me to model the actual thought processes of researchers and thus to be suitable for scientific inference. The steps map onto the common sections of scientific papers: the priors are the introduction, the likelihood is the results, and the posterior probability is the discussion.

R code:

# Generate prior components over a grid of effect sizes
x <- seq(-10, 10, by = 0.1)

# Component 1: cell cultures suggest a positive effect, so put uniform
# density on [0, 10] and zero density on negative effects
y1 <- dunif(seq(0, 10, by = 0.1), min = -10, max = 10)
y1 <- c(rep(0, length(x) - length(y1)), y1)

# Components 2 and 3: the two previous studies (5 +/- 1 and 1 +/- 2)
y2 <- dnorm(x, mean = 5, sd = 1)
y3 <- dnorm(x, mean = 1, sd = 2)

# Weights reflecting my confidence in each prior component
wt1 <- 3
wt2 <- 1
wt3 <- 0.5

# Weighted prior components
y1 <- y1 * wt1
y2 <- y2 * wt2
y3 <- y3 * wt3

# Sum to get overall prior
y <- y1 + y2 + y3

# Likelihood function for the "current data": observed effect 1 +/- 1
# (the factor of 10 is an arbitrary scaling for plotting; a likelihood
# function is only defined up to a multiplicative constant)
lik <- 10 * dnorm(x, mean = 1, sd = 1)

# Updated posterior probability?
prob <- lik * y


par(mfrow = c(2, 2))
plot(x, y1, ylim = c(0, 1), type = "l", lwd = 4,
     ylab = "Density", xlab = "Effect", main = "Prior Component 1")
plot(x, y2, ylim = c(0, 1), type = "l", lwd = 4,
     ylab = "Density", xlab = "Effect", main = "Prior Component 2")
plot(x, y3, ylim = c(0, 1), type = "l", lwd = 4,
     ylab = "Density", xlab = "Effect", main = "Prior Component 3")
plot(x, y, ylim = c(0, 1), type = "l", lwd = 4,
     ylab = "Density", xlab = "Effect", main = "Overall Prior")

dev.new()
par(mfrow = c(2, 1))
plot(x, lik, type = "l", lwd = 4, col = "Red",
     ylab = "Likelihood", xlab = "Effect", main = "Likelihood")
plot(x, prob, type = "l", lwd = 4, col = "Blue",
     ylab = "Probability", xlab = "Effect", main = "Posterior Probability?")
abline(v = c(-1, 1), lty = 2, lwd = 3)
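
To act on the decision rule in (4), one option from (5) (a sketch, not necessarily the best choice) is to treat prob as an unnormalised posterior on the grid, rescale it so it integrates to one, and compare the probability mass falling in each decision region:

# Normalise the grid posterior (0.1 is the grid spacing) and compare regions
post   <- prob / sum(prob * 0.1)
p.A    <- sum(post[x <= -1] * 0.1)         # P(effect < -1): run new study A
p.stop <- sum(post[x > -1 & x < 1] * 0.1)  # P(-1 < effect < 1): stop studying the drug
p.B    <- sum(post[x >= 1] * 0.1)          # P(effect > 1): run new study B
c(A = p.A, stop = p.stop, B = p.B)

One could then pick the action with the largest posterior mass, though other loss functions are of course possible.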

Best Answer

I don't think there is anything wrong with your approach, but there are some technical details that make your implementation incorrect. When you have a mixture prior, you will also get a mixture posterior. The easiest way to proceed, I think, is to calculate the component posteriors and the component weights separately for each component. So the prior is given by:

$$p(\theta|I)\propto\sum_{c}w_cf_c(\theta)$$

where you need to ensure that each $f_c(\cdot)$ is a properly normalised density. The proportionality sign accounts for the weights not necessarily summing to one. Multiply by the likelihood and you have a posterior proportional to:

$$p(\theta|DI)\propto\sum_{c}w_c\left[f_c(\theta)p(D|\theta)\right]$$

Now we can turn this into a new mixture distribution by normalising the term in brackets, so we have:

$$p(\theta|DI)\propto\sum_{c}w_cf_c(D)\left[\frac{f_c(\theta)p(D|\theta)}{f_c(D)}\right]$$

where $f_c(D)=\int f_c(\theta)p(D|\theta)\,d\theta$. As with the prior, the proportionality constant is just the sum of the "new" weights, giving a properly normalised posterior of:

$$p(\theta|DI)=\left(\sum_{c}w_cf_c(D)\right)^{-1}\sum_{c}w_cf_c(D)\left[\frac{f_c(\theta)p(D|\theta)}{f_c(D)}\right]$$

This expression means that you can proceed in two steps.

  1. Calculate the posterior based on each component individually
  2. Update the weights by multiplying them by the marginal likelihood for the data

This is very simple for your data, as you have two conjugate "normal-normal" components and one "uniform-normal" component. I would go further, but I don't know whether your data-based variance is assumed known or estimated from the data.
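
For illustration, here is a minimal sketch of those two steps in R, assuming (since you don't say) that the current data reduce to an observed effect of $d=1$ with known standard error $\sigma=1$, matching the dnorm(x, mean=1, sd=1) likelihood in the question, and treating component 1 as a properly normalised Uniform(0, 10) density. For a normal component $\mathrm{N}(m,s^2)$, the standard conjugate results are:

$$\theta|D\sim\mathrm{N}\!\left(\frac{m/s^2+d/\sigma^2}{1/s^2+1/\sigma^2},\;\left(\frac{1}{s^2}+\frac{1}{\sigma^2}\right)^{-1}\right),\qquad f_c(D)=\mathrm{N}(d;\,m,\,s^2+\sigma^2)$$

d     <- 1                       # observed effect in the current study (assumed)
sigma <- 1                       # assumed-known standard error of that effect
x     <- seq(-10, 10, by = 0.1)  # grid over effect sizes
w     <- c(3, 1, 0.5)            # prior component weights

# Step 1: component posteriors
# Component 1: Uniform(0, 10) prior gives a truncated normal posterior on [0, 10]
f1.post <- ifelse(x >= 0 & x <= 10, dnorm(x, d, sigma), 0)
f1.post <- f1.post / (pnorm(10, d, sigma) - pnorm(0, d, sigma))

# Components 2 and 3: conjugate normal-normal updates
post.norm <- function(m, s) {
  v <- 1 / (1/s^2 + 1/sigma^2)                   # posterior variance
  c(mean = v * (m/s^2 + d/sigma^2), sd = sqrt(v))
}
p2 <- post.norm(5, 1); f2.post <- dnorm(x, p2["mean"], p2["sd"])
p3 <- post.norm(1, 2); f3.post <- dnorm(x, p3["mean"], p3["sd"])

# Step 2: update the weights with the marginal likelihoods f_c(D)
ml <- c((pnorm(10, d, sigma) - pnorm(0, d, sigma)) / 10,  # uniform-normal
        dnorm(d, mean = 5, sd = sqrt(1^2 + sigma^2)),     # normal-normal
        dnorm(d, mean = 1, sd = sqrt(2^2 + sigma^2)))
w.new <- (w * ml) / sum(w * ml)                           # normalised new weights

# Properly normalised mixture posterior on the grid
posterior <- w.new[1]*f1.post + w.new[2]*f2.post + w.new[3]*f3.post
plot(x, posterior, type = "l", lwd = 4,
     ylab = "Density", xlab = "Effect", main = "Normalised Mixture Posterior")

With these assumed numbers the marginal likelihood of component 2 is small (its prior mean of 5 sits several standard errors from $d=1$), so its updated weight drops to roughly 1-2% and the posterior is dominated by components 1 and 3.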
