Solved – How to compute the maximum a posteriori probability (MAP) estimate with / without a prior

bayesian, estimation, machine learning

I am a newbie in this area so I hope someone could explain the following problem to me in plain English.

Assume I want to use MAP to estimate some parameters on the basis of some observations. I know the method of computing MAP is:
$$
\hat{\theta}(x) = \underset{\theta}{\operatorname{arg\,max}} \ f(x \mid \theta) \, g(\theta)
$$

where $g$ is the prior. However, I cannot find any answers online on how to compute this using a real world example. So here is my proposed question:

Assume you asked 100 people who they are going to vote for in an election (out of 2 candidates, A and B), and assume 60% of them said they will vote for A. How do you estimate the result of the election using MAP if:

  1. candidate A is known to have a popularity of 40% and candidate B 60% (assume this to be the prior distribution)
  2. the popularity is unknown.

I also looked at this answer but I'm still confused:
Example of maximum a posteriori estimation

Best Answer

As mentioned in a comment, the MAP estimate coincides with the maximum likelihood estimate when you omit $g(\theta)$ or when it is a constant (a flat prior). If $g(\theta)$ is not a constant, then there are various methods for finding the MAP estimate. Leaving aside the survey sampling aspect (that is, assuming we have a completely representative sample from a population of infinite size, or assuming you have included the sampling mechanism in your likelihood), the main options are:

  1. Analytically (often by taking logs and finding the maximum).
  2. In some cases conjugate priors are available that have known modes, so that you do not need to do the analytic calculation yourself. E.g. in the example you give we could use a Beta prior. You did not specify how certain you were about your prior, but let's say that in a previous survey you had 20 out of 50 for "A" and 30 out of 50 for "B" (and that there are no other options to vote for). If you are happy to use a Beta(20,30) prior, then your posterior is a Beta(20+60, 30+40) = Beta(80,70) distribution. The mode of a Beta(a,b) distribution with a, b > 1 is (a-1)/(a+b-2), so here it is (80-1)/(150-2) ≈ 0.53. This would not be correct for a non-representative sample or one from a non-infinite population, and this option only exists for a few distributions. Additionally, just because a conjugate prior is available and convenient does not mean it is what you want to use. E.g. you may have wanted to express some doubt about the applicability of the previous survey to your new survey by using a mixture of a Beta(0.5,0.5) prior and a Beta(20,30) prior with weights of 0.2 and 0.8. Then you can still do conjugate updating, but getting the updated posterior weights is a tiny bit harder.
  3. Numerically, using an optimization routine (e.g. minimizing the negative log-posterior).
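
To make the three routes concrete for the survey in the question, here is a minimal Python sketch. It assumes a binomial likelihood for the 60 "A" and 40 "B" answers and the Beta(20,30) prior discussed above. The function names and the golden-section search are illustrative choices of mine, not a prescribed method; any one-dimensional optimizer would do. Using a flat Beta(1,1) prior stands in for the "popularity is unknown" case and should reproduce the maximum likelihood estimate of 0.6.

```python
import math

def beta_mode(a, b):
    """Closed-form mode of Beta(a, b); only valid when a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("closed-form mode requires a > 1 and b > 1")
    return (a - 1) / (a + b - 2)

def neg_log_posterior(theta, successes, failures, a, b):
    """Negative log of binomial likelihood times Beta(a, b) prior.

    Terms that do not depend on theta are dropped, since they do not
    change where the optimum lies.
    """
    return -((successes + a - 1) * math.log(theta)
             + (failures + b - 1) * math.log(1 - theta))

def golden_section_minimize(f, lo, hi, tol=1e-9):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2

successes, failures = 60, 40   # survey: 60 for A, 40 for B
prior_a, prior_b = 20, 30      # assumed Beta(20, 30) prior from the answer

# Route 2: conjugate update, then the known mode of the Beta posterior
map_closed_form = beta_mode(prior_a + successes, prior_b + failures)

# Route 3: numeric minimization of the negative log-posterior
map_numeric = golden_section_minimize(
    lambda t: neg_log_posterior(t, successes, failures, prior_a, prior_b),
    1e-9, 1 - 1e-9)

# Flat prior Beta(1, 1): the MAP estimate reduces to the MLE
mle_numeric = golden_section_minimize(
    lambda t: neg_log_posterior(t, successes, failures, 1, 1),
    1e-9, 1 - 1e-9)

print(f"MAP (closed form): {map_closed_form:.4f}")
print(f"MAP (numeric):     {map_numeric:.4f}")
print(f"MLE (flat prior):  {mle_numeric:.4f}")
```

The closed-form and numeric MAP values should agree up to the optimizer's tolerance, and both sit between the prior's 40% and the survey's 60%, pulled toward the survey because it carries more observations.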

In a simplistic situation where surveys sample exactly how people will really vote (nothing else happens before the election to change people's minds, there are no issues with voter turnout differing between parties, etc.), you could then, for a known total number of voters, predict the outcome of the vote using the beta-binomial distribution (the predictive distribution of the binomial distribution with a beta prior). In reality, predicting an election is of course much more difficult.
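
As a rough illustration of that predictive step, the sketch below evaluates the beta-binomial distribution under the Beta(80,70) posterior from the example. The electorate size `n_voters = 1001` is a made-up number chosen only to keep the example small, and the helper names are hypothetical. It inherits all the simplifying assumptions just listed, so it is not a realistic election forecast.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b), via log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    """P(K = k) when K | theta ~ Binomial(n, theta) and theta ~ Beta(a, b)."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))

n_voters = 1001            # hypothetical electorate size (odd, so no ties)
post_a, post_b = 80, 70    # Beta(20 + 60, 30 + 40) posterior from the example

# Predictive distribution over the number of votes for candidate A
pmf = [beta_binomial_pmf(k, n_voters, post_a, post_b)
       for k in range(n_voters + 1)]

# A wins with a strict majority, i.e. at least n_voters // 2 + 1 votes
prob_a_wins = sum(pmf[n_voters // 2 + 1:])

# Expected vote share for A; equals the posterior mean a / (a + b)
expected_share = sum(k * p for k, p in enumerate(pmf)) / n_voters

print(f"P(A wins):          {prob_a_wins:.3f}")
print(f"Expected A share:   {expected_share:.4f}")
```

Note that this uses the whole posterior rather than only its mode: the MAP estimate is a single point, whereas the predictive distribution also carries the remaining uncertainty about the true vote share.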
