Solved – What are examples of "flat priors"?

For example, what would a flat prior be for $p$, the parameter of a binomial or Bernoulli, or for the rate of a Poisson? What does it mean for a prior to be "flat" – is it the same as diffuse?
Related Solutions
Simply put, a flat/non-informative prior is used when one has little or no prior knowledge about the parameter, so that it has as little effect as possible on the outcome of the analysis (i.e. on posterior inference).
A prior is conjugate to a likelihood when the resulting posterior belongs to the same family of distributions as the prior; such a prior is called a conjugate prior. It is favoured for its algebraic convenience, especially when the likelihood belongs to an exponential family (Gaussian, binomial, etc.), since the posterior is then available in closed form. This is hugely beneficial when carrying out posterior simulation using Gibbs sampling.
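As a concrete sketch of conjugacy, consider the Beta-Bernoulli pair (the numbers below are made up): a Beta$(a, b)$ prior on $p$ combined with $k$ successes in $n$ trials yields a Beta$(a + k,\, b + n - k)$ posterior, so no numerical integration is needed.

```python
from scipy import stats

a, b = 2.0, 2.0          # prior hyperparameters (assumed for this example)
k, n = 7, 10             # made-up data: 7 successes in 10 trials

# Conjugacy: Beta prior + Bernoulli/binomial likelihood => Beta posterior.
posterior = stats.beta(a + k, b + (n - k))
print(posterior.mean())          # posterior mean of p
print(posterior.interval(0.95))  # 95% equal-tailed credible interval
```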
And finally, imagine that a prior distribution is placed on a parameter in your model, but you want to add another level of complexity/uncertainty. You would then place a prior distribution on the parameters of that prior, hence the name hyper-prior.
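A minimal sketch of the idea via ancestral sampling, assuming a Beta prior on group-level success probabilities with Exponential(1) hyper-priors on its parameters (all of these choices are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyper-prior level: the prior's own parameters alpha and beta are random.
alpha = rng.exponential(1.0)
beta = rng.exponential(1.0)

# Prior level: group-specific success probabilities, given alpha and beta.
p = rng.beta(alpha, beta, size=5)

# Data level: simulated counts, 20 trials per group.
data = rng.binomial(20, p)
print(alpha, beta, p, data)
```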
I think Gelman's Bayesian Data Analysis is a great start for anyone who's interested in learning Bayesian statistics:)
The Jeffreys prior is invariant under reparametrization. For that reason, many Bayesians consider it to be a "non-informative prior". (Hartigan showed that there is a whole space of such priors, $J^\alpha H^\beta$ with $\alpha + \beta = 1$, where $J$ is Jeffreys' prior and $H$ is Hartigan's asymptotically locally invariant prior; see Invariant Prior Distributions.)
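As a quick illustration in the Bernoulli case: the Fisher information is $I(p) = 1/(p(1-p))$, and the Jeffreys prior $f(p) \propto \sqrt{I(p)}$ is exactly the Beta$(1/2, 1/2)$ distribution. A small numeric check of this (a sketch, not part of the answer above):

```python
import numpy as np
from scipy import stats

p = np.linspace(0.01, 0.99, 99)

# Jeffreys prior: proportional to the square root of the Fisher information.
jeffreys_unnormalized = np.sqrt(1.0 / (p * (1.0 - p)))

# It matches the Beta(1/2, 1/2) density up to the normalizing constant 1/pi.
print(np.allclose(jeffreys_unnormalized / np.pi, stats.beta(0.5, 0.5).pdf(p)))  # True
```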
It is an often-repeated falsehood that the uniform prior is non-informative: apply an arbitrary transformation to your parameters, and a uniform prior on the new parameters means something completely different. If an arbitrary change of parametrization affects your prior, then your prior is clearly informative.
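A small simulation makes this concrete (the log-odds reparametrization here is just one illustrative choice): a uniform prior on $p$ induces a decidedly non-flat prior on $\log(p/(1-p))$.

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.uniform(0.0, 1.0, size=100_000)   # "non-informative" uniform prior on p
eta = np.log(p / (1.0 - p))               # reparametrize to log-odds

# The implied prior on eta is the logistic density: strongly peaked at 0,
# i.e. quite informative about the log-odds.
hist, _ = np.histogram(eta, bins=np.arange(-6.0, 6.5, 0.5), density=True)
print(np.round(hist, 3))
```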
Using the Jeffreys prior is, by definition, mathematically equivalent to using a flat prior after applying the variance-stabilizing transformation. From a human standpoint, the latter view is probably nicer, because the parameter space becomes "homogeneous" in the sense that equal differences mean the same thing everywhere in the parameter space, in every direction.
Consider your Bernoulli example. Isn't it a little weird that scoring 99% on a test is the same distance from 90% as 59% is from 50%? After the variance-stabilizing transformation, the former pair are more separated, as they should be; this matches our intuition about actual distances in the space. (Mathematically, the variance-stabilizing transformation makes the curvature of the log-loss equal to the identity matrix.)
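A quick numeric check of that claim, using the Bernoulli variance-stabilizing transformation $\phi(p) = \arcsin\sqrt{p}$ (up to a constant factor):

```python
import numpy as np

def phi(p):
    # Variance-stabilizing transformation for the Bernoulli parameter.
    return np.arcsin(np.sqrt(p))

print(phi(0.99) - phi(0.90))  # ~0.222: 99% and 90% end up far apart
print(phi(0.59) - phi(0.50))  # ~0.090: 59% and 50% stay comparatively close
```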
Best Answer
The term "flat" in reference to a prior generally means $f(\theta)\propto c$ over the support of $\theta$.
So a flat prior for $p$ in a Bernoulli would usually be interpreted to mean $U(0,1)$.
A flat prior for $\mu$ in a normal is an improper prior where $f(\mu)\propto c$ over the real line.
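To see why this still works: with observations $x_1,\dots,x_n \sim N(\mu, \sigma^2)$ and $\sigma^2$ known, the improper prior $f(\mu) \propto c$ gives

$$f(\mu \mid x) \propto \prod_{i=1}^n \exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right),$$

which normalizes to a proper posterior, $\mu \mid x \sim N(\bar x,\, \sigma^2/n)$.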
"Flat" is not necessarily synonymous with 'uninformative', nor does it have invariance to transformations of the parameter. For example, a flat prior on $\sigma$ in a normal effectively says that we think that $\sigma$ will be large, while a flat prior on $\log(\sigma)$ does not.
With a flat prior, your posterior is proportional to the likelihood (possibly constrained to some interval/region if the prior was). In that case the MAP and ML estimates will normally coincide, though taking the flat prior over a restricted region can change that, as the sketch below illustrates.
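A grid sketch with made-up data (7 successes in 10 trials), assumed purely for illustration:

```python
import numpy as np

k, n = 7, 10                                   # assumed data: 7 successes in 10 trials
grid = np.linspace(0.001, 0.999, 999)

# Flat U(0, 1) prior: the (unnormalized) posterior is just the likelihood.
posterior = grid**k * (1 - grid)**(n - k)
print(grid[np.argmax(posterior)])              # ~0.7 = k/n, so MAP == MLE

# Restrict the flat prior to [0, 0.5] and the posterior mode moves to the
# boundary at 0.5, while the MLE stays at 0.7 -- MAP and ML no longer agree.
mask = grid <= 0.5
print(grid[mask][np.argmax(posterior[mask])])  # 0.5
```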