I understand that the Beta distribution is the 'natural conjugate' prior of the Binomial distribution, in the sense that when the prior is Beta, the posterior (proportional to the product of likelihood and prior) is again a Beta distribution.
$$ Posterior(\theta | X) \propto Likelihood(X|\theta) \cdot Prior(\theta) $$
$$ \pi(\theta | X) \propto P(X=x | \theta) \cdot \pi(\theta) $$
$$ \pi(\theta | X) \propto \big[ \binom{n}{x} \theta^x (1-\theta)^{n-x} \big] \cdot \big[ \theta^{\alpha-1} (1-\theta)^{\beta-1} \big] $$
$$ \theta | X \sim \text{Beta}(x + \alpha, n - x + \beta) $$
where $x$ is the number of successes in $n$ independent Bernoulli trials with success probability $\theta$, and $\alpha$ and $\beta$ are the parameters of the Beta prior.
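As a minimal sketch of this conjugate update (using the numbers from the example further below, with `scipy`):

```python
from scipy.stats import beta

# Conjugate Beta-Binomial update: Beta(alpha, beta) prior + binomial data
alpha, beta_ = 1, 1          # uniform Beta(1, 1) prior
n, x = 16, 6                 # 6 successes in 16 trials

# Posterior is Beta(x + alpha, n - x + beta)
post = beta(x + alpha, n - x + beta_)
print(post.mean())           # posterior mean = (x + alpha) / (n + alpha + beta)
```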
But I have also seen some people scaling up/down the impact of each unit of information into the posterior distribution:
$$ \theta | X \sim \text{Beta}(c \cdot x + \alpha, c \cdot (n - x) + \beta) $$
In that sense, one can adjust the strength of the evidence the data contribute to the posterior by scaling the counts of successes and failures: a large $c$ amplifies the effect of the data, a small $c$ damps it.
The fact that there is an (arbitrary?) $c$ scaling the posterior up and down makes me think that the Beta distribution, however convenient because of the conjugacy, may not be a good representation of the distribution of $\theta$; otherwise, why tune it? Is there something I am missing or misinterpreting?
Just for a visual representation:
let's define $\alpha = 1$, $\beta = 1$, $n = 16$, $x = 6$
if $c = 0.5$:
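The visual comparison can be sketched numerically (a minimal sketch with `scipy`, using the values above; the two-element list of $c$ values is just for illustration):

```python
from scipy.stats import beta

alpha, beta_, n, x = 1, 1, 16, 6

def posterior(c):
    # Scaled update: Beta(c*x + alpha, c*(n - x) + beta)
    return beta(c * x + alpha, c * (n - x) + beta_)

full = posterior(1.0)    # full weight on the data
half = posterior(0.5)    # each observation counts half

# Smaller c pulls the posterior toward the prior and makes it wider
print(full.mean(), half.mean())
print(full.std(), half.std())
```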
Best Answer
Why not tune it?
It is intrinsic to the Bayesian approach to be subjective. There is no unique choice of prior: computing the expression $P(\theta|x)$ requires a prior $P(\theta)$, and that prior always encodes information from outside the experiment's observations, whether it comes from a tunable distribution family or from a fixed distribution.
Even when some standard gives you a unique prior that cannot be tuned, it is still a subjective choice to use that standard.
If anything, tuning makes the prior more practical (and better). For example:
Starting with the Jeffreys prior, $\propto \theta^{-1/2} (1-\theta)^{-1/2}$, you obtain after an experiment a posterior $\propto \theta^{x-1/2} (1-\theta)^{n-x-1/2}$, and that posterior can serve as the prior for a new experiment. You don't have to use the Jeffreys prior all the time.
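This sequential use of a posterior as the next prior can be sketched as follows (the two experiments' counts are hypothetical):

```python
# Sequential conjugate updating, starting from the Jeffreys prior Beta(1/2, 1/2):
# after each experiment the posterior parameters become the next prior's parameters.
a, b = 0.5, 0.5                     # Jeffreys prior parameters

for n, x in [(10, 4), (20, 11)]:    # two hypothetical experiments
    a, b = a + x, b + (n - x)       # conjugate update

# The result is identical to pooling all the data in a single update
print(a, b)
```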
In the analysis of the covid vaccine trial, a conservative prior was chosen (one whose prior belief runs opposite to what the experiment is meant to demonstrate), as described in this question: Which statistical model is being used in the Pfizer study design for vaccine efficacy?
Your $c$, however, isn't exactly tuning the prior; it is more like an extension of the Bayesian analysis as a whole (and, strictly, a departure from it). The $c$ parameter changes the likelihood function, not the prior. This additional tuning parameter $c$ is a trick outside of the Bayesian framework.
What distribution/likelihood is this exactly? It needs to be of a form that factors as $$f(x|\theta) = g(x) h(x|\theta)$$ where $$h(x|\theta) = \theta^{cx}(1-\theta)^{c(n-x)}$$
We might view it as an exponential dispersion family. For the binomial distribution this has been described in another question: What is the dispersion parameter of binomial distribution?
If we use that likelihood function, then the form of the dispersed binomial distribution becomes
$$f(x|\theta,c) = h(x,c) \exp\left(\frac{\theta x - A(\theta)}{1/c} \right)$$
with
$$\begin{aligned} h(x,c) &= \binom{n}{x} \\ \theta &= \log\big(p/(1-p)\big) \\ A(\theta) &= n \log(1+\exp(\theta)) \end{aligned}$$
For $c \neq 1$ this is not a true distribution (and we cannot fix that by normalizing it, since renormalizing would change $A(\theta)$, and with it the likelihood).
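A quick numerical check of this claim (a sketch with hypothetical values of $n$ and $\theta$):

```python
from math import comb

# For c != 1 the scaled "pmf"  binom(n, x) * theta^(c*x) * (1-theta)^(c*(n-x))
# no longer sums to 1 over x = 0..n, so it is not a true distribution.
n, theta = 16, 0.4

def total_mass(c):
    return sum(comb(n, x) * theta**(c * x) * (1 - theta)**(c * (n - x))
               for x in range(n + 1))

print(total_mass(1.0))   # sums to 1 for the ordinary binomial
print(total_mass(0.5))   # far from 1: a quasi-likelihood, not a pmf
```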
The likelihood function with that parameter $c$ is a quasi-likelihood. The parameter $c$ can be seen as a pragmatic way of tuning the dispersion of the binomial distribution.