I would suggest the book Bayesian Data Analysis (in particular chapter 6) as a great source for answering this question; it is also the source of everything I am about to say. One of the usual ways that Bayesians attack this problem is by using posterior predictive p-values (PPPs). Before I jump into how PPPs would solve this problem, let me first define the following notation:
Let $y$ be the observed data and $\theta$ be the vector of parameters. We define $y^{\text{rep}}$ as the replicated data that could have been observed, or, to think predictively, as the data we would see tomorrow if the experiment that produced $y$ today were replicated with the same model and the same value of $\theta$ that produced the observed data.
Note, we will define the distribution of $y^{\text{rep}}$ given the current state of knowledge with the posterior predictive distribution
$$p(y^{\text{rep}}|y)=\int_\Theta p(y^{\text{rep}}|\theta)p(\theta|y)d\theta$$
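To make this integral concrete, here is a minimal R sketch of simulating from the posterior predictive distribution. The model (a Poisson likelihood with a conjugate Gamma prior) and all of its numbers are purely illustrative assumptions, not anything from the question:

```r
## Illustrative sketch only: posterior predictive draws for a hypothetical
## Poisson model with a conjugate Gamma(a, b) prior on theta.
set.seed(1)
y <- rpois(50, lambda = 4)    # "observed" data, simulated here just for illustration
n <- length(y)
a <- 1; b <- 1                # assumed prior hyperparameters
L <- 1000                     # number of posterior draws

## Conjugacy: p(theta | y) is Gamma(a + sum(y), rate = b + n)
theta_draws <- rgamma(L, shape = a + sum(y), rate = b + n)

## One replicated data set y^rep per posterior draw theta^l
y_rep <- sapply(theta_draws, function(th) rpois(n, lambda = th))  # n x L matrix
```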
Now, we can measure the discrepancy between the model and the data by defining test quantities, the aspects of the data we wish to check. A test quantity, or discrepancy measure, $T(y,\theta)$, is a scalar summary of parameters and data that is used as a standard when comparing data to predictive simulations. Test quantities play the role in Bayesian model checking that test statistics play in classical testing. We define the notation $T(y)$ for a test statistic, which is a test quantity that depends only on data; in the Bayesian context, we can generalize test statistics to allow dependence on the model parameters under their posterior distribution.
Classically, the p-value for the test statistic $T(y)$ is
$$p_C=\text{Pr}(T(y^{\text{rep}})\geq T(y)|\theta)$$
where the probability is taken over the distribution of $y^{\text{rep}}$ with $\theta$ fixed.
From a Bayesian perspective, lack of fit of the data with respect to the posterior predictive distribution can be measured by the tail-area probability, or p-value, of the test quantity, and computed using posterior simulations of $(\theta,y^{\text{rep}})$. In the Bayesian approach, test quantities can be functions of the unknown parameters as well as data because the test quantity is evaluated over draws from the posterior distribution of the unknown parameters.
Now, we can define the Bayesian p-value (the PPP) as the probability that the replicated data could be more extreme than the observed data, as measured by the test quantity:
$$p_B=\text{Pr}(T(y^{\text{rep}},\theta)\geq T(y,\theta)|y)$$
where the probability is taken over the posterior distribution of $\theta$ and the posterior predictive distribution of $y^{\text{rep}}$ (that is, the joint distribution, $p(\theta,y^{\text{rep}}|y)$):
$$p_B=\iint I_{T(y^{\text{rep}},\theta)\geq T(y,\theta)}\,p(y^{\text{rep}}|\theta)\,p(\theta|y)\,dy^{\text{rep}}\,d\theta,$$
where $I$ is the indicator function. In practice though we usually compute the posterior predictive distribution using simulations.
If we already have, say, $L$ simulations from the posterior distribution of $\theta$, then we can just draw one $y^{\text{rep}}$ from the predictive distribution for each simulated $\theta$; we now have $L$ draws from the joint posterior distribution, $p(y^{\text{rep}},\theta|y)$. The posterior predictive check is the comparison between the realized test quantities $T(y,\theta^l)$ and the predictive test quantities $T(y^{\text{rep}\,l},\theta^l)$. The estimated p-value is just the proportion of these $L$ simulations for which the test quantity equals or exceeds its realized value; that is, for which $$T(y^{\text{rep}\,l},\theta^l)\geq T(y,\theta^l)$$ for $l=1,\dots,L$.
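Continuing the illustrative Poisson/Gamma sketch above, the whole recipe fits in a few lines of R. For simplicity the test quantity here is just a test statistic $T(y)$ (the sample variance), chosen only for illustration:

```r
## Posterior predictive p-value, continuing the hypothetical example above.
T_stat <- function(z) var(z)       # illustrative test statistic T(y)

T_obs <- T_stat(y)                 # realized value T(y)
T_rep <- apply(y_rep, 2, T_stat)   # T(y^rep,l) for each of the L replicates
p_B   <- mean(T_rep >= T_obs)      # proportion of replicates at least as extreme
p_B
```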
In contrast to the classical approach, Bayesian model checking does not require special methods to handle "nuisance parameters." By using posterior simulations, we implicitly average over all the parameters in the model.
As an additional source, Andrew Gelman also has a very nice paper on PPPs here:
http://www.stat.columbia.edu/~gelman/research/unpublished/ppc_understand2.pdf
Best Answer
If I understood your question correctly, you just need to fit your data to a distribution. In this case, you could use one of the functions in R packages, such as `fitdistr` from the `MASS` package, which uses maximum likelihood estimation (MLE) and supports discrete distributions, including the binomial and Poisson. Then, as a second step, you would need to perform one (or more) goodness-of-fit (GoF) tests to validate the results. The Kolmogorov-Smirnov, Anderson-Darling and (AFAIK) Lilliefors tests are not directly applicable to discrete distributions. Fortunately, however, the chi-square GoF test is applicable to both continuous and discrete distributions, and in R it is a matter of calling the `stats::chisq.test()` function.
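As a rough sketch of that two-step workflow (the Poisson choice and the simulated data are only illustrative assumptions, not anything from your question):

```r
## Illustrative sketch: ML fit with MASS::fitdistr, then a chi-square GoF test.
library(MASS)

set.seed(1)
x <- rpois(200, lambda = 3)        # hypothetical discrete data

## Step 1: maximum likelihood fit of a Poisson
fit <- fitdistr(x, densfun = "Poisson")
lambda_hat <- fit$estimate["lambda"]

## Step 2: chi-square GoF test against the fitted Poisson.
## Build observed counts for 0..max(x) and fold the upper tail of the fitted
## distribution into the last cell so the expected probabilities sum to 1.
obs   <- as.numeric(table(factor(x, levels = 0:max(x))))
p_exp <- dpois(0:max(x), lambda_hat)
p_exp[length(p_exp)] <- p_exp[length(p_exp)] +
  ppois(max(x), lambda_hat, lower.tail = FALSE)
chisq.test(x = obs, p = p_exp)
```

One caveat: `chisq.test()` does not know that the Poisson parameter was estimated from the same data, so it does not adjust the degrees of freedom accordingly; treat its p-value as approximate.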
Alternatively, since your data represent a discrete distribution, you can use the `vcd` package and its function `goodfit()`. This function can be used either as a replacement for the standard GoF test `chisq.test()`, or, even better, as a full workflow (distribution fitting and GoF testing). For the full workflow option, just use the default setup and do not specify the parameter `par` (you can specify `size` if `type = "nbinomial"`). The parameters will be estimated using maximum likelihood or minimum chi-square (you can select the method). Results can be obtained by calling the `summary()` function.
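A minimal sketch of that full workflow might look like this (again, the Poisson choice and the simulated data are only illustrative assumptions):

```r
## Illustrative sketch: fit and test in one step with vcd::goodfit.
library(vcd)

set.seed(1)
x <- rpois(200, lambda = 3)                          # hypothetical discrete data

gf <- goodfit(x, type = "poisson", method = "ML")    # parameters estimated by ML
summary(gf)                                          # goodness-of-fit test for the fitted Poisson
plot(gf)                                             # optional: rootogram of observed vs. fitted counts
```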