If the number of variables is sufficiently large and the correlation is bounded away from 1, then central limit theorems apply (see, e.g., versions of the CLT for stationary processes).
So if your $i$-th variable has parameter $p_i$, then the variance of $X_i$ is $p_i(1-p_i)$ and
$$\text{Cov}(X_i,X_j)=\rho \sqrt{p_i(1-p_i)\cdot p_j(1-p_j)}.$$
The expected value of the sum is the sum of the expected values, and the variance of the sum is the sum of the variances plus twice the sum of all the pairwise covariances:
$$\mathbb{E}\left(\sum_{i=1}^n X_i\right)=\sum_{i=1}^n p_i, \qquad \text{Var}\left(\sum_{i=1}^n X_i\right)=\sum_{i=1}^n p_i(1-p_i)+2\sum_{i<j}\text{Cov}(X_i,X_j).$$
If $n$ is large enough, the standardized sum will be approximately standard normal.
You may also want to use a continuity correction.
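A minimal sketch of this approximation in Python (the equal-correlation setup, function name, and example values are my own illustration, not from the answer):

```python
import numpy as np
from scipy.stats import norm

def approx_sum_cdf(k, p, rho):
    """Normal approximation (with continuity correction) to P(Z <= k),
    where Z is a sum of Bernoulli(p_i) variables with common pairwise
    correlation rho."""
    p = np.asarray(p, dtype=float)
    var_i = p * (1 - p)                      # Var(X_i) = p_i(1 - p_i)
    sd_i = np.sqrt(var_i)
    mean = p.sum()                           # E[Z] = sum of E[X_i]
    # Sum of pairwise covariances: Cov(X_i, X_j) = rho * sd_i * sd_j, i < j
    pair_cov = rho * (sd_i.sum() ** 2 - var_i.sum()) / 2
    var = var_i.sum() + 2 * pair_cov         # Var(Z)
    return norm.cdf((k + 0.5 - mean) / np.sqrt(var))

p = np.linspace(0.2, 0.8, 200)               # example p_i values
print(approx_sum_cdf(100, p, rho=0.1))
```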
Short summary: if the $p_i$s are independent, it's the binomial. If the $p_i$s are all equal, it's the beta-binomial.
By $X_i \sim \textrm{Bernoulli}(p_i)$, you must mean the conditional distribution $X_i\mid p_i \sim \textrm{Bernoulli}(p_i)$. The marginal distribution of $X_i$ (that is, the distribution obtained by averaging over the values of $p_i$) is obtained, e.g., by noting that $X_i$ is Bernoulli and computing its expectation with the tower law:
\begin{equation}
\mathbb{E}(X_i) = \mathbb{E}(\mathbb{E}(X_i \mid p_i)) = \mathbb{E}(p_i) = \frac{\alpha}{\alpha+\beta}.
\end{equation}
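(A quick Monte Carlo check of this computation, with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 2.0, 5.0                       # arbitrary Beta parameters
p = rng.beta(alpha, beta, size=1_000_000)    # p_i ~ Beta(alpha, beta)
x = rng.binomial(1, p)                       # X_i | p_i ~ Bernoulli(p_i)
print(x.mean(), alpha / (alpha + beta))      # both approx. 0.2857
```
)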
So, $X_i \sim \textrm{Bernoulli}(\frac{\alpha}{\alpha+\beta})$ for all $i$. However, this does not yet determine the distribution of $Z$, as you have not specified enough information to deduce the joint distribution of the $X_i$s. Two additional things are needed:
- The conditional distribution of the $X_i$s given the $p_i$s: $p(X_1,\ldots,X_n \mid p_1,\ldots,p_n)$.
- The joint distribution of the $p_i$s: $p(p_1,\ldots,p_n)$.
For the first point, I assume you intend the $X$s to be conditionally independent given the $p$s. The difference between the two distributions mentioned in your question stems from the second point.
If we assume the $p$s to be mutually independent, then the $X$s will be mutually independent too, since each $X$ depends on only one $p$ and the $X$s are conditionally independent given the $p$s. Then $Z$ is just a sum of i.i.d. Bernoulli random variables. But this is the definition of the binomial distribution, and thus indeed
\begin{equation}
Z \sim \textrm{Binomial}\left(n,\frac{\alpha}{\alpha+\beta}\right).
\end{equation}
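A simulation sketch of this case (parameter values again arbitrary), checking the empirical distribution of $Z$ against the binomial pmf:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(1)
n, alpha, beta, reps = 10, 2.0, 5.0, 200_000
p = rng.beta(alpha, beta, size=(reps, n))    # independent p_i in each replication
z = rng.binomial(1, p).sum(axis=1)           # Z = X_1 + ... + X_n
emp = np.bincount(z, minlength=n + 1) / reps
print(np.round(emp, 4))
print(np.round(binom.pmf(np.arange(n + 1), n, alpha / (alpha + beta)), 4))
```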
Note that in this case it did not make much sense to define the $p$s in the first place: if each $p_i$ influences only the corresponding $X_i$, nothing is learned about the distribution of $p_i$ except via the value of $X_i$. Thus, the $p_i$s are useless in the sense that exactly the same model would have been easier to specify by just stating that the $X_i$s are independent Bernoulli random variables with parameter $\alpha/(\alpha+\beta)$.
The beta-binomial distribution is actually defined so that there is only one $p$ parameter drawn from the beta distribution, and then all the $X_i$s are Bernoulli with this shared $p$. This is obtained as a special case of your question by choosing, in the second point above, a joint distribution under which the $p_i$s are all equal.
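And the matching sketch for this shared-$p$ case: the empirical distribution of $Z$ now follows the beta-binomial pmf instead.

```python
import numpy as np
from scipy.stats import betabinom

rng = np.random.default_rng(2)
n, alpha, beta, reps = 10, 2.0, 5.0, 200_000
p = rng.beta(alpha, beta, size=reps)         # one shared p per replication
z = rng.binomial(n, p)                       # all n Bernoullis use that same p
emp = np.bincount(z, minlength=n + 1) / reps
print(np.round(emp, 4))
print(np.round(betabinom.pmf(np.arange(n + 1), n, alpha, beta), 4))
```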
By defining some other dependence structures for the $p_i$s (not independent, but not constrained to be equal either), other distributions for $Z$ would be obtained, but I don't know if any of these have special names.
Best Answer
Have you seen this paper: Kadane, 2016, Sums of Possibly Associated Bernoulli Variables: The Conway-Maxwell-Binomial Distribution?
In this paper, you can see that the conditions assumed in your question, i.e., having $n$ marginally Bernoulli random variables with the same probability of success, $p$, and the same pairwise correlation, $\rho$, between all pairs, do not fully specify the distribution of the sum of those random variables.
To be more specific, in Section 2.3 of the paper the author assumes "zero higher order additive interaction (Darroch, 1974)" and derives the resulting pmf $P\{W = k\}$, where $W = \sum_{i=1}^{m} X_i$; this model is also called the correlated binomial model.
Here is also a brief summary of the first sections of the paper that you may find helpful for modeling the sum:
Propositions 1, 2, and 3 provide reasons for not using correlation as the measure of dependence and for modeling the sum without assuming a marginal distribution. Sections 2.1 and 2.2 present distribution models with these two characteristics: they carry some notion of dependence, though not necessarily the correlation, and they allow for symmetric (exchangeable) dependence. (In particular, Proposition 1 states that correlation is a poor measure of dependence here because, under the conditions stated in the proposition, it is bounded below by $-1/(m-1)$.)
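To see where this lower bound comes from (a standard variance argument, sketched here for completeness): for $m$ identically distributed Bernoulli variables with common variance $\sigma^2$ and common pairwise correlation $\rho$,
\begin{equation}
0 \le \text{Var}\left(\sum_{i=1}^m X_i\right) = m\sigma^2 + m(m-1)\rho\sigma^2 = m\sigma^2\bigl(1+(m-1)\rho\bigr),
\end{equation}
which forces $\rho \ge -1/(m-1)$.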
Section 3 presents the author's proposed model, which models the sum directly using a notion of dependence that allows for both positive and negative association.
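For reference, here is a sketch of the Conway-Maxwell-Binomial pmf under the parametrization commonly quoted for this paper, $P\{W=k\} \propto \binom{m}{k}^{\nu} p^k (1-p)^{m-k}$; treat the exact form as an assumption to check against the paper itself:

```python
import numpy as np
from scipy.special import comb

def cmb_pmf(m, p, nu):
    """Conway-Maxwell-Binomial pmf, assuming the commonly quoted form
    P(W = k) proportional to C(m, k)^nu * p^k * (1 - p)^(m - k).
    nu = 1 recovers the binomial; nu < 1 spreads mass toward the extremes
    (positive association); nu > 1 concentrates it (negative association)."""
    k = np.arange(m + 1)
    w = comb(m, k) ** nu * p ** k * (1 - p) ** (m - k)  # unnormalized weights
    return w / w.sum()

for nu in (0.5, 1.0, 2.0):                   # illustrative dispersion levels
    print(nu, np.round(cmb_pmf(10, 0.3, nu), 3))
```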