Incorporating a prior is one way to 'make up' for small samples. Another is to use a mixed model, with a fixed intercept for the mean structure and a random intercept for each product. The estimate of the population mean plus the predicted random effect (the BLUP) then offers a form of shrinkage: estimates for products with less information are shrunk more strongly toward the overall sample mean than those based on more information. This approach is common in, for example, Small Area Estimation in survey sampling.
Edit: The R code might look like:
library(nlme)
# Fixed intercept = overall mean; random intercept for each product
f <- lme(score ~ 1, data = yourData, random = ~ 1 | product)
# predict() at the product level returns overall mean + BLUP
p <- predict(f)
If you go this route, the assumptions are:
- independent, normal errors with expected value 0 and constant variance for all observations
- normal random effects with expected value 0
Violations of these assumptions can generally be modeled, but of course that comes with added complexity...
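To see the shrinkage directly, you can compare the raw per-product means with the model-based values (a minimal sketch, assuming the hypothetical yourData from above with a numeric score and a factor product):
library(nlme)
f <- lme(score ~ 1, data = yourData, random = ~ 1 | product)
raw  <- tapply(yourData$score, yourData$product, mean)  # raw per-product means
blup <- fixef(f)[1] + ranef(f)[, 1]                     # overall mean + predicted random effect
cbind(raw, blup)   # each model-based value sits between its raw mean and the overall mean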
Well, $X_1 = 1$ exactly when $X_2 = X_3 = X_4 = 1$ and is $0$ otherwise; therefore
$$E(X_1) = P(X_2 = 1, X_3 = 1, X_4 = 1)$$
As @leonbloy mentions, knowledge of the correlations and marginal success probabilities is not sufficient for calculating $E(X_1)$, but it can be written in terms of the conditional probabilities; using the definition of conditional probability,
$$ E(X_1) = P(X_2 = 1, X_3 = 1 | X_4 = 1) \cdot P(X_4 = 1) $$
and $P(X_2 = 1, X_3 = 1 | X_4 = 1)$ can be similarly decomposed as
$$ P(X_2 = 1 | X_3 = 1, X_4 = 1) \cdot P(X_3 = 1 | X_4 = 1) $$
implying
$$E(X_1) = P(X_2 = 1 | X_3 = 1, X_4 = 1) \cdot P(X_3 = 1 | X_4 = 1) \cdot P(X_4 = 1)$$
Explicit calculation of $E(X_1)$ will require more information about the joint distribution of $(X_2,X_3,X_4)$. The above expression makes sense intuitively: the probability that three dependent Bernoulli trials are all successes is the probability that the first is a success, times the probability that the second is a success given the first, times the probability that the third is a success given the first two. You could equivalently interchange the roles of $X_2, X_3, X_4$.
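To see the chain-rule decomposition at work numerically, here is a small R sketch on a hypothetical joint pmf for $(X_2, X_3, X_4)$ (the probabilities below are made up; any valid joint pmf would do):
g <- expand.grid(x2 = 0:1, x3 = 0:1, x4 = 0:1)            # the 8 possible outcomes
g$p <- c(0.20, 0.05, 0.10, 0.10, 0.05, 0.15, 0.05, 0.30)  # hypothetical joint pmf (sums to 1)
p4   <- sum(g$p[g$x4 == 1])                                # P(X4 = 1)
p34  <- sum(g$p[g$x3 == 1 & g$x4 == 1])                    # P(X3 = 1, X4 = 1)
p234 <- sum(g$p[g$x2 == 1 & g$x3 == 1 & g$x4 == 1])        # P(X2 = X3 = X4 = 1) = E(X1)
(p234 / p34) * (p34 / p4) * p4                             # chain rule recovers E(X1)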
An extended comment:
Let $\mathcal G(a,b)$ denote the gamma distribution with density $f(x)\propto e^{-ax}x^{b-1}\mathbf 1_{x>0}$.
Consider independent random variables $X_1\sim\mathcal G(a,b)$ and $X_2\sim\mathcal G(a,c)$.
Then it can be shown that $X_1+X_2$ is independent of $\frac{X_1}{X_1+X_2}$.
In fact, $X_1+X_2\sim\mathcal G(a,b+c)$ and $\frac{X_1}{X_1+X_2}\sim\mathcal{Be}(b,c)$, the beta distribution of the first kind.
This is a standard relation between beta and gamma variables.
Now,
\begin{align}E(X_1)&=E\left(\frac{X_1}{X_1+X_2}\cdot (X_1+X_2)\right) \\&=E\left(\frac{X_1}{X_1+X_2}\right)E(X_1+X_2), \end{align}
where the second equality uses the independence established above.
Hence, $$E\left(\frac{X_1}{X_1+X_2}\right)=\frac{E(X_1)}{E(X_1+X_2)}=\frac{E(X_1)}{E(X_1)+E(X_2)}=\frac{b}{b+c},$$ which is indeed the mean of the $\mathcal{Be}(b,c)$ distribution.
While I do not know a general formula for the expectation you ask about, in special cases like the one above we can exploit well-known relations between functions of the random variables, together with their independence, and proceed from there. Here, of course, all the expectations exist, which may not be the case in general.
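As a quick sanity check, here is a small Monte Carlo sketch in R (the rate and shape values are made up for illustration) confirming that $E\left(\frac{X_1}{X_1+X_2}\right)=\frac{b}{b+c}$:
set.seed(1)
a <- 2; b <- 3; cc <- 5                  # assumed rate a and shapes b, c (cc avoids masking c())
x1 <- rgamma(1e6, shape = b, rate = a)   # X1 ~ G(a, b)
x2 <- rgamma(1e6, shape = cc, rate = a)  # X2 ~ G(a, c)
mean(x1 / (x1 + x2))                     # simulated mean of the ratio, ~ 0.375
b / (b + cc)                             # exact value b/(b+c) = 0.375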