I have an existing fitted logit regression model.
Model:
$\hat{p}(x)=\frac{1}{1+e^{-\hat{\beta}x}}$
With parameter estimates $\hat{\beta}$, and observation $x$
Given a new set of datapoints $x_1,\ldots,x_n$ with observed outcomes $o_1,\ldots,o_n$, I want to construct a confidence interval for $E=\sum_i \hat{p}(x_i)$. The goal is to check whether $O=\sum_i o_i$ is (not) significantly different from $E$.
For a single observation, you can construct a confidence interval for the probability quite easily. Since $\hat{\beta}$ is (approximately) normally distributed with confidence interval ($\hat{\beta}_{0.025}, \hat{\beta}_{0.975}$), and the logistic function is monotone, the confidence interval for the predicted probability is: ($\frac{1}{1 + e^{-\hat{\beta}_{0.025}x}}$, $\frac{1}{1 + e^{-\hat{\beta}_{0.975}x}}$).
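As a minimal sketch of this single-observation interval, assuming a hypothetical point estimate and standard error for $\hat\beta$ (the numbers below are made up, not from any fitted model):

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical estimate and standard error from a fitted logit model.
beta_hat, se_beta = 0.8, 0.1
beta_lo = beta_hat - 1.96 * se_beta  # ~2.5% quantile of beta_hat
beta_hi = beta_hat + 1.96 * se_beta  # ~97.5% quantile of beta_hat

x = 2.5  # a single new observation
# Monotone transform: push the beta interval through the logistic.
p_lo, p_hi = logistic(beta_lo * x), logistic(beta_hi * x)
```

One caveat: pushing the endpoints through works because the logistic function is monotone in $\beta x$; if $x<0$ the endpoints swap, so in general take the min/max of the two transformed values.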
Is there a good way to get the confidence interval for the "sum" of the predicted probabilities?
Best Answer
One possible solution is to use a bootstrapping approach: given the new set of data points, construct a bootstrap estimate and confidence interval for $\sum_i\hat p(x_i)$.
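A sketch of that bootstrap, using simulated stand-ins for the original training data and the new points (the data, sample sizes, and the hand-rolled one-parameter Newton-Raphson fitter are all assumptions for illustration): resample the training pairs, refit the logit, and recompute the sum of predicted probabilities at the new points.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logit(x, y, n_iter=25):
    """Newton-Raphson for the one-parameter logit p = 1/(1+exp(-beta*x))."""
    beta = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-beta * x))
        score = np.sum(x * (y - p))          # gradient of the log-likelihood
        info = np.sum(x**2 * p * (1 - p))    # observed information
        beta += score / info
    return beta

def predicted_sum(beta, x):
    return np.sum(1.0 / (1.0 + np.exp(-beta * x)))

# Simulated stand-in for the original training data (assumption).
x_train = rng.normal(1.0, 1.0, size=200)
y_train = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * x_train)))

x_new = rng.normal(1.0, 1.0, size=50)  # the new datapoints x_1..x_n

# Bootstrap: resample training pairs, refit, recompute the sum.
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(x_train), size=len(x_train))
    b = fit_logit(x_train[idx], y_train[idx])
    boot.append(predicted_sum(b, x_new))
ci = np.percentile(boot, [2.5, 97.5])  # percentile 95% interval
```

The percentile interval `ci` then plays the role of the confidence interval for $E$, against which $O=\sum_i o_i$ can be compared.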
An alternative method would be to take a Bayesian approach and recalculate $\sum_i\hat p(x_i)$ for every sample of $\beta$ at every step of an MCMC-type algorithm. At the end of the MCMC we would have a sample $\sum_i\hat p(x_i)^{(j)}$ for every step $j$, and taking its quantiles gives a 95% credible interval for $\sum_i\hat p(x_i)$.
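A cheap stand-in for the full MCMC, as a sketch: under a flat prior and the usual asymptotics, the posterior of $\beta$ is approximately normal around the estimate, so drawing $\beta$ from $N(\hat\beta, \mathrm{se}^2)$ mimics the MCMC samples (the point estimate, standard error, and new datapoints below are assumed for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed outputs of the already-fitted model (not from real data).
beta_hat, se_beta = 0.8, 0.1
x_new = rng.normal(1.0, 1.0, size=50)  # hypothetical new datapoints

# Draw beta from its approximate posterior and, for each draw,
# recompute the sum of predicted probabilities -- the same quantity
# the MCMC would track at each step.
draws = rng.normal(beta_hat, se_beta, size=10_000)
sums = (1.0 / (1.0 + np.exp(-np.outer(draws, x_new)))).sum(axis=1)
ci = np.quantile(sums, [0.025, 0.975])  # 95% interval for sum_i p(x_i)
```

A genuine MCMC (e.g. Metropolis-Hastings on the logit likelihood) would replace the normal draws but leave the rest of the recipe unchanged.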
Using Scortchi's suggestion, here is the revised code:
Now interestingly, the confidence interval from using Scortchi's suggestion results in:
whereas using my original code we obtain the following:
So there is clearly a difference between the two methods.