How Does the Power of Logistic Regression and a t-Test Compare?

logistic, statistical-power, t-test

Is the power of a logistic regression equivalent to that of a t-test? If so, they should be "data density equivalent," by which I mean that the same number of underlying observations yields the same power at a fixed alpha of .05. Consider two cases:

  1. [The parametric t-test]: 30 draws from a binomial (Bernoulli) variable are made and the resulting values are averaged. This is done 30 times for group A (which has a success probability of .70) and 30 times for group B (which has a success probability of .75). This yields 30 means per group, summarizing 1,800 binomial draws in all. A 58-df t-test is performed to compare the means.
  2. [The logistic regression]: A logistic regression is performed with a dummy-coded slope representing group membership, fit to all 1,800 individual draws. (One way to set up both analyses in R is sketched below.)
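For concreteness, here is a minimal R sketch of the two setups, assuming the sample sizes and probabilities described above (the seed and object names are arbitrary):

```r
set.seed(1)
n.draws <- 30; n.sets <- 30          # 30 draws per set, 30 sets per group
pA <- 0.70; pB <- 0.75

# Case 1: average each set of 30 Bernoulli draws, then t-test the 30 means
means.A <- replicate(n.sets, mean(rbinom(n.draws, 1, pA)))
means.B <- replicate(n.sets, mean(rbinom(n.draws, 1, pB)))
t.test(means.A, means.B, var.equal = TRUE)      # 58 df

# Case 2: logistic regression on all 1,800 individual draws
y     <- c(rbinom(n.sets * n.draws, 1, pA), rbinom(n.sets * n.draws, 1, pB))
group <- rep(c(0, 1), each = n.sets * n.draws)
summary(glm(y ~ group, family = binomial))      # test the 'group' coefficient
```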

My question has two parts:

  1. Given a set alpha of .05, will the power of these methodologies be the same or different? Why? How can I prove it?
  2. Is the answer to question 1 sensitive to the sample sizes going into the t-test, the sample size of each group in the t-test, the underlying binomial probabilities, or some other factor? If so, how can I know (without simulation) that the power is indeed different, and what sort of changes will produce what sort of changes in power? Alternatively, provide worked-out R code that solves the issue using simulation.

Best Answer

If I have computed correctly, logistic regression asymptotically has the same power as the t-test. To see this, write down its log likelihood and compute the expectation of its Hessian at its global maximum (the inverse of its negative estimates the variance-covariance matrix of the ML solution). Don't bother with the usual logistic parameterization: it's simpler just to parameterize the problem with the two probabilities in question. The details will depend on exactly how you test the significance of a logistic regression coefficient (there are several methods).

That these tests have similar powers should not be too surprising, because the chi-square theory for ML estimates is based on a normal approximation to the log likelihood, and the t-test is based on a normal approximation to the distributions of proportions. The crux of the matter is that both methods make the same estimates of the two proportions and both estimates have the same standard errors.


An actual analysis might be more convincing. Let's adopt some general terminology for the values in a given group (A or B):

  • $p$ is the probability of a 1.
  • $n$ is the size of each set of draws.
  • $m$ is the number of sets of draws.
  • $N = m n$ is the amount of data.
  • $k_{ij}$ (equal to $0$ or $1$) is the value of the $j^\text{th}$ result in the $i^\text{th}$ set of draws.
  • $k_i$ is the total number of ones in the $i^\text{th}$ set of draws.
  • $k$ is the total number of ones.

Logistic regression is essentially the ML estimator of $p$. The logarithm of its likelihood is given by

$$\log(\mathbb{L}) = k \log(p) + (N-k) \log(1-p).$$

Its derivatives with respect to the parameter $p$ are

$$\frac{\partial \log(\mathbb{L})}{ \partial p} = \frac{k}{p} - \frac{N-k}{1-p} \text{ and}$$

$$-\frac{\partial^2 \log(\mathbb{L})}{\partial p^2} = \frac{k}{p^2} + \frac{N-k}{(1-p)^2}.$$

Setting the first to zero yields the ML estimate $\hat{p} = k/N$; plugging that into the second expression and taking the reciprocal yields the variance $\hat{p}(1 - \hat{p})/N$, which is the square of the standard error.
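Explicitly, since $\hat{p} = k/N$ implies $k = N\hat{p}$ and $N - k = N(1-\hat{p})$, the second expression evaluates to

$$\frac{N\hat{p}}{\hat{p}^2} + \frac{N(1-\hat{p})}{(1-\hat{p})^2} = \frac{N}{\hat{p}} + \frac{N}{1-\hat{p}} = \frac{N}{\hat{p}(1-\hat{p})},$$

whose reciprocal is $\hat{p}(1-\hat{p})/N$.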

The t statistic will be obtained from estimators based on the data grouped by sets of draws; namely, as the difference of the means (one from group A and the other from group B) divided by the standard error of that difference, which is obtained from the standard deviations of the set means within each group. Let's look at the mean and standard deviation for a given group, then. The mean equals $k/N$, which is identical to the ML estimator $\hat{p}$. The standard deviation in question is the standard deviation of the set means; that is, the standard deviation of the set of $k_i/n$. Here is the crux of the matter, so let's explore some possibilities.

  1. Suppose the data aren't grouped into draws at all: that is, $n = 1$ and $m = N$. The $k_{i}$ are the draw means. Their sample variance equals $N/(N-1)$ times $\hat{p}(1 - \hat{p})$. From this it follows that the standard error is identical to the ML standard error apart from a factor of $\sqrt{N/(N-1)}$, which is essentially $1$ when $N = 1800$. Therefore--apart from this tiny difference--any tests based on logistic regression will be the same as a t-test and we will achieve essentially the same power.

  2. When the data are grouped, the (true) variance of the $k_i/n$ equals $p(1-p)/n$ because the statistics $k_i$ represent the sum of $n$ Bernoulli($p$) variables, each with variance $p(1-p)$. Therefore the expected standard error of the mean of $m$ of these values is the square root of $p(1-p)/(nm) = p(1-p)/N$, just as before.

Number 2 indicates the power of the test should not vary appreciably with how the draws are apportioned (that is, with how $m$ and $n$ are varied subject to $m n = N$), apart perhaps from a fairly small effect from the adjustment in the sample variance (unless you were so foolish as to use extremely few sets of draws within each group).
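A quick numerical check of this claim (a sketch; the groupings, the per-group size $N = 900$ from the question, and the number of replications are illustrative choices): average the squared standard error computed from the set means over many simulated data sets and compare it with $p(1-p)/N$.

```r
set.seed(2)
p <- 0.70; N <- 900                    # one group's worth of data
for (n in c(1, 30, 450)) {             # draws per set
  m <- N / n                           # number of sets
  se2 <- replicate(5000, {
    set.means <- rowMeans(matrix(rbinom(N, 1, p), nrow = m))
    var(set.means) / m                 # squared standard error of the overall mean
  })
  cat(sprintf("n = %3d: mean estimated SE^2 = %.6f  (theory %.6f)\n",
              n, mean(se2), p * (1 - p) / N))
}
```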

Limited simulations to compare $p = 0.70$ to $p = 0.74$ (with 10,000 iterations apiece) involving $m = 900, n = 1$ (essentially logistic regression); $m = n = 30$; and $m = 2, n = 450$ (maximizing the sample variance adjustment) bear this out: the power (at $\alpha = 0.05$, one-sided) in the first two cases is 0.59 whereas in the third, where the adjustment factor makes a material change (there are now just two degrees of freedom instead of 1798 or 58), it drops to 0.36. Another test comparing $p = 0.50$ to $p = 0.52$ gives powers of 0.22, 0.21, and 0.15, respectively: again, we observe only a slight drop from no grouping into draws (=logistic regression) to grouping into 30 groups and a substantial drop down to just two groups.
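Since the question asked for worked R code, here is a sketch along the lines of the simulation just described: one-sided tests at $\alpha = 0.05$ with 10,000 iterations (this takes a minute or two to run, and the power estimates will fluctuate a little from run to run):

```r
# Power of a one-sided two-sample t-test on the set means,
# for a given grouping: m sets of n draws in each group.
sim.power.t <- function(pA, pB, m, n, n.iter = 10000, alpha = 0.05) {
  mean(replicate(n.iter, {
    means.A <- rowMeans(matrix(rbinom(m * n, 1, pA), nrow = m))
    means.B <- rowMeans(matrix(rbinom(m * n, 1, pB), nrow = m))
    t.test(means.B, means.A, alternative = "greater",
           var.equal = TRUE)$p.value < alpha
  }))
}

# Power of a one-sided Wald test of the group coefficient in a logistic
# regression fit to the 2N ungrouped draws.
sim.power.glm <- function(pA, pB, N, n.iter = 10000, alpha = 0.05) {
  mean(replicate(n.iter, {
    y <- c(rbinom(N, 1, pA), rbinom(N, 1, pB))
    g <- rep(0:1, each = N)
    z <- coef(summary(glm(y ~ g, family = binomial)))["g", "z value"]
    z > qnorm(1 - alpha)
  }))
}

set.seed(3)
# p = 0.70 vs 0.74, N = 900 per group, three groupings of the same data
sim.power.t(0.70, 0.74, m = 900, n = 1)    # ungrouped: about 0.59
sim.power.t(0.70, 0.74, m = 30,  n = 30)   # 30 sets of 30: about 0.59
sim.power.t(0.70, 0.74, m = 2,   n = 450)  # 2 sets of 450: substantially lower
sim.power.glm(0.70, 0.74, N = 900)         # logistic regression: comparable to the ungrouped case
```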

The morals of this analysis are:

  1. You don't lose much when you partition your $N$ data values into a large number $m$ of relatively small groups of "draws".
  2. You can lose appreciable power using small numbers of sets of draws ($m$ is small, $n$--the amount of data per set--is large).
  3. You're best off not grouping your $N$ data values into "draws" at all. Just analyze them as-is (using any reasonable test, including logistic regression and t-testing).