Solved – Comparing logistic coefficients on models with different dependent variables

logistic-regression

This is a follow-up to the question I asked a couple of days ago. I feel it puts a different slant on the issue, so I have posted it as a new question.

The question is: can I compare the magnitude of coefficients across models with different dependent variables? For example, on a single sample, say I want to know whether the economy is a stronger predictor of votes in the House of Representatives or of votes for President. In this case, my two dependent variables would be the vote in the House (coded 1 for Democrat and 0 for Republican) and the vote for President (coded 1 for Democrat and 0 for Republican), and my independent variable is the economy. I'd expect a statistically significant result for both offices, but how do I assess whether the economy has a 'bigger' effect on one than on the other? This might not be a particularly interesting example, but I'm curious about whether there is a way to compare. I know one can't just look at the 'size' of the coefficient. So, is comparing coefficients across models with different dependent variables possible? And, if so, how can it be done?

If any of this doesn't make sense, let me know. All advice and comments are appreciated.

Best Answer

The short answer is "yes, you can" - but you should compare the maximum likelihood estimates (MLEs) from the "big model", the one containing all covariates from either model, fitted to both dependent variables.

This is a "quasi-formal" way to get probability theory to answer your question.

In the example, $Y_{1}$ and $Y_{2}$ are the same type of variable (fractions/percentages), so they are comparable. I will assume that you fit the same model to both. So we have two models:

$$M_{1}:Y_{1i}\sim \text{Bin}(n_{1i},p_{1i}),\qquad \log\left(\frac{p_{1i}}{1-p_{1i}}\right)=\alpha_{1}+\beta_{1}X_{i}$$
$$M_{2}:Y_{2i}\sim \text{Bin}(n_{2i},p_{2i}),\qquad \log\left(\frac{p_{2i}}{1-p_{2i}}\right)=\alpha_{2}+\beta_{2}X_{i}$$
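For concreteness, here is a minimal, self-contained sketch of fitting $M_{1}$ and $M_{2}$ in Python with statsmodels. The data are simulated: the names `economy`, `y1`, `y2` and the coefficient values are purely illustrative, not taken from the question.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
economy = rng.normal(size=n)                    # X_i, the economic predictor
X = sm.add_constant(economy)                    # design matrix [1, X_i]

# Illustrative "true" effects, just so the example runs end to end
p1 = 1 / (1 + np.exp(-(0.2 + 0.8 * economy)))   # P(House vote = Democrat)
p2 = 1 / (1 + np.exp(-(0.1 + 0.5 * economy)))   # P(Presidential vote = Democrat)
y1 = rng.binomial(1, p1)                        # Y_1i: 1 = Democrat, 0 = Republican
y2 = rng.binomial(1, p2)                        # Y_2i

m1 = sm.Logit(y1, X).fit(disp=0)                # M_1
m2 = sm.Logit(y2, X).fit(disp=0)                # M_2
print(m1.params, m2.params)                     # MLEs (alpha_hat_j, beta_hat_j)
```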

So you have the hypothesis you want to assess:

$$H_{0}:\beta_{1}>\beta_{2}$$

And you have some data $\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n}$, and some prior information (such as the use of a logistic model). So you calculate the probability:

$$P=Pr(H_0|\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n},I)$$

Now $H_0$ doesn't depend on the actual value of any of the regression parameters, so they must be removed by marginalising:

$$P=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} Pr(H_0,\alpha_{1},\alpha_{2},\beta_{1},\beta_{2}|\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n},I) d\alpha_{1}d\alpha_{2}d\beta_{1}d\beta_{2}$$

The hypothesis simply restricts the range of integration, so we have:

$$P=\int_{-\infty}^{\infty} \int_{\beta_{2}}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} Pr(\alpha_{1},\alpha_{2},\beta_{1},\beta_{2}|\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n},I) d\alpha_{1}d\alpha_{2}d\beta_{1}d\beta_{2}$$

Because the two models share no parameters, the joint posterior (conditional on the data) factors into the two separate posteriors, one for each model:

$$Pr(\alpha_{1},\beta_{1}|\{Y_{1i},X_{i},Y_{2i}\}_{i=1}^{n},I)Pr(\alpha_{2},\beta_{2}|\{Y_{2i},X_{i},Y_{1i}\}_{i=1}^{n},I)$$

Now, because there are no direct links between $Y_{1i}$ and $\alpha_{2},\beta_{2}$ (only indirect links through $X_{i}$, which is known), $Y_{1i}$ drops out of the conditioning in the second posterior. The same holds for $Y_{2i}$ in the first posterior.

From standard logistic regression theory, and assuming uniform prior probabilities, the posterior for the parameters of each model is approximately bivariate normal, with mean equal to the MLEs and covariance matrix equal to the inverse of the observed information matrix, denoted by $V_{1}$ and $V_{2}$, which depends only on the MLEs, not on the unknown parameters. So you have straightforward normal integrals with a known covariance matrix. $\alpha_{j}$ marginalises out with no contribution (as would any other "common" variable), and we are left with the usual result (I can post the details of the derivation if you want, but it's pretty "standard" stuff):

$$P=\Phi\left(\frac{\hat{\beta}_{1,MLE}-\hat{\beta}_{2,MLE}}{\sqrt{V_{1:\beta,\beta}+V_{2:\beta,\beta}}}\right) $$

where $\Phi(\cdot)$ is the standard normal CDF. To see why: under the approximation, $\beta_{1}-\beta_{2}$ is normal with mean $\hat{\beta}_{1,MLE}-\hat{\beta}_{2,MLE}$ and variance $V_{1:\beta,\beta}+V_{2:\beta,\beta}$, so $P=Pr(\beta_{1}-\beta_{2}>0)$ is the probability that this normal variable exceeds zero.
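Continuing the simulated sketch from above, this is how the expression could be evaluated with scipy, together with a quick Monte Carlo check of the same posterior probability. All names carry over from that sketch; `bse[1]` is statsmodels' standard error of $\hat{\beta}_{j}$, i.e. $\sqrt{V_{j:\beta,\beta}}$.

```python
from scipy.stats import norm

# Closed form: P = Phi((beta1_hat - beta2_hat) / sqrt(V1_bb + V2_bb))
se_diff = np.sqrt(m1.bse[1] ** 2 + m2.bse[1] ** 2)
P = norm.cdf((m1.params[1] - m2.params[1]) / se_diff)
print(P)

# Monte Carlo check: sample each beta_j from its approximate normal
# posterior N(beta_hat_j, V_{j:beta,beta}) and count how often beta_1 > beta_2
draws = 200_000
b1 = rng.normal(m1.params[1], m1.bse[1], size=draws)
b2 = rng.normal(m2.params[1], m2.bse[1], size=draws)
print((b1 > b2).mean())                 # should be close to P
```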

This is the usual comparison-of-normal-means test. But note that this approach requires the same set of regression variables in each model. In the multivariate case with many predictors, if the two models have different regression variables, the integrals effectively reduce to the same test above, but using the MLEs of the two betas from the "big model" that includes all covariates from both models.
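To illustrate that last point, here is a hypothetical extension of the same sketch: suppose $M_{1}$ originally also used a covariate `z1` and $M_{2}$ also used a covariate `z2` (both simulated here purely for illustration). You would refit both outcomes on the union of covariates and compare the economy coefficients exactly as before.

```python
# "Big model": both outcomes regressed on the union of all covariates
z1 = rng.normal(size=n)                 # hypothetical extra covariate of M_1
z2 = rng.normal(size=n)                 # hypothetical extra covariate of M_2
X_big = sm.add_constant(np.column_stack([economy, z1, z2]))

m1_big = sm.Logit(y1, X_big).fit(disp=0)
m2_big = sm.Logit(y2, X_big).fit(disp=0)

# Same comparison, using the economy coefficients (index 1) from the big model
se_big = np.sqrt(m1_big.bse[1] ** 2 + m2_big.bse[1] ** 2)
print(norm.cdf((m1_big.params[1] - m2_big.params[1]) / se_big))
```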