The short answer is "yes you can" - but you should compare the maximum likelihood estimates (MLEs) from the "big model" that includes all covariates from either model, fitted to both outcomes.
This is a "quasi-formal" way to get probability theory to answer your question.
In the example, $Y_{1}$ and $Y_{2}$ are the same type of variables (fractions/percentages) so they are comparable. I will assume that you fit the same model to both. So we have two models:
$$M_{1}:\ Y_{1i}\sim \text{Bin}(n_{1i},p_{1i})$$
$$\log\left(\frac{p_{1i}}{1-p_{1i}}\right)=\alpha_{1}+\beta_{1}X_{i}$$
$$M_{2}:\ Y_{2i}\sim \text{Bin}(n_{2i},p_{2i})$$
$$\log\left(\frac{p_{2i}}{1-p_{2i}}\right)=\alpha_{2}+\beta_{2}X_{i}$$
So you have the hypothesis you want to assess:
$$H_{0}:\beta_{1}>\beta_{2}$$
And you have some data $\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n}$, and some prior information $I$ (such as the use of the logistic model). So you calculate the probability:
$$P=Pr(H_0|\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n},I)$$
Now $H_0$ doesn't depend on the actual value of any of the regression parameters, so they must be removed by marginalisation:
$$P=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} Pr(H_0,\alpha_{1},\alpha_{2},\beta_{1},\beta_{2}|\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n},I) d\alpha_{1}d\alpha_{2}d\beta_{1}d\beta_{2}$$
The hypothesis simply restricts the range of integration, so we have:
$$P=\int_{-\infty}^{\infty} \int_{\beta_{2}}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} Pr(\alpha_{1},\alpha_{2},\beta_{1},\beta_{2}|\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n},I) d\alpha_{1}d\alpha_{2}d\beta_{1}d\beta_{2}$$
Because the probability is conditional on the data, it factors into two separate posteriors, one for each model:
$$Pr(\alpha_{1},\beta_{1}|\{Y_{1i},X_{i},Y_{2i}\}_{i=1}^{n},I)Pr(\alpha_{2},\beta_{2}|\{Y_{2i},X_{i},Y_{1i}\}_{i=1}^{n},I)$$
Now, because there are no direct links between $Y_{1i}$ and $\alpha_{2},\beta_{2}$ (only indirect links through $X_{i}$, which is known), $Y_{1i}$ drops out of the conditioning in the second posterior; the same goes for $Y_{2i}$ in the first posterior.
From standard logistic regression theory, and assuming uniform prior probabilities, the posterior for each pair of parameters is approximately bivariate normal, with mean equal to the MLEs and covariance equal to the inverse of the observed information matrix, denoted by $V_{1}$ and $V_{2}$; these depend only on the MLEs, not on the parameters. So you have straightforward normal integrals with known variance matrices. Each $\alpha_{j}$ marginalises out with no contribution (as would any other "common" variable), and we are left with the usual result (I can post the details of the derivation if you want, but it's pretty "standard" stuff).
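Briefly: under this normal approximation the two slopes are independent, so

$$\beta_{1}-\beta_{2}\mid \text{data} \sim N\left(\hat{\beta}_{1,MLE}-\hat{\beta}_{2,MLE},\;V_{1:\beta,\beta}+V_{2:\beta,\beta}\right)$$

and $P$ is just the probability that this normal variable is positive: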
$$P=\Phi\left(\frac{\hat{\beta}_{1,MLE}-\hat{\beta}_{2,MLE}}{\sqrt{V_{1:\beta,\beta}+V_{2:\beta,\beta}}}\right)$$
where $\Phi(\cdot)$ is the standard normal CDF. This is the usual comparison of normal means test. But note that this approach requires the same set of regression variables in each model. In the multivariate case with many predictors, if the two models use different regression variables, the integrals effectively reduce to the same test, but applied to the MLEs of the two betas from the "big model" which includes all covariates from both models.
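As a minimal sketch of this in R (the success counts `y1`, `y2`, trial counts `n1`, `n2` and predictor `x` are hypothetical placeholders):

```r
# Fit the same logistic model to each fraction/percentage outcome
m1 <- glm(cbind(y1, n1 - y1) ~ x, family = binomial)
m2 <- glm(cbind(y2, n2 - y2) ~ x, family = binomial)

# MLEs of the slopes and their variances
# (diagonal elements of the inverse observed information)
b1 <- coef(m1)["x"]; v1 <- vcov(m1)["x", "x"]
b2 <- coef(m2)["x"]; v2 <- vcov(m2)["x", "x"]

# Approximate posterior probability that beta_1 > beta_2
P <- pnorm((b1 - b2) / sqrt(v1 + v2))
```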
Best Answer
I will mainly focus on your first three questions. The short answers are: (1) you need to compare the effect of the IV on the DV for each time period, but (2) simply comparing the magnitudes can lead to wrong conclusions, and (3) there are many ways of doing that, but no consensus on which one is correct.
Below I describe why you cannot simply compare coefficient magnitudes and point you to some solutions that have been thought of so far.
According to Allison (1999), unlike OLS, logistic regression coefficients are affected by unobserved heterogeneity even when such heterogeneity is not related to the variable of interest.
When you fit a logistic regression like:
$$ \ln\left(\frac{p_i}{1-p_i}\right) = \beta_{0} + \beta_{1}x_{1i} \tag{1}$$
You are in fact fitting an equation predicting the value of a latent variable $y^*$ that represents the underlying propensity of each observation to assume the value $1$ on the binary dependent variable; the observed outcome is $1$ when $y^*$ is above a certain threshold. The equation for that is (Williams, 2009):
$$ y^* = \alpha_{0} + \alpha_{1}x_{1i} + \sigma \varepsilon \tag{2}$$
The term $\varepsilon$ is assumed to be independent of the other terms and to follow a logistic distribution; the corresponding assumptions are a normal distribution for probit, an extreme value (Gumbel) distribution for complementary log-log, and a Cauchy distribution for cauchit.
According to Williams (2009), the $\alpha$ coefficients in equation 2 are related to the $\beta$ coefficients in equation 1 through:
$$ \beta_{j} = \frac{\alpha_{j}}{\sigma},\qquad j=1,...,J. \tag{3}$$
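To see where equation 3 comes from (a one-step sketch, taking the threshold to be zero and using the symmetry of the logistic distribution):

$$ \Pr(y_i = 1)=\Pr(y^* > 0) = \Pr\left(\varepsilon > -\frac{\alpha_{0} + \alpha_{1}x_{1i}}{\sigma}\right) = \Lambda\left(\frac{\alpha_{0} + \alpha_{1}x_{1i}}{\sigma}\right) $$

where $\Lambda(\cdot)$ is the standard logistic CDF, so the logit only identifies the ratios $\alpha_{j}/\sigma$.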
In equations 2 and 3, $\sigma$ is the scaling factor of the unobserved variation, and we can see that the size of the estimated $\beta$ coefficients depends on $\sigma$, which is not observed. Based on that, Allison (1999), Williams (2009), and Mood (2010), among others, claim that you cannot naively compare coefficients between logistic models estimated for different groups, countries or periods.
This is because comparisons may yield incorrect conclusions if the unobserved variation differs between groups, countries or periods. Both comparisons using different models and using interaction terms within the same model suffer from this problem. Besides logit, this also applies to its cousins probit, clog-log, cauchit and, by extension, to discrete time hazard models estimated using these link functions. Ordered logit models are also affected by it.
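To see the problem concretely, here is a small simulation sketch (all numbers invented for illustration): both groups share the same true effect of $x$, but the second group has an additional unobserved covariate.

```r
set.seed(1)
n <- 1e5
x <- rnorm(n)
z <- rnorm(n)  # unobserved heterogeneity, independent of x

# Both groups have the same true coefficient on x (= 1) in the latent equation
yA <- rbinom(n, 1, plogis(x))          # group A: no extra variation
yB <- rbinom(n, 1, plogis(x + 2 * z))  # group B: z is omitted from the model

coef(glm(yA ~ x, family = binomial))["x"]  # close to 1
coef(glm(yB ~ x, family = binomial))["x"]  # attenuated, well below 1
```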
Williams (2009) argues that the solution is to model the unobserved variation through a heterogeneous choice model (a.k.a. a location-scale model), and provides a Stata add-on called `oglm` for that (Williams, 2010). In R, heterogeneous choice models can be fit with the `hetglm()` function of the `glmx` package, which is available through CRAN. Both programs are very easy to use. Lastly, Williams (2009) mentions SPSS's `PLUM` routine for fitting these models, but I have never used it and cannot comment on how easy it is to use.

However, there is at least one working paper out there showing that comparisons using heterogeneous choice models can be even more biased if the variance equation is misspecified or there is measurement error.
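Caveats aside, the R route is mechanically simple; a minimal sketch (the data frame `df`, response `y`, predictor `x` and grouping variable `group` are hypothetical):

```r
# install.packages("glmx")
library(glmx)

# Two-part formula: mean model | scale (variance) model.
# Letting the error scale differ by group is what makes the
# coefficient comparison across groups meaningful.
# hetglm() defaults to a probit link; here we request logit.
fit <- hetglm(y ~ x * group | group, data = df,
              family = binomial(link = "logit"))
summary(fit)
```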
Mood (2010) lists other solutions that do not involve modelling the variance, but use comparisons of predicted probability changes.
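One of those solutions is to compare average marginal effects on the probability scale instead of raw coefficients; a rough sketch (assuming `fit1` and `fit2` are ordinary `glm` logit fits for the two periods, with predictor `x`):

```r
# Average marginal effect of x for a logit model: mean over
# observations of dP/dx = beta_x * p * (1 - p)
ame <- function(fit, var = "x") {
  p <- fitted(fit)
  unname(coef(fit)[var]) * mean(p * (1 - p))
}

ame(fit1)  # comparable across periods on the probability scale
ame(fit2)
```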
Apparently this issue is not settled, and I often see papers at conferences in my field (Sociology) coming up with different solutions for it. I would advise you to look at what people in your field do and then decide how to deal with it.
References