Solved – Fixed Regressor Conspiracy and Connection to Exchangeability

exchangeability, misspecification, regression

In the simple regression model, regressors are treated as fixed rather than stochastic. Whoever picks the experimental values for the regressors decides how frequently to include each value. The design can be equally weighted (e.g. 10 samples at a 20 mg dose, 10 samples at a 30 mg dose, 10 samples at a 0 mg placebo) or weighted in some other way.

Once the model is built and the coefficients are determined, it seems illogical to use the model on values that occur with different frequencies. It is even more questionable to use values that did not appear in the experiment in the first place.

Moreover, assuming that the $X$s are fixed entails the following distributions:

$Y_i \sim N(\beta x_i,\,\sigma^2)$

Now consider the random sequence

$Y_1, Y_2, \ldots, Y_n.$

It appears to me that this is not an exchangeable sequence without a random $X$ to index it: each $Y_i$ has a different mean, hence the inference does not converge to anything meaningful. The $\hat\beta$ that is found is implicitly tied to the frequencies of the values picked by the designer; changing the frequencies changes $\hat\beta$. The difference from the random-predictor case is that in the random case we acknowledge that there is a distribution over $X$: we don't get to tamper with it during estimation, and it is assumed to be maintained into the future, which gives some comfort in terms of average prediction error. In the fixed case, nothing can be said about the frequencies of future use.
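To make the dependence on design frequencies concrete, here is a minimal sketch. The quadratic truth, the designs, and all function names are illustrative assumptions, not from the question: a no-intercept linear model is fitted to data whose true mean is quadratic, under two designs on the same support but with different frequencies.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_slope(x, y):
    """Least-squares slope for the no-intercept model y = beta * x + error."""
    return np.sum(x * y) / np.sum(x * x)

def mean_slope(design, n_rep=2000, sigma=1.0):
    """Average fitted slope over repeated experiments with a fixed design.

    The fitted model y = beta * x is misspecified: the true mean is quadratic.
    """
    x = np.asarray(design, dtype=float)
    slopes = []
    for _ in range(n_rep):
        y = 0.5 * x**2 + rng.normal(0.0, sigma, size=x.size)
        slopes.append(ols_slope(x, y))
    return np.mean(slopes)

# Two designs over the same support {1, 2, 3}, but with different frequencies.
design_a = [1]*10 + [2]*10 + [3]*10   # balanced
design_b = [1]*25 + [2]*4  + [3]*1    # mass concentrated at x = 1

print(mean_slope(design_a))  # ~ 1.29: pseudo-true slope under the balanced design
print(mean_slope(design_b))  # ~ 0.84: a different pseudo-true slope under design B
```

Analytically, the pseudo-true slope is $0.5\sum x_i^3 / \sum x_i^2$, which is $\approx 1.286$ for the balanced design and $0.84$ for the concentrated one: same support, different frequencies, different limit for $\hat\beta$.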

Another interesting aspect is that the variability of $\hat\beta$ with respect to the design frequencies sneaks into cross-validation procedures. How can one safely split the design matrix into folds if we cannot assume each fold converges to the same $\hat\beta$ in the limit? We simply cannot sample from the design matrix if we consider it fixed. The same argument applies to various resampling procedures.

Moreover, the fixed-X model doesn't actually contain a placeholder for future $x$ values; it is just a collection of marginals $Y_i \sim f_i(y_i)$. We are simply lucky that fixed-X and random-X inference coincide under correct specification, so that we can abuse notation as if we had such a placeholder.

For this reason I conclude, perhaps ignorantly, that either there is no such thing as a fixed regressor (hence it is a scientific conspiracy), or there is a different way of looking at this that rationalizes it.

EDIT

https://economics.mit.edu/files/11856

In Section 3 there is a discussion of this situation. It seems the problem emerges under misspecification, but that is almost always the case with any real data set: nobody knows the truth.

> A point that has not received attention in the literature is that under general misspecification, the random versus fixed regressor distinction has implications for inference that do not vanish with the sample size.
>
> […]
>
> One way to frame the question is in terms of different repeated sampling perspectives one can take. We can consider the distribution of the least squares estimator over repeated samples where we redraw the pairs $X_i$ and $Y_i$ (the random regressor case), or we can consider the distribution over repeated samples where we keep the values of $X_i$ fixed and only redraw the $Y_i$ (the fixed regressor case). Under general misspecification both the mean and variance of these two distributions will differ.
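The quoted distinction between the two repeated-sampling perspectives can be checked numerically. A minimal sketch, under assumed choices not taken from the paper (a quadratic truth fitted by a no-intercept linear model, uniform regressors): one run holds the realized $X$ fixed and redraws only $Y$, the other redraws the $(X, Y)$ pairs.

```python
import numpy as np

rng = np.random.default_rng(1)

n, n_rep, sigma = 200, 4000, 0.2

def slope(x, y):
    # No-intercept least-squares slope.
    return np.sum(x * y) / np.sum(x * x)

def mean_fn(x):
    # Misspecified truth: the conditional mean is quadratic, the fitted model is linear.
    return 0.5 * x**2

x_fixed = rng.uniform(0.0, 2.0, size=n)  # one realized design, then held fixed

fixed_slopes, random_slopes = [], []
for _ in range(n_rep):
    # Fixed-regressor world: keep X, redraw only Y.
    y = mean_fn(x_fixed) + rng.normal(0.0, sigma, n)
    fixed_slopes.append(slope(x_fixed, y))
    # Random-regressor world: redraw the (X, Y) pairs.
    x = rng.uniform(0.0, 2.0, size=n)
    y = mean_fn(x) + rng.normal(0.0, sigma, n)
    random_slopes.append(slope(x, y))

# Under misspecification the two sampling distributions differ: the random-X
# standard deviation exceeds the fixed-X one, because the variability of X
# now contributes through the neglected nonlinearity.
print(np.std(fixed_slopes), np.std(random_slopes))
```

If `mean_fn` is replaced by a linear function (correct specification), the two standard deviations agree up to Monte Carlo noise, which is exactly the "lucky coincidence" the question asks about.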

I would appreciate it if somebody could make a connection to the exchangeability argument, if there is merit to it. How does inference even work in the correctly specified fixed-X situation? The basic proof that $E[\hat\beta] = \beta$ is in every textbook, but did we just get lucky?

Best Answer

  1. A regression model gives predictions of the response conditional on predictor values; so there's no problem in applying a model fitted to one set of predictor values fixed by design to another set of predictor values, even if the latter are randomly sampled from a population. With an experimental design matrix $X$, the expectation of the predicted response $\hat y$ for a (new) predictor vector $x$, and the variance of a new observation about that prediction, are given by $$\operatorname{E}[\hat y \mid x] = x^\mathrm{T}\beta$$ $$\operatorname{Var}[y - \hat y \mid x]=\sigma^2\left(1+x^\mathrm{T}(X^\mathrm{T}X)^{-1}x\right)$$ where $\beta$ is the coefficient vector & $\sigma^2$ is the error variance—so the particular predictor values used for the fit don't affect the expectation of predictions, but do affect the variation in their precision throughout predictor space. Note that any aggregate fit metrics, say root mean square error of predictions, don't carry over from the experiment to the new sample.

  2. The above discussion assumes the model is right: in practice there will be extra-statistical considerations when applying it. You need to think about e.g. variation of effects that weren't investigated in the original experiment, the reliability of extrapolation into new regions of predictor space, selection bias in the population, & whether experimental manipulation is comparable to a natural cause. An engineer might model resistivity as a linear function of temperature from experimental data & be confident in applying the model to a particular collection of resistors used in a circuit board. The medical researcher in your example might assert that the medicine reduces blood cholesterol level, & confidently predict the results of further experiments; but would be unlikely to claim that, in a random sample from, say, all hospital admissions, those patients taking the medicine would have lower cholesterol levels than those who weren't.
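The variance formula in point 1 can be verified by simulation. A minimal sketch, assuming the dosing design from the question (0, 20, 30 mg with 10 replicates each) and illustrative values for $\beta$ and $\sigma$, with an intercept included:

```python
import numpy as np

rng = np.random.default_rng(2)

# Design matrix with intercept: doses 0, 20, 30 mg, 10 replicates each.
dose = np.repeat([0.0, 20.0, 30.0], 10)
X = np.column_stack([np.ones_like(dose), dose])
beta, sigma = np.array([1.0, 0.05]), 0.3   # illustrative truth

x_new = np.array([1.0, 25.0])              # new predictor vector (intercept, dose)
XtX_inv = np.linalg.inv(X.T @ X)

# Formula from the answer: Var(y_new - x^T beta_hat) = sigma^2 (1 + x^T (X^T X)^{-1} x)
var_formula = sigma**2 * (1.0 + x_new @ XtX_inv @ x_new)

# Monte Carlo check: refit under the fixed design, then predict a fresh observation.
errs = []
for _ in range(20000):
    y = X @ beta + rng.normal(0.0, sigma, X.shape[0])
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    y_new = x_new @ beta + rng.normal(0.0, sigma)
    errs.append(y_new - x_new @ beta_hat)

print(var_formula, np.var(errs))  # the two should agree closely
```

Note that the new dose (25 mg) need not be one of the design points: under correct specification the formula covers any $x$, with the precision degrading as $x$ moves away from where the design put its mass.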
