Solved – Errors in Variables and Deming’s multivariate regression: Assumptions

mathematical-statistics, measurement-error, multiple-regression, multivariate-analysis, regression

There is extensive literature that puts forth a standard set of assumptions for the Ordinary Least Squares (OLS) estimator. I am very interested in working around two classical problems:

a) Errors-in-Variables regression,

b) Deming regression,

in a multivariate setting where the measurement error is only in the independent variables and not in the response. My problem is that I have not been able to find any literature that gives a standard set of assumptions for these two problems in a multivariate setting.

Can someone put forth the assumed models and the distributional/probabilistic/independence assumptions clearly and succinctly for one or both of these problems in a multivariate setting, using matrix notation?

Most of my problems/doubts are resolved once the assumptions are stated clearly.

Best Answer

I am not aware of any case where there is measurement error only in the independent variables and not in the response; rather, error is in both. What I mean is that we have observations $$(W_i,Y_i), \quad i=1,\dots,n$$

where $Y_i = g(X_i) + V_i$ and $W_i = X_i + U_i$. Here $V_i$ is the standard error term you get from regression modelling, and $U_i$ is your measurement error. There is a lot of literature on the standard univariate case, and if you understand the univariate case then the extension to the multivariate case should follow. Admittedly, there is not a lot of literature covering the multivariate case, because the difficulties presented in the one-dimensional case are usually difficult enough.
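To see why this model matters in practice, here is a minimal simulation (all parameter values are my own choices for illustration) of the classical additive model with $g$ linear, showing the well-known attenuation bias: naively regressing $Y$ on the mismeasured $W$ shrinks the slope by the reliability ratio $\lambda = \operatorname{Var}(X)/(\operatorname{Var}(X)+\operatorname{Var}(U))$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 100_000, 2.0

# True covariate X, measurement error U, regression error V
X = rng.normal(0.0, 1.0, n)   # Var(X) = 1
U = rng.normal(0.0, 1.0, n)   # Var(U) = 1, so reliability lambda = 0.5
V = rng.normal(0.0, 0.5, n)

W = X + U            # observed, mismeasured covariate
Y = beta * X + V     # response depends on the *true* covariate

# Naive OLS slope of Y on W is attenuated: E[beta_naive] = beta * lambda
beta_naive = np.cov(W, Y)[0, 1] / np.var(W)
print(beta_naive)    # ≈ 2.0 * 0.5 = 1.0, not 2.0
```

The same attenuation occurs componentwise in the multivariate case, except that the scalar reliability ratio becomes a matrix, which is one reason the multivariate assumptions are harder to state cleanly.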

For starters, Fan (1991) explains that the difficulty in the measurement error problem stems from the errors $U_i$: the ability to come up with "good" estimators depends on these errors. In particular, he classifies the $U_i$'s as being either 'supersmooth' or 'ordinary smooth'.

The model above is known as the classical (additive) errors-in-variables model, since we observe a 'naturally' mismeasured version of the true covariate and the error is additive. As I have just suggested, there are many forms of the errors-in-variables model, but I suspect you are talking about the classical one. The standard distributional assumptions are:
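On part (b) of the question: Deming regression is the linear special case where errors sit in both $W$ and $Y$ and the ratio $\delta = \operatorname{Var}(V)/\operatorname{Var}(U)$ is assumed known. A univariate sketch (the closed-form slope is standard; the simulated variances and $\delta = 1$ are assumptions for illustration):

```python
import numpy as np

def deming_slope(w, y, delta=1.0):
    """Deming regression slope, with delta = Var(V)/Var(U) assumed known."""
    sxx = np.var(w, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(w, y, ddof=1)[0, 1]
    d = syy - delta * sxx
    # Closed-form maximum-likelihood slope under normal errors
    return (d + np.sqrt(d * d + 4 * delta * sxy ** 2)) / (2 * sxy)

rng = np.random.default_rng(2)
n, beta = 100_000, 2.0
X = rng.normal(0.0, 1.0, n)
W = X + rng.normal(0.0, 0.5, n)         # Var(U) = 0.25
Y = beta * X + rng.normal(0.0, 0.5, n)  # Var(V) = 0.25, so delta = 1

slope = deming_slope(W, Y, delta=1.0)
print(slope)   # ≈ 2.0, recovering the true slope despite error in W
```

Unlike naive OLS, the Deming estimator is consistent here, but only because $\delta$ is supplied rather than estimated; misspecifying $\delta$ reintroduces bias.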

1. $V_i$ is independent of $X_i$.
2. $X_i$ is independent of $U_i$.
3. Either the density of $U_i$ is known (and is either supersmooth or ordinary smooth), or there are repeated measurements of $X_i$ measured with error (namely $W_{ij} = X_i + U_{ij}$).
4. The dependence of $Y_i$ on the predictors is non-differential, i.e. $Y_i \mid X_i, W_i \sim Y_i \mid X_i$; that is, if you know the true covariate, then the mismeasured version carries no additional information about $Y_i$.
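The repeated-measurements route in assumption 3 is worth a concrete sketch: with two replicates per subject, $\operatorname{Var}(U)$ can be estimated from within-pair differences and the naive slope corrected by the estimated reliability ratio (a method-of-moments correction; all simulated values are my own assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 50_000, 2.0
sigma_u = 1.0

X = rng.normal(0.0, 1.0, n)
# Two replicate measurements of each X_i: W_ij = X_i + U_ij
W1 = X + rng.normal(0.0, sigma_u, n)
W2 = X + rng.normal(0.0, sigma_u, n)
Y = beta * X + rng.normal(0.0, 0.5, n)

Wbar = (W1 + W2) / 2
# Within-pair differences identify the error variance: Var(W1 - W2) = 2 Var(U)
var_u_hat = np.var(W1 - W2) / 2
# Reliability of the averaged measurement: Var(X) / (Var(X) + Var(U)/2)
var_x_hat = np.var(Wbar) - var_u_hat / 2
lam = var_x_hat / np.var(Wbar)

beta_naive = np.cov(Wbar, Y)[0, 1] / np.var(Wbar)
beta_corrected = beta_naive / lam
print(beta_corrected)   # ≈ 2.0
```

This is the univariate version; in the multivariate case $\lambda$ becomes a matrix of reliability ratios and the division becomes a matrix inverse.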

As far as I know there aren't any standard assumptions used in the multivariate case.
