Solved – controlling for a highly correlated variable


In a paper examining hand movements and the time taken to complete a task, the ratio of movements to time appears to be constant (i.e., movements and time are highly correlated), yet the authors say they can control for each variable independently. How is this possible?

There is a strong relationship between the time taken and number of
hand movements made (Spearman coefficient 0.79, P <0.01). This has
been demonstrated with ICSAD before. Therefore, why not just time the
procedure with a stopwatch? This is answered when we apply partial
correlation coefficient tests. When controlling for time, the number
of movements made significantly compares with surgical experience and
global score (correlation coefficient −0.44 and 0.56, respectively, P
<0.01 for both). However, when controlling for movement, the time
taken had no such relationship with experience and global rating
(correlation coefficient −0.02, P = 0.9; 0.10, P = 0.8, respectively),
suggesting that operative speed is secondary to economy of hand
movement.

Datta V, Chang A, Mackay S, Darzi A. The relationship between motion analysis and surgical technical assessments. Am J Surg. 2002 Jul;184(1):70-3.
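The partial-correlation logic the paper uses can be illustrated with a small simulation. This is a hypothetical sketch, not the paper's data: I invent a "movements" variable that drives a score, and a "time" variable that is merely correlated with movements. Partial correlation is computed the standard way, by correlating the residuals of each variable after regressing out the control.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data mirroring the paper's setup: movements drives the
# score, while time is only correlated with movements (r around 0.8).
movements = rng.normal(0, 1, n)
time = 0.8 * movements + 0.6 * rng.normal(0, 1, n)
score = 0.7 * movements + rng.normal(0, 1, n)

def residuals(y, x):
    """Residuals of y after regressing out x (OLS with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def partial_corr(a, b, control):
    """Correlation of a and b after partialling out the control variable."""
    return np.corrcoef(residuals(a, control), residuals(b, control))[0, 1]

print(np.corrcoef(movements, time)[0, 1])    # raw correlation: high
print(partial_corr(movements, score, time))  # stays substantial
print(partial_corr(time, score, movements))  # collapses toward zero
```

This reproduces the qualitative pattern in the quote: movements retain a relationship with the score after controlling for time, while time loses its relationship once movements are controlled for.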

Best Answer

You're talking about multicollinearity (in the model inputs, e.g., hand movements and time). The problem does not undermine the reliability of the model overall: we can still reliably interpret the coefficient and standard error on our treatment variable. The downside of multicollinearity is that we can no longer interpret the coefficients and standard errors on the highly correlated control variables. But if we are strict in conceiving of our regression model as a notional experiment, where we want to estimate the effect of one treatment (T) on one outcome (Y), and treat the other variables (X) in our model as controls (not as estimable quantities of causal interest), then regressing on highly correlated variables is fine.
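A small simulation (hypothetical numbers, not from the paper) makes this concrete: with two nearly collinear controls in the model, the treatment coefficient is still recovered accurately; only the split of the effect between the two correlated controls is unstable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical setup: t is the treatment; x1 and x2 are near-duplicate
# controls. The outcome depends on all three plus noise.
t = rng.normal(0, 1, n)
x1 = rng.normal(0, 1, n)
x2 = x1 + 0.1 * rng.normal(0, 1, n)   # almost perfectly collinear with x1
y = 2.0 * t + 1.0 * x1 + 1.0 * x2 + rng.normal(0, 1, n)

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones(n), t, x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The treatment coefficient is well estimated despite the collinear
# controls; the individual x1/x2 coefficients are noisy, but their sum
# (the combined control effect) is stable.
print(beta[1])            # close to the true value of 2.0
print(beta[2], beta[3])   # individually unstable
```

This is the "notional experiment" point above: the collinearity lives entirely among the controls, so inference about T is unaffected.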

Another fact worth keeping in mind is that if two variables are perfectly multicollinear, one of them will be dropped from any regression model that includes them both.
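The reason one variable must be dropped is that perfect collinearity makes the design matrix rank deficient, so the coefficients are no longer uniquely identified. A minimal illustration with made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50

x1 = rng.normal(0, 1, n)
x2 = 2.0 * x1                     # perfectly collinear with x1

# The design matrix has 3 columns but only 2 independent ones, so any
# solver must drop (or merge) one of the collinear regressors.
X = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.matrix_rank(X))   # 2, not 3
```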

For more, see http://en.wikipedia.org/wiki/Multicollinearity
