Does autocorrelation cause bias in the regression parameters in piecewise regression?

autocorrelation, regression

In simple linear regression, autocorrelated residuals are not supposed to bias the estimates of the regression parameters. Can the same be said for piecewise regression?

Suppose I want to fit a continuous, piecewise linear function of a single variable. Say, for example, we have data on shipping cost and shipment weight. The function is piecewise because, as the weight increases, at some point an additional rail car is required. We want to find the breakpoints and the slopes of the individual pieces. The model is fit, and, for whatever reason, the residuals turn out to be serially correlated in time. Could the regression parameters be biased?

I have posted some data in a Google spreadsheet at this link: http://goo.gl/LrTv3

Suppose it is known that there are two breakpoints at (unknown) points x1 and x2. We want to fit the data to a model f(x), parameterized so that the pieces join continuously at the breakpoints:

x < x1:      f(x) = a + m1*x
x1 < x < x2: f(x) = a + m1*x1 + m2*(x - x1)
x > x2:      f(x) = a + m1*x1 + m2*(x2 - x1) + m3*(x - x2)
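This parameterization makes f continuous at the breakpoints: just below x1 the value is a + m1*x1, and just above it the m2 term contributes nothing. A quick numeric check in R (my own sketch with arbitrary test values, not the poster's code):

f <- function(x, a, x1, x2, m1, m2, m3)
  a + ifelse(x < x1, m1 * x,
      m1 * x1 + ifelse(x < x2, m2 * (x - x1),
      m2 * (x2 - x1) + m3 * (x - x2)))
f(0.4 - 1e-9, 0.4, 0.4, 0.7, 0, 0.8, 3)  # just below x1: 0.4
f(0.4 + 1e-9, 0.4, 0.4, 0.7, 0, 0.8, 3)  # just above x1: 0.4 (continuous)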

I use the nlm function in R to find the unknown parameters a, x1, x2, m1, m2 and m3:

# Sum of squared errors for the continuous piecewise-linear model.
# prm = (a, x1, x2, m1, m2, m3); y and x are the data vectors.
sqerr <- function(prm, y, x) {
  a  <- prm[1]
  x1 <- prm[2]
  x2 <- prm[3]
  m1 <- prm[4]
  m2 <- prm[5]
  m3 <- prm[6]
  sum((y - (a + ifelse(x < x1, m1 * x,
                m1 * x1 + ifelse(x < x2, m2 * (x - x1),
                m2 * (x2 - x1) + m3 * (x - x2)))))^2)
}
data <- read.table("data.txt", header = TRUE)
# starting values for (a, x1, x2, m1, m2, m3)
ai <- 0.4; x1i <- 0.4; x2i <- 0.7; m1i <- 0.0; m2i <- 0.8; m3i <- 3
prm <- c(ai, x1i, x2i, m1i, m2i, m3i)
uu <- nlm(sqerr, prm, data$Y, data$X)  # extra arguments are passed on to sqerr
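As an aside (my addition, not part of the original post): nlm can also return the Hessian of the objective at the optimum, from which rough standard errors can be backed out under the usual least-squares approximation Var(theta) ~ 2*sigma^2*H^-1. The kinks at x1 and x2 make the objective non-smooth there, so these numbers should be treated as rough at best:

uu <- nlm(sqerr, prm, data$Y, data$X, hessian = TRUE)
n <- nrow(data); p <- length(prm)
sigma2 <- uu$minimum / (n - p)                    # residual variance estimate
se <- sqrt(diag(2 * sigma2 * solve(uu$hessian)))  # approximate standard errors
cbind(estimate = uu$estimate, se = se)
uu$code                                           # 1 or 2 indicates convergence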

Then I plot the residuals vs. the lag-1 residuals:

y  <- data$Y
x  <- data$X
a  <- uu$estimate[1]
x1 <- uu$estimate[2]
x2 <- uu$estimate[3]
m1 <- uu$estimate[4]
m2 <- uu$estimate[5]
m3 <- uu$estimate[6]
resid <- y - (a + ifelse(x < x1, m1 * x,
              m1 * x1 + ifelse(x < x2, m2 * (x - x1),
              m2 * (x2 - x1) + m3 * (x - x2))))
plot(resid[1:149] ~ resid[2:150])  # residual at t vs. residual at t+1
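To go beyond an eyeball test, one could also compute the lag-1 correlation and a Durbin-Watson-style statistic directly from the residuals (my addition, not in the original post; a DW value near 2 indicates no lag-1 autocorrelation):

n <- length(resid)
cor(resid[-n], resid[-1])          # lag-1 autocorrelation of the residuals
sum(diff(resid)^2) / sum(resid^2)  # Durbin-Watson statistic (~2 if uncorrelated)
acf(resid)                         # full autocorrelation function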

There is clearly some sequential correlation. So my question is, are the regression parameters biased because of this? I have an old paper by Kadiyala (A Transformation Used to Circumvent the Problem of Autocorrelation, Econometrica Vol. 36, No. 1, Jan. 1968) that states:

"It is well known (see Watson [7] and Watson and Hannan [8]) that simple least squares estimators, though unbiased (when the independent variables are "fixed variates"),are, in general inefficient in the presence of autocorrelation among the disturbances."

It seems that by "simple least squares" he means linear equations of the form y = a + bx (that is the example used in the paper). But I have seen other papers that seem to imply that the estimators (i.e., regression parameters) are unbiased no matter what type of model you have. I don't think it's true in general.

Best Answer

A regression parameter that is often forgotten is the variance of the residuals. Its usual estimator is biased when the residuals are correlated, which means that the p-values of whatever test you are performing have to be handled with great care.
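A small simulation illustrates this for the plain linear case (my own sketch, with assumed AR(1) errors, not the poster's data): the slope estimate stays centered on the truth, but the ordinary least-squares standard error understates the true sampling variability, so tests based on it are too optimistic.

set.seed(1)
n <- 100; beta <- 2; x <- seq(0, 1, length.out = n)
sim <- replicate(2000, {
  y <- beta * x + as.numeric(arima.sim(list(ar = 0.7), n = n))  # AR(1) errors
  s <- coef(summary(lm(y ~ x)))
  c(slope = s[2, 1], se = s[2, 2])
})
mean(sim["slope", ])  # ~2: the slope is (essentially) unbiased
sd(sim["slope", ])    # true sampling SD of the slope
mean(sim["se", ])     # average OLS standard error: noticeably smaller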

Otherwise, if you fit a single line through something that is not linear (your case), you should observe autocorrelation of the residuals, but along the X variable, not through time. In that case the parameters are not biased, they are just wrong, because the model itself is misspecified.
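To illustrate that point (a toy example of my own, not the poster's data): fit a single straight line through a piecewise-linear relationship with independent errors, and the residuals, ordered along x, show clear autocorrelation even though nothing is correlated in time.

set.seed(2)
x <- sort(runif(150))
y <- ifelse(x < 0.5, x, 0.5 + 3 * (x - 0.5)) + rnorm(150, sd = 0.05)
r <- residuals(lm(y ~ x))  # deliberately misspecified straight-line fit
acf(r)                     # x is sorted, so this is correlation along x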

However, you specifically mention that your residuals are autocorrelated in time, so you could perhaps add time as a variable to your model and check whether this decorrelates the residuals.
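Alternatively, if the errors really are AR(1) in time, one option is generalized nonlinear least squares. Here is a sketch, assuming the rows of the data are in time order and that the nlme package is acceptable (this is my suggestion, not something from the original post, and convergence can be delicate because of the kinks at the breakpoints):

library(nlme)
fpw <- function(x, a, x1, x2, m1, m2, m3)  # the piecewise mean function
  a + ifelse(x < x1, m1 * x,
      m1 * x1 + ifelse(x < x2, m2 * (x - x1),
      m2 * (x2 - x1) + m3 * (x - x2)))
fit <- gnls(Y ~ fpw(X, a, x1, x2, m1, m2, m3), data = data,
            start = c(a = 0.4, x1 = 0.4, x2 = 0.7,
                      m1 = 0, m2 = 0.8, m3 = 3),
            correlation = corAR1())        # AR(1) errors in row order
summary(fit)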