Solved – regression with circular response variable

circular statisticsrregression

I am analyzing some data where the variables of interests are circular (angles). I use R and the circular package.

In my dataset, every observation consists in a 2D Euclidean vector representing a movement on a plane (the length is constant, only the angle varies), together with a measure of a response to that movement, expressed as displacement in cartesian coordinates (i.e. the displacement along the x and y axis). I would like to fit a model where the initial movement (either the x-y cartesian representation or only the angle, since the length is constant) is the response variable. Ultimately, I would like to use the fitted model to estimate the initial movement vector in a related dataset where I only know the response to the movement.

My measurement of the response to the movement are affected by noise, independent over x and y dimensions, with possibly different variances and different additive biases.
To represent graphically the problem, what I want to do is estimate the direction of the original vectors (gray thick arrows in the figure below) starting from the noisy measurements (the thin black arrows).
enter image description here

I have tried fitting different models:

  1. one model with both predictor and dependent variable being circular
    (the angle of the initial movement vector, and the angle computed
    from the (x,y) displacement)
  2. one multivariate, linear model, with the cartesian (x,y) components of the initial movement as dependent variables, and the
    measured x,y displacement as linear predictors;
  3. one model with circular dependent variable, and linear predictor (this last one with no success)

First, I wasn't able to fit the 3rd model for some reason that I don't understand. I report here a reproducible example

require(circular)
theta <- circular(runif(500,0,2*pi),units="radians",type="angles",zero=0,rotation="counter")
rho <- rep(8,500)

# add measurement noise (different for x and y)
mX <- as.numeric(rnorm(500,0,2) + rho * cos(theta))
mY <- as.numeric(rnorm(500,-1,1) + rho * sin(theta))
linearPred <- cbind(mX,mY)

# fit model
mcl <- lm.circular(y=theta, x=linearPred,init=c(1,1), type="c-l",verbose=T)

Here is the output:

> mcl <- lm.circular(y=theta, x=linearPred,init=c(1,1), type="c-l",verbose=T)
Iteration  1 :    Log-Likelihood =  2.392981 
Iteration  2 :    Log-Likelihood =  1.082084 
Iteration  3 :    Log-Likelihood =  0.8013503 
Iteration  4 :    Log-Likelihood =  NA 
Error in while (diff > tol) { : 
  valore mancante dove รจ richiesto TRUE/FALSE

(the last line says: missing value where a TRUE/FALSE was required).
Can anyone shed light on this? I don't understand where this error comes from.

Second question, which model would you suggest to use among the first two? Here is the code that I used for the two models

# fit sin(theta) and cos(theta) with a multivariate linear model
mmv <- lm(cbind(sin(theta),cos(theta)) ~ mX+mY)

# "circular-circular" model
angularPred <- circular(atan2(mY,mX),units="radians",type="angles",zero=0,rotation="counter")
mcc <- lm.circular(y=theta, x=angularPred, type="c-c")

I can compute the angle from the multivariate fitted values as atan2(mmv$fitted[,1],mmv$fitted[,2]). Both seems to perform similarly in terms of mean angular error. To compare the two I computed the correlation between the predicted angles and the initial angle theta (Jammalamadaka – Sarma correlation coefficient), and the multivariate model seems to perform slightly better:

fitted.mmv <- circular(atan2(mmv$fitted[,1],mmv$fitted[,2]),units="radians",type="angles",zero=0,rotation="counter")
> cor.circular(fitted.mmv,theta) # multivariate
[1] 0.9851422
> cor.circular(mcc$fitted,theta) # "circular-circular"
[1] 0.7262862

However, the distribution of residuals of the multivariate model shows a strange pattern (figure below).
enter image description here

Is this pattern in the residuals a problem? Can anyone gives some advice on this? Which model would you use in this case? Is there any other possible approach that you would recommend?
Any advice is appreciated, thanks!

Best Answer

The pattern in the residuals is not necessarily a problem. One way to check this is to simulate a set of responses from the model that you just fitted (that is, under the assumption that the model is correct), fit a new model to the results, and plot its residuals. This gives you a measure of how weird you would expect the plot to look even if nothing were wrong.

If you can reliably pick out the original model from several plots produced in this way, then you should start worrying about the residual pattern.

Related Question