Here, we want to predict a linear dependent variable from circular independent variables. There are several ways to approach this. The main thing to check is whether the relation between your dependent variable (say $Y$) and the circular predictor (say $\theta$) has a sinusoidal shape. This is often the case, but not necessarily. Below is an example of data with this shape.
th  <- rnorm(100, 1, 4) %% (2*pi)      # circular predictor, in radians
err <- rnorm(100, mean = 0, sd = 0.8)  # noise
icp <- 10                              # intercept
bc  <- 2                               # cosine coefficient
bs  <- 3                               # sine coefficient
y   <- icp + bc * cos(th) + bs * sin(th) + err
plot(th, y)
If the data does, roughly, have this shape, a good simple model is given by splitting the circular predictor $\theta$ into a sine and a cosine component, and running a regular linear regression on these two components, in this case by:
lm(y ~ cos(th) + sin(th))
>Call:
>lm(formula = y ~ cos(th) + sin(th))
>
>Coefficients:
>(Intercept)      cos(th)      sin(th)
>      10.12         2.04         2.95
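The two slope estimates can also be combined into an amplitude and a phase, which are often easier to interpret. A minimal sketch, regenerating data as above (the `set.seed` call is an assumption for reproducibility):

```r
set.seed(42)                                   # illustrative seed
th  <- rnorm(100, 1, 4) %% (2 * pi)
y   <- 10 + 2 * cos(th) + 3 * sin(th) + rnorm(100, sd = 0.8)
fit <- lm(y ~ cos(th) + sin(th))
bc  <- coef(fit)[["cos(th)"]]
bs  <- coef(fit)[["sin(th)"]]
# bc*cos(th) + bs*sin(th) equals A*cos(th - phi), with:
A   <- sqrt(bc^2 + bs^2)   # amplitude of the fitted sinusoid
phi <- atan2(bs, bc)       # phase (angle where the peak occurs)
c(amplitude = A, phase = phi)
```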
Of course, this can be done for multiple predictors as well. A good introduction on this may be found in Pewsey, Neuhauser & Ruxton (2013), Circular Statistics in R.
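For instance, with two circular predictors (the names `th1` and `th2` are illustrative), each predictor gets its own sine/cosine pair:

```r
set.seed(1)                               # illustrative seed
th1 <- runif(100, 0, 2 * pi)              # first circular predictor
th2 <- runif(100, 0, 2 * pi)              # second circular predictor
y   <- 1 + cos(th1) + 0.5 * sin(th2) + rnorm(100, sd = 0.3)
# One cosine and one sine term per circular predictor:
fit2 <- lm(y ~ cos(th1) + sin(th1) + cos(th2) + sin(th2))
coef(fit2)
```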
As mentioned before, we may add higher-order terms as in a Fourier regression, but this can only be recommended if the relationship structurally exhibits very different forms, because higher-order Fourier regression introduces a large number of parameters that are difficult to interpret.
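For illustration, a second-order Fourier regression just adds a $\cos(2\theta)$, $\sin(2\theta)$ pair; each extra harmonic costs two more coefficients:

```r
set.seed(2)                               # illustrative seed
th <- runif(100, 0, 2 * pi)
y  <- 10 + 2 * cos(th) + 3 * sin(th) + 0.5 * cos(2 * th) + rnorm(100, sd = 0.5)
# Second-order Fourier regression: two harmonics, four slope terms
fit4 <- lm(y ~ cos(th) + sin(th) + cos(2 * th) + sin(2 * th))
coef(fit4)
```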
See the documentation:
help(lm.circular)
"If type=="c-l" or lm.circular.cl is called directly, this function
implements the homoscedastic version of the maximum likelihood
regression model proposed by Fisher and Lee (1992). The model assumes
that a circular response variable theta has a von Mises distribution
with concentration parameter kappa, and mean direction related to a
vector of linear predictor variables according to the relationship:
mu + 2*atan(beta'*x), where mu and beta are unknown parameters, beta
being a vector of regression coefficients. The function uses Green's
(1984) iteratively reweighted least squares algorithm to perform the
maximum likelihood estimation of kappa, mu, and beta. Standard errors
of the estimates of kappa, mu, and beta are estimated via large-sample
asymptotic variances using the information matrix. An estimated
circular standard error of the estimate of mu is then obtained
according to Fisher and Lewis (1983, Example 1)."
Thus you should compare it with a model of a different form:
> nls(y~a+2*atan(b*x),start=c(a=0.06337,b=0.022344),data=list(x=x,y=y))
Nonlinear regression model
model: y ~ a + 2 * atan(b * x)
data: list(x = x, y = y)
a b
0.07112 0.02231
residual sum-of-squares: 12.36
Number of iterations to convergence: 12
Achieved convergence tolerance: 5.838e-06
This `nls` fit does not use the same underlying distribution for the residual terms, but it does provide similar coefficients.
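The `x` and `y` above come from the asker's own data; here is a self-contained sketch with simulated data (the "true" values of `mu` and `beta` are illustrative, so the printed coefficients will not match the output above):

```r
set.seed(3)                                  # illustrative seed
x  <- runif(100, -5, 5)                      # linear predictor
mu <- 0.07; beta <- 0.02                     # illustrative "true" values
y  <- mu + 2 * atan(beta * x) + rnorm(100, sd = 0.1)
# Fit the Fisher-Lee mean function mu + 2*atan(beta*x) by least squares:
fit_nls <- nls(y ~ a + 2 * atan(b * x), start = list(a = 0, b = 0.01))
coef(fit_nls)
```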
Clearly you simplified your posted problem to make it easier to understand. Could you add your real case? (to spice up the question)
Best Answer
You may use the circular package. It's probably best to convert your data to radians, either by multiplying by $\pi/180$ or by using
xcirc <- rad(x)
The `type`, `unit`, `zero`, and `modulo` flags are mostly there for plotting and for storing information about a dataset. For your analysis, they don't matter.
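For instance, assuming `x` holds angles in degrees (the values are illustrative):

```r
library(circular)
x <- c(10, 45, 300)        # angles in degrees (illustrative)
xcirc <- rad(x)            # convert to radians, i.e. x * pi / 180
xcirc
```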
Here is some example analysis, adapted from
example(lm.circular)
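A sketch along the lines of that help-page example; the sample size and parameter values here are illustrative, so run `example(lm.circular)` for the original code:

```r
library(circular)
set.seed(4)                                   # illustrative seed
# Linear predictors: one covariate plus an intercept column
x <- cbind(rnorm(50), rep(1, 50))
# Circular response generated from the Fisher-Lee link mu + 2*atan(beta'x)
y <- circular(2 * atan(c(x %*% c(5, 1)))) +
  rvonmises(50, mu = circular(0), kappa = 100)
# type = "c-l": circular response, linear predictors
lm.circular(y = y, x = x, init = c(5, 1), type = "c-l", verbose = TRUE)
```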