Solved – Multiple regression in directional / circular statistics

circular statistics, multiple regression, regression

I'm trying to develop a predictive model for an angular dependent variable (on $[0,2\pi]$) using several independent measurements – also angular variables, on $[0,2\pi]$ – as predictors. Each predictor is significantly, but not very strongly, correlated with the dependent variable. How can I combine the predictors into a predictive model for the dependent variable that is optimal in some sense? And how can I rigorously identify the strongest predictor(s)?

For variables on Euclidean space(s), I'd employ multiple regression (or similar) and principal components analysis. But the periodicity of all variables mucks with these approaches, e.g., 0.02 must be highly correlated with 6.26, but not with 3.14. How are "the usual" procedures generalized to directional / circular statistics? Any insights or pointers to references would be useful. (I'm already aware of the texts by N. Fisher and Mardia & Jupp, but don't have handy access to these.)
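To make the wrap-around issue concrete, here is a minimal Python sketch. The helper name `circular_distance` is hypothetical and introduced only for illustration; it measures the smallest angle between two directions, so that values near 0 and near $2\pi$ count as close.

```python
import math

def circular_distance(a, b):
    """Smallest angle between two directions (radians), in [0, pi]."""
    d = (a - b) % (2 * math.pi)       # wrap the difference into [0, 2*pi)
    return min(d, 2 * math.pi - d)    # go around the shorter way

print(circular_distance(0.02, 6.26))  # about 0.043: nearly the same direction
print(circular_distance(0.02, 3.14))  # about 3.12: nearly opposite directions
```

A plain difference would rank these two pairs the other way round, which is why Euclidean tools cannot be applied to the raw angles.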

Best Answer

According to the book I have, papers have only recently begun to explore multivariate regression in which one or more variables are circular. I have not checked them myself, but the relevant sources seem to be:

Bhattacharya, S. and SenGupta, A. (2009). Bayesian analysis of semiparametric linear-circular models. Journal of Agricultural, Biological and Environmental Statistics, 14, 33-65.

Lund, U. (1999). Least circular distance regression for directional data. Journal of Applied Statistics, 26, 723-733.

Lund, U. (2002). Tree-based regression for a circular response. Communications in Statistics - Theory and Methods, 31, 1549-1560.

Qin, X., Zhang, J.-S., and Yan, X.-D. (2011). A nonparametric circular-linear multivariate regression model with a rule of thumb bandwidth selector. Computers and Mathematics with Applications, 62, 3048-3055.


If you have a circular response and only a single circular regressor (which I understand is not your case, but separate regressions may be of interest as well), there is a way to estimate the model. [1] recommends fitting the general linear model

$$\cos(\Theta_j) = \gamma_0^c + \sum_{k=1}^m\left(\gamma_{ck}^c\cos(k\psi_j)+\gamma_{sk}^c\sin(k\psi_j)\right)+\varepsilon_{1j},$$ $$\sin(\Theta_j) = \gamma_0^s + \sum_{k=1}^m\left(\gamma_{ck}^s\cos(k\psi_j)+\gamma_{sk}^s\sin(k\psi_j)\right)+\varepsilon_{2j}.$$

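Here $\Theta_j$ is the circular response, $\psi_j$ is the circular regressor, and $m$ is the order of the trigonometric polynomial. The good thing is that this model can be estimated using the function lm.circular from the R package circular.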
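For readers who want to see the mechanics, the two equations can also be fitted by ordinary least squares directly. The following is a rough Python/NumPy sketch, not the implementation behind lm.circular: the function names, the synthetic data, and the step that turns the two fitted values back into an angle with `arctan2` are all assumptions made for illustration.

```python
import numpy as np

def design_matrix(psi, m):
    """Columns: 1, cos(k*psi), sin(k*psi) for k = 1..m."""
    psi = np.asarray(psi, dtype=float)
    cols = [np.ones_like(psi)]
    for k in range(1, m + 1):
        cols.append(np.cos(k * psi))
        cols.append(np.sin(k * psi))
    return np.column_stack(cols)

def fit_circular_circular(theta, psi, m=1):
    """Least-squares fit of cos(theta) and sin(theta) on the trig terms."""
    X = design_matrix(psi, m)
    theta = np.asarray(theta, dtype=float)
    gamma_c, *_ = np.linalg.lstsq(X, np.cos(theta), rcond=None)
    gamma_s, *_ = np.linalg.lstsq(X, np.sin(theta), rcond=None)
    return gamma_c, gamma_s

def predict_angle(psi, gamma_c, gamma_s):
    """Assumed back-transform: angle of the fitted (cos, sin) pair."""
    m = (len(gamma_c) - 1) // 2
    X = design_matrix(psi, m)
    return np.arctan2(X @ gamma_s, X @ gamma_c) % (2 * np.pi)

# Synthetic data (assumption): the response is the regressor rotated by
# 0.5 rad, plus a little angular noise.
rng = np.random.default_rng(0)
psi = rng.uniform(0, 2 * np.pi, 200)
theta = (psi + 0.5 + rng.normal(0, 0.1, 200)) % (2 * np.pi)

gamma_c, gamma_s = fit_circular_circular(theta, psi, m=1)
pred = predict_angle(psi, gamma_c, gamma_s)

# Wrap residuals into (-pi, pi] before averaging, so 0.01 vs 6.27 is small.
resid = np.angle(np.exp(1j * (pred - theta)))
mean_abs_error = float(np.mean(np.abs(resid)))
print(mean_abs_error)
```

Residuals are wrapped before averaging for the same reason the raw angles cannot be regressed directly: a prediction of 0.01 against an observed 6.27 is a small error, not a large one.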

[1] Jammalamadaka, S. R. and SenGupta, A. (2001). Topics in Circular Statistics. World Scientific, Singapore.