The book I have says that only recently have some papers begun to explore multivariate regression where one or more of the variables are circular. I have not checked them myself, but relevant sources seem to be:
Bhattacharya, S. and SenGupta, A. (2009). Bayesian analysis of semiparametric linear-circular models. Journal of Agricultural, Biological and Environmental Statistics, 14, 33-65.
Lund, U. (1999). Least circular distance regression for directional data. Journal of Applied Statistics, 26, 723-733.
Lund, U. (2002). Tree-based regression for a circular response. Communications in Statistics - Theory and Methods, 31, 1549-1560.
Qin, X., Zhang, J.-S., and Yan, X.-D. (2011). A nonparametric circular-linear multivariate regression model with a rule of thumb bandwidth selector. Computers and Mathematics with Applications, 62, 3048-3055.
If, for a circular response, you have only a single circular regressor (which I understand is not the case for you, but perhaps separate regressions would be of interest as well), there is a way to estimate the model. [1] recommend fitting the general linear model
$$\cos(\Theta_j) = \gamma_0^c + \sum_{k=1}^m\left(\gamma_{ck}^c\cos(k\psi_j)+\gamma_{sk}^c\sin(k\psi_j)\right)+\varepsilon_{1j},$$
$$\sin(\Theta_j) = \gamma_0^s + \sum_{k=1}^m\left(\gamma_{ck}^s\cos(k\psi_j)+\gamma_{sk}^s\sin(k\psi_j)\right)+\varepsilon_{2j}.$$
The good thing is that this model can be estimated using the function lm.circular from the R library circular.
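For example, here is a minimal sketch of such a fit; the simulated data and the choice m = 2 are my own, purely for illustration:

```r
## Sketch: circular-circular regression as in the model above, using
## lm.circular() from the 'circular' package. Data and settings are made up.
library(circular)

set.seed(1)
n         <- 200
psi_raw   <- runif(n, 0, 2 * pi)                              # circular regressor
theta_raw <- (psi_raw + 0.5 + rnorm(n, sd = 0.2)) %% (2 * pi) # circular response
psi   <- circular(psi_raw)
theta <- circular(theta_raw)

## type = "c-c" requests the circular-circular fit; 'order' plays the role
## of m in the sums above.
fit <- lm.circular(y = theta, x = psi, order = 2, type = "c-c")
fit   # basic summary of the fitted coefficients
```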
[1] Jammalamadaka, S. R. and SenGupta, A. (2001). Topics in Circular Statistics. World Scientific, Singapore.
Best Answer
Generally, tree-based methods are quite robust to redundant features. In the worst case you will increase computing time, but prediction-wise you'll be quite safe. The problem with GLMs and the like is that redundant features can cause overfitting, since the number of parameters increases with the number of features.
Indeed, for a decision tree, if you duplicate a feature (the worst possible case), then once one copy is selected to make a split, the other copy will never be used below that split (at least not for the same split), as it can never further reduce impurity.
Similarly, for correlated features: if one feature is chosen to make a split, the other becomes less likely to be chosen afterwards, since there is less chance it will still reduce impurity.
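As a small illustration of the duplicated-feature case (a sketch with simulated data and the rpart package, my own choices, not part of the original argument): the tree grown with the duplicate produces exactly the same fitted values, because the copy induces the same candidate splits and never adds an impurity reduction of its own.

```r
## Sketch: a duplicated feature leaves a single regression tree's fit unchanged.
library(rpart)

set.seed(42)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 2 * d$x1 + sin(3 * d$x2) + rnorm(n, sd = 0.3)
d$x1_dup <- d$x1                     # exact copy of x1 (worst possible case)

tree_plain <- rpart(y ~ x1 + x2,          data = d)
tree_dup   <- rpart(y ~ x1 + x2 + x1_dup, data = d)

## Fitted values should coincide: whichever copy is used for a split,
## the resulting partition of the data is the same.
all.equal(predict(tree_plain), predict(tree_dup))
```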
So for gradient boosting you're in exactly the same situation. In fact, adding the extra feature is never really harmful: as the number of trees increases, and provided reasonable shrinkage is selected, both features will eventually be expressed as fully as possible. But they shouldn't "counteract" each other, as can happen in parametric models.
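To make that concrete, here is a small simulation sketch (the gbm package, the data, and the settings are my own assumptions, not from the original answer): adding an exact copy of a feature leaves the test error essentially unchanged.

```r
## Sketch: gradient boosting with and without a duplicated feature.
library(gbm)

set.seed(1)
n <- 2000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- d$x1^2 + 2 * d$x2 + rnorm(n, sd = 0.5)
train <- d[1:1000, ]
test  <- d[1001:2000, ]
train$x1_dup <- train$x1   # exact copy of x1
test$x1_dup  <- test$x1

## bag.fraction = 1 turns off subsampling, so the comparison isolates
## the effect of the duplicated column.
fit_plain <- gbm(y ~ x1 + x2 + x3,          data = train,
                 distribution = "gaussian", n.trees = 500,
                 shrinkage = 0.05, interaction.depth = 2, bag.fraction = 1)
fit_dup   <- gbm(y ~ x1 + x2 + x3 + x1_dup, data = train,
                 distribution = "gaussian", n.trees = 500,
                 shrinkage = 0.05, interaction.depth = 2, bag.fraction = 1)

rmse <- function(fit) {
  p <- predict(fit, newdata = test, n.trees = 500)
  sqrt(mean((test$y - p)^2))
}
c(plain = rmse(fit_plain), with_duplicate = rmse(fit_dup))
## Both test RMSEs should be essentially the same: the duplicate can only
## stand in for x1, it cannot "counteract" it as in a parametric model.
```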