Solved – Regression using circular variable (hour from 0~23) as predictor

circular statistics, regression

My question arose from reading this post:
Use of circular predictors in linear regression.

Right now, I'm trying to construct a linear regression using the
"Bike Sharing dataset" from
https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset
which basically regresses the bike rental count on several variables.

One of the predictors I have a question about is "Hour", the hour at which the rental occurred, which takes values from 0 to 23.
The original post suggests transforming circular data (time of day) using the sine function to maintain its circular characteristic.

I was trying to apply the same methodology to my situation to transform the Hour variable. However, transforming 0~23 using sin(π hour/180), with the hour expressed in degrees, gives 0 for both 00:00 and 12:00. But I think people will certainly behave differently when renting a bike at midnight (00:00) than at noon (12:00).

In this case, is it better to just use 23 dummy variables to account for the hour,
or am I misunderstanding the concept of circular regression?

Best Answer

Circular regression most often refers to regression with a circular outcome.

Here, we instead have linear regression with a circular predictor. In that case, we add both the sine and the cosine of the angle to the regression, so that we predict the outcome as $\hat{y} = \beta_1\cos(\pi \cdot \text{hour} / 12) + \beta_2\sin(\pi \cdot \text{hour} / 12).$ Adding both the sine and cosine naturally resolves the issue you mention: 00:00 maps to $(\cos, \sin) = (1, 0)$ and 12:00 maps to $(-1, 0)$, so the two hours are no longer treated as identical. Note that, unlike in your formula, I've assumed that hour is represented in hours rather than degrees.
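As a rough illustration of the sine/cosine encoding, here is a minimal sketch in Python using only NumPy. It runs on synthetic data, not the Bike Sharing dataset. The helper name `encode_hour`, the simulated peak at 15:00, and the noise level are all made up for the example. The fit also includes an intercept column, which the formula above does not show but which a regression would normally include.

```python
import numpy as np

def encode_hour(hour, period=24.0):
    """Hypothetical helper: map hour-of-day to (cos, sin) on the unit circle."""
    # 2*pi*hour/24 is the same angle as pi*hour/12 in the formula above
    angle = 2.0 * np.pi * np.asarray(hour, dtype=float) / period
    return np.cos(angle), np.sin(angle)

# Synthetic data (an assumption for illustration): demand peaks around 15:00
rng = np.random.default_rng(0)
hour = np.tile(np.arange(24), 30)  # 30 simulated days of hourly records
true_mean = 100.0 + 40.0 * np.cos(2.0 * np.pi * (hour - 15) / 24.0)
count = true_mean + rng.normal(scale=5.0, size=hour.size)

# Design matrix: intercept, cos term, sin term
cos_h, sin_h = encode_hour(hour)
X = np.column_stack([np.ones(hour.size), cos_h, sin_h])

# Ordinary least squares fit
beta, *_ = np.linalg.lstsq(X, count, rcond=None)

# Midnight and noon now have distinct encodings: (1, 0) versus (-1, 0)
midnight = encode_hour(0)
noon = encode_hour(12)

# beta_1*cos(t) + beta_2*sin(t) equals A*cos(t - phi), so the two
# coefficients together determine an amplitude and a peak time of day
amplitude = np.hypot(beta[1], beta[2])
peak_hour = (np.arctan2(beta[2], beta[1]) * 24.0 / (2.0 * np.pi)) % 24.0

print("coefficients:", beta)
print("amplitude:", amplitude, "peak hour:", peak_hour)
```

The two coefficients are best read jointly: neither the sine nor the cosine term alone identifies the time of day, but the pair fixes both the size of the daily swing and the hour at which it peaks. A single sine/cosine pair can only capture one smooth daily cycle, so a pattern with two peaks (for example, morning and evening commutes) would need either additional harmonics or the dummy-variable approach mentioned in the question.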

For a more elaborate answer on how to do this and what it means, please see the answer to this SO question.