Solved – use date and time in a linear model in R

r, regression

I'm trying to model copepod counts made once a day, at varying times each day, over a one-year period and under seasonally varying oxygen concentrations. I'm basically trying to see whether count values are best predicted by time of day, time of year, or oxygen. As oxygen and time of year are correlated, I may end up dropping one of these variables.

Anyway, I'm trying to run a regression in R, and it works fine if only oxygen is included, but I think both date and time are being treated as factors instead of as numbers. It gives me a p-value for every day in the year, but there is only one observation per day, so I don't think that makes sense. The overall p-value at the end of the summary in R is also suspiciously high (0.75) when I try to run only oxygen with date as the predictor, as I know for certain that they co-vary.

Is it even a good idea to run a regression with dates and times?

Is this type of output (p-values for every day and every time) to be expected?

Is there a certain format that would work? I currently have dates as "2010-Oct-18" and times as "13:37:17", for example.

Best Answer

I do not have enough reputation to comment, so I'll post this as an answer. I suggest you combine the date and time into a single numeric timestamp (seconds since Jan 1, 1970, for example). This will allow you to investigate relationships that are linear in time.
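A minimal sketch of that conversion in R, assuming the dates and times are stored as character strings in the formats given in the question. The data frame, its column names (`date`, `time`, `oxygen`, `count`) and all values are made up for illustration, and UTC is assumed as the time zone.

```r
# Made-up data purely for illustration; names and values are assumptions.
set.seed(42)
n <- 60
copepods <- data.frame(
  # Dates formatted like "2010-Oct-18"
  date   = format(seq(as.Date("2010-10-18"), by = "day", length.out = n),
                  "%Y-%b-%d"),
  # Times formatted like "13:37:17"
  time   = sprintf("%02d:%02d:%02d",
                   sample(6:18, n, TRUE),
                   sample(0:59, n, TRUE),
                   sample(0:59, n, TRUE)),
  oxygen = runif(n, 2, 8),
  count  = rpois(n, 20),
  stringsAsFactors = FALSE
)

# Parse the two strings into one date-time.
# %b matches abbreviated month names in the current locale
# ("Oct" assumes an English or C locale).
copepods$datetime <- as.POSIXct(
  paste(copepods$date, copepods$time),
  format = "%Y-%b-%d %H:%M:%S",
  tz = "UTC"
)

# Seconds since 1970-01-01 00:00:00 UTC, as a plain number
copepods$timestamp <- as.numeric(copepods$datetime)

# The timestamp now gets a single slope instead of one coefficient per day
fit <- lm(count ~ timestamp + oxygen, data = copepods)
summary(fit)
```

Because the predictor is numeric, the summary shows one row for `timestamp` rather than a row for every date. Dividing the timestamp by 86400 expresses it in days, which makes the slope easier to read.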

For periodic relations (time of day or time of year), you can use the timestamp minus the timestamp of midnight on the same day (for time of day) or minus the timestamp of midnight on Jan 1 of the same year (for time of year).
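The two subtractions could be sketched in R as follows. The date-times are made up for illustration, UTC is assumed, and the variable names (`sec_of_day`, `sec_of_year`) are my own.

```r
# Made-up date-times for illustration; UTC is an assumption.
datetime <- as.POSIXct(
  c("2010-10-18 13:37:17", "2011-01-01 00:00:30", "2011-03-05 08:15:00"),
  tz = "UTC"
)
ts <- as.numeric(datetime)  # seconds since 1970-01-01 UTC

# Timestamp of midnight on the same day
midnight <- as.numeric(as.POSIXct(format(datetime, "%Y-%m-%d"), tz = "UTC"))

# Timestamp of midnight on Jan 1 of the same year
year_start <- as.numeric(
  as.POSIXct(paste0(format(datetime, "%Y"), "-01-01"), tz = "UTC")
)

# Seconds elapsed within the day and within the year
sec_of_day  <- ts - midnight
sec_of_year <- ts - year_start

data.frame(datetime, sec_of_day, sec_of_year)
```

Both results are plain numbers and can go straight into `lm()`. Note that a linear term in `sec_of_day` still treats 23:59 and 00:01 as far apart; if that wrap-around matters, sine and cosine transforms of these values are a common alternative.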
