Solved – Pearson Correlations in time series between time and value

correlation, time series

I'm wondering whether this makes sense, and would like your opinion: I have a dataset with a value and a time variable. Let's suppose that the time variable is the month, like time = c(1,2,3,4,5), and the value = c(2,3,5,2,4).

In this case:

time = c(1,2,3,4,5)
value = c(2,3,5,2,4)

In your opinion (assuming the month number keeps increasing past 12 in the following years), is it correct and does it make sense to calculate the Pearson Correlation

cor(time,value)
[1] 0.3638034

between time and value to see whether there is a positive, a negative, or no correlation (in this case positive)?

I think the formula itself would work, but I do not know whether it is an error to convert the month, a qualitative ordinal variable, into month numbers, a quantitative interval variable, and use them in a correlation.

EDIT

I thought of this because:
I have a large number of small "moving" time series (add one incoming month, remove the first month), each 6 months long. I need to see, for each of these time series, whether the trend is growing or decreasing, without an inferential point of view (treat each small time series as is, not as a sample of a stochastic process, I suppose).
I thought that the correlation could help to see whether there is a linear relationship between time and values, but after reading all those answers, it does not seem to be the best way.

Best Answer

You seem to have several questions here, so I'll separate them and consider each in turn.

"I do not know whether it is an error to convert the month, a qualitative ordinal variable, into month numbers, a quantitative interval variable, and use them in a correlation."

It's not an error to treat months as a "quantitative interval variable", as you say, because the time difference between adjacent measured months is the same and known (it's one month). Labeling them as "1,2,3,..." doesn't change that fact.

"is it correct and does it make sense to calculate the Pearson Correlation"

Whether it is correct depends on what you're trying to do with your data (the context).

The correlation between the time vector and the time series doesn't really make sense in terms of quantifying relationships between variables. This is because with time-series data we expect statistical dependencies that make it hard to interpret the sample correlation as an estimator of a population correlation. Another issue is that correlation is often read as a possible indicator of causal mechanisms, which doesn't make sense in your example (correlation does not prove a causal relationship, but it may suggest one depending on the context).

That doesn't mean that interpretation is impossible. The correlation coefficient indicates the strength of a linear fit between two variables, both statistically and geometrically. Ignoring the statistical distributions aspect, a correlation that is large in absolute value suggests that the points in the scatterplot of the bivariate data lie close to a straight line.
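To make the geometric reading concrete, here is a minimal sketch using the data from the question. It assumes an ordinary least-squares line fitted with `lm()`, and the variable names (`fit`, `slope`, `slope_from_r`, `r_squared`) are my own. It shows that the fitted slope is the correlation rescaled by the two standard deviations, and that the squared correlation equals the R² of the fitted line.

```r
# Data from the question
time  <- c(1, 2, 3, 4, 5)
value <- c(2, 3, 5, 2, 4)

r     <- cor(time, value)              # Pearson correlation
fit   <- lm(value ~ time)              # least-squares line through the scatterplot
slope <- unname(coef(fit)["time"])     # fitted change in value per month

# The slope is the correlation rescaled by the two standard deviations
slope_from_r <- r * sd(value) / sd(time)

# For a simple linear regression, R^2 equals the squared correlation
r_squared <- summary(fit)$r.squared

print(c(r = r, slope = slope, slope_from_r = slope_from_r, r_squared = r_squared))
```

So the sign of the correlation is the sign of the fitted slope, while its magnitude only says how tightly the points follow that line, not how steep the line is.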

If you have a large correlation value between time and a time series, it could indicate a linear (mean) trend in the data. In that case, investigating the mean trend directly is of more interest than the correlation coefficient itself.
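For the moving 6-month windows described in the question's edit, one descriptive way to look at the mean trend is to fit a line to each window and read off the sign of its slope. The following is only a sketch under stated assumptions: the `series` values are made up for illustration, and `window_slope` is a hypothetical helper, not something from the question.

```r
# Hypothetical monthly series (illustrative values, not from the question)
series <- c(2, 3, 5, 2, 4, 6, 5, 4, 3, 1, 2, 1)
window <- 6                            # window length in months, as in the question

# Hypothetical helper: least-squares slope of one window against 1..n
window_slope <- function(y) {
  t <- seq_along(y)
  unname(coef(lm(y ~ t))["t"])
}

# Start index of every complete window
starts <- seq_len(length(series) - window + 1)

slopes <- sapply(starts, function(s) window_slope(series[s:(s + window - 1)]))
cors   <- sapply(starts, function(s) cor(seq_len(window), series[s:(s + window - 1)]))

trend <- data.frame(
  start     = starts,
  slope     = slopes,
  cor       = cors,
  direction = ifelse(slopes > 0, "growing",
              ifelse(slopes < 0, "decreasing", "flat"))
)
print(trend)
```

The slope and the correlation always share a sign, so either one gives the direction. The slope additionally says how much the value changes per month, which the correlation does not. Note that `cor()` returns `NA` for a window whose values are all identical, whereas the slope is simply 0 there.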
