Solved – Linear correlation between two sets of data

correlation, r, time series

I have various data sets that correspond to values (in percentage) at given time points (hours, up to 10 hours).

The total number of data sets of values at given time points is 8.

My question is simple. I know that some of these data sets are more linearly correlated with time than others. I want to use software that can find which data set is best correlated with time, or which part of which data set is best correlated with time.

Here is an example:

Time: 0 1 2 3 4 5 6 7 8 9 10
Set1: 0 10 20 30 40 50 60 70 80 90 100
Set2: 0 5 15 22 23 55 65 80 81 83 90
Set3: 0 10 20 30 40 50 60 77 81 95 99

Set1 is 100% linearly correlated with time, Set2 is less so, and part of Set3 (0 -> 60) is 100% linearly correlated with time.

I would like to find software that can perform such calculations (that is, find which set, or which part of which set, is best linearly related to another data set or part of a data set).

Currently I am doing it with Excel (manually, using R²) or by printing the plots and using a ruler (which you might find funny ;)))

Best Answer

R provides several useful functions for this. For example:

Load data

dataset<-data.frame(Time=c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Set1=c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100), Set2=c(0, 5, 15, 22, 23, 55, 65, 80, 81, 83, 90), Set3=c(0, 10, 20, 30, 40, 50, 60, 77, 81, 95, 99))

Inspect the relations using plots

# Draw Set1 first, then overlay the other sets on the same axes
plot(Set1 ~ Time, data=dataset, col="blue", ylab="Value (%)")
points(Set2 ~ Time, data=dataset, col="red")
points(Set3 ~ Time, data=dataset, col="green")

# Add the least-squares regression line for each set
abline(lm(Set1 ~ Time, data=dataset), col="blue")
abline(lm(Set2 ~ Time, data=dataset), col="red")
abline(lm(Set3 ~ Time, data=dataset), col="green")

Or simpler:

pairs(dataset)

Compute the correlation and test its significance (see ?cor.test for other options, e.g. method="spearman" or method="kendall"):

cor.test(dataset$Set1, dataset$Time, method="pearson")
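To answer the "which set is best correlated with time" part directly, the per-set correlations can be computed in one pass and sorted. The following is a minimal sketch, assuming the same example data as in the question; the names set_names, r_with_time and ranking are only illustrative.

```r
# Example data copied from the question.
dataset <- data.frame(
  Time = 0:10,
  Set1 = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
  Set2 = c(0, 5, 15, 22, 23, 55, 65, 80, 81, 83, 90),
  Set3 = c(0, 10, 20, 30, 40, 50, 60, 77, 81, 95, 99)
)

# Every column except Time is treated as a data set to compare (assumption).
set_names <- setdiff(names(dataset), "Time")

# Pearson correlation of each set with Time, returned as a named vector.
r_with_time <- sapply(dataset[set_names], cor,
                      y = dataset$Time, method = "pearson")

# Sort from most to least linearly correlated with Time.
ranking <- sort(r_with_time, decreasing = TRUE)
print(round(ranking, 4))
```

Squaring the values in ranking gives the same R² that Excel reports for a straight-line trendline, since for a single predictor R² equals the squared Pearson correlation.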

To obtain a whole correlation matrix:

cor(dataset, method="pearson")
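None of the functions above looks for the best linearly correlated part of a set. One possible approach, sketched here under stated assumptions, is to scan every contiguous window of the data and compute R² for each. The function best_linear_windows and its min_points argument are hypothetical names, not part of base R. A minimum window length is needed because any two points are always perfectly linear, and the default of 5 is an arbitrary choice.

```r
# Example data copied from the question (Time and Set3 only).
time <- 0:10
set3 <- c(0, 10, 20, 30, 40, 50, 60, 77, 81, 95, 99)

# Hypothetical helper: for every contiguous window with at least
# `min_points` observations, record R^2 of the straight-line fit y ~ x.
# For one predictor, R^2 equals the squared Pearson correlation.
# Note: cor() returns NA (with a warning) if y is constant in a window.
best_linear_windows <- function(x, y, min_points = 5) {
  n <- length(x)
  stopifnot(length(y) == n, min_points >= 3, n >= min_points)
  rows <- list()
  for (start in 1:(n - min_points + 1)) {
    for (end in (start + min_points - 1):n) {
      idx <- start:end
      rows[[length(rows) + 1]] <- data.frame(
        from = x[start], to = x[end], n = length(idx),
        r2 = cor(x[idx], y[idx])^2
      )
    }
  }
  out <- do.call(rbind, rows)
  # Highest R^2 first; rounding absorbs floating-point noise so that
  # ties are broken in favour of the longer window.
  out[order(-round(out$r2, 10), -out$n), ]
}

res <- best_linear_windows(time, set3)
print(head(res, 3))
```

For Set3 the first row should be the window from Time 0 to Time 6, matching the perfectly linear stretch (0 -> 60) described in the question. Because shorter windows tend to score higher, comparing windows of very different lengths by R² alone can be misleading, so treat the ranking as a starting point for inspection rather than a definitive answer.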