R – How to Compare Statistical Significance of Differences Between Two Polynomial Regressions

group-differences, r, regression, regression coefficients, statistical significance

First of all, I did some research on this forum, and I know extremely similar questions have been asked, but they usually haven't been answered properly, or the answers are simply not detailed enough for me to understand. So my question is: I have two sets of data, and on each I do a polynomial regression like so:

# Second-degree polynomial fit of Ratio (column 2) on Time_in_days (column 1)
Ratio <- mydata2[, 2]
Time_in_days <- mydata2[, 1]
fit3IRC <- lm(Ratio ~ poly(Time_in_days, 2))
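The second fit, fit3CN, is obtained the same way from the other dataset; a minimal sketch, assuming that dataset sits in a data frame with the same column layout (I call it mydata1 here):

# The other dataset (here called mydata1) is assumed to have the same column layout
Ratio_CN <- mydata1[, 2]
Time_in_days_CN <- mydata1[, 1]
fit3CN <- lm(Ratio_CN ~ poly(Time_in_days_CN, 2))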

The polynomial regression plots are:

[Plot: the two second-degree polynomial regression fits]

The coefficients are:

> as.vector(coef(fit3CN))
[1] -0.9751726 -4.0876782  0.6860041
> as.vector(coef(fit3IRC))
[1] -1.1446297 -5.4449486  0.5883757 

And now I want to know if there is an R function to test whether the difference between the two polynomial regressions is statistically significant, knowing that the relevant interval of days is [1,100].

From what I understood, I cannot directly apply an anova test, because the values come from two different data sets, nor AIC, which is used to compare a model against the true data.

I tried to follow the instructions given by @Roland in the related question, but I probably misunderstood something, judging by my results.

Here is what I did:

I combined both my datasets into one.

f is the factor variable that @Roland talked about. I put 1s for the first set and 0s for the other one.

# Combined data: column 1 = time, column 2 = ratio, column 3 = grouping variable f
y <- mydata2[, 2]
x <- mydata2[, 1]
f <- mydata2[, 3]

plot(x, y, xlim = c(1, nrow(mydata2)), type = 'p')

# Pooled fit, ignoring the grouping variable
fit3ANOVA <- lm(y ~ poly(x, 2))

# Fit that lets the intercept and polynomial terms differ by group
fit3ANOVACN <- lm(y ~ f * poly(x, 2))
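As far as I understand @Roland's approach, the two fits are then compared with a nested-model test, something along these lines:

# Compare the pooled fit with the fit that lets the curve differ by group
anova(fit3ANOVA, fit3ANOVACN)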

My data looks like this now:

[Plot: combined data with the red fit3ANOVA curve and the blue fit3ANOVACN curve]

The red curve is fit3ANOVA, which still works, but I have a problem with the blue one, fit3ANOVACN: the model gives weird results. I don't know whether the fit is correct; I do not understand what @Roland meant exactly.

Considering @DeltaIV's solution, I suppose that in that case:

[Plot: the two fits overlapping]

the models are significantly different even though they overlap. Am I right to assume so?
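To see how much the two curves actually overlap over the relevant interval, I suppose I could also plot both fits with confidence bands over days 1 to 100; a rough sketch, assuming the combined x, y and the 0/1 column f from above:

# Rough sketch: fitted curves with confidence bands for each group over days 1 to 100
# (assumes f is coded 0/1 as described above)
newx <- data.frame(x = seq(1, 100, length.out = 200))
pred0 <- predict(fit3ANOVACN, newdata = cbind(newx, f = 0), interval = "confidence")
pred1 <- predict(fit3ANOVACN, newdata = cbind(newx, f = 1), interval = "confidence")

plot(x, y, xlim = c(1, 100), type = 'p')
matlines(newx$x, pred0, lty = c(1, 2, 2), col = "red")
matlines(newx$x, pred1, lty = c(1, 2, 2), col = "blue")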

Best Answer

#Create some example data
mydata1 <- subset(iris, Species == "setosa", select = c(Sepal.Length, Sepal.Width))
mydata2 <- subset(iris, Species == "virginica", select = c(Sepal.Length, Sepal.Width))

#add a grouping variable
mydata1$g <- "a"
mydata2$g <- "b"

#combine the datasets
mydata <- rbind(mydata1, mydata2)

#model without grouping variable
fit0 <- lm(Sepal.Width ~ poly(Sepal.Length, 2), data = mydata)

#model with grouping variable
fit1 <- lm(Sepal.Width ~ poly(Sepal.Length, 2) * g, data = mydata)

#compare models 
anova(fit0, fit1)
#Analysis of Variance Table
#
#Model 1: Sepal.Width ~ poly(Sepal.Length, 2)
#Model 2: Sepal.Width ~ poly(Sepal.Length, 2) * g
#  Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
#1     97 16.4700                                  
#2     94  7.1143  3    9.3557 41.205 < 2.2e-16 ***
#  ---
#  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

As you see, fit1 is significantly better than fit0, i.e. the effect of the grouping variable is significant. Since the grouping variable represents the respective datasets, the polynomial fits to the two datasets can be considered significantly different.
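If you also want to know whether the shape of the curves differs, rather than just their overall level, you can compare against a model in which only the intercept varies by group; a minimal sketch continuing the same example:

#model where the groups differ only by a vertical shift
fit_shift <- lm(Sepal.Width ~ poly(Sepal.Length, 2) + g, data = mydata)

#does letting the polynomial terms differ by group still improve the fit?
anova(fit_shift, fit1)

A significant result here indicates that the curvature itself differs between the groups, not just their average level.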