Solved – Identifying structural breaks in regression with Chow test

chow-test, r, structural-change

I am having trouble finding and using the Chow test for structural breaks in a regression analysis in R. I want to find out whether there are structural changes when another variable (representing 3 spatial subregions) is included.

That is, is the regression with the subregions better than the overall model? I need some statistical validation for this.

I hope my problem is clear.

Kind regards
marco

Toy example in R:

library(mlbench) # dataset
data("BostonHousing")

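# data preparation: build a toy 3-level "region" factor from the quartiles of medv
# (medv <= Q1 -> 1; Q1 < medv <= median -> 2; medv > Q3 -> 3;
#  the remaining values, median < medv <= Q3, also fall into 1)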
BostonHousing$region <- ifelse(BostonHousing$medv <= 
                               quantile(BostonHousing$medv)[2], 1, 
                        ifelse(BostonHousing$medv <= 
                               quantile(BostonHousing$medv)[3], 2,
                        ifelse(BostonHousing$medv > 
                               quantile(BostonHousing$medv)[4], 3, 1)))

BostonHousing$region <- as.factor(BostonHousing$region)

# regression without any subregion
reg1 <- lm(medv ~ crim + indus + rm, data=BostonHousing)

summary(reg1)

# Are there structural breaks across the factor "region", which
# indicates 3 spatial subregions? (Adding region as a main effect
# only shifts the intercept; the slopes stay common.)
reg2 <- lm(medv ~ crim + indus + rm + region, data=BostonHousing)

Subsequent entry:

I struggled with the suggested package "strucchange": I did not know how to use the "from" and "to" arguments correctly with my factor "region". However, I found a hint on how to do the calculation by hand (https://stat.ethz.ch/pipermail/r-help/2007-June/133540.html). It produces the output shown further down, but I am not sure whether my interpretation is valid.

Does this mean that region 3 is significantly different from region 1, whereas region 2 is not? Further, does each parameter (e.g. region1:crim) represent the beta for the respective regime, i.e. the model for that region? Finally, does the ANOVA indicate a significant difference between these models, so that accounting for the regimes leads to a better model?

Thank you for your advice!
Best Marco

fm0 <- lm(medv ~ crim + indus + rm, data=BostonHousing)
summary(fm0)
fm1 <- lm(medv  ~ region / (crim + indus + rm), data=BostonHousing)
summary(fm1)
anova(fm0, fm1)

Results:

Call:
lm(formula = medv ~ region/(crim + indus + rm), data = BostonHousing)

Residuals:
       Min         1Q     Median         3Q        Max 
-21.079383  -1.899551   0.005642   1.745593  23.588334 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    12.40774    3.07656   4.033 6.38e-05 ***
region2         6.01111    7.25917   0.828 0.408030    
region3       -34.65903    4.95836  -6.990 8.95e-12 ***
region1:crim   -0.19758    0.02415  -8.182 2.39e-15 ***
region2:crim   -0.03883    0.11787  -0.329 0.741954    
region3:crim    0.78882    0.22454   3.513 0.000484 ***
region1:indus  -0.34420    0.04314  -7.978 1.04e-14 ***
region2:indus  -0.02127    0.06172  -0.345 0.730550    
region3:indus   0.33876    0.09244   3.665 0.000275 ***
region1:rm      1.85877    0.47409   3.921 0.000101 ***
region2:rm      0.20768    1.10873   0.187 0.851491    
region3:rm      7.78018    0.53402  14.569  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 4.008 on 494 degrees of freedom
Multiple R-squared: 0.8142,     Adjusted R-squared: 0.8101 
F-statistic: 196.8 on 11 and 494 DF,  p-value: < 2.2e-16
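
One way to check how the coefficients of a nested formula such as region/(crim + indus + rm) can be read is to compare a nested fit with a separate fit on a single group. The following is only a sketch on simulated data, not on BostonHousing; the names d, g, x, y, pooled, nested and sep2 are hypothetical and chosen for illustration.

```r
set.seed(42)

# Simulated data: three groups with different intercepts and slopes (assumed values)
n <- 90
g <- factor(rep(1:3, each = n / 3))
x <- rnorm(n)
intercepts <- c(1, 3, 5)[as.integer(g)]
slopes     <- c(2, 0, -1)[as.integer(g)]
y <- intercepts + slopes * x + rnorm(n, sd = 0.3)
d <- data.frame(y, x, g)

pooled <- lm(y ~ x, data = d)       # one regime for all groups
nested <- lm(y ~ g / x, data = d)   # separate intercept and slope per group
sep2   <- lm(y ~ x, data = d, subset = g == "2")  # group 2 on its own

b <- coef(nested)
# Group 2 intercept = baseline intercept + g2 offset; group 2 slope = g2:x
from_nested <- unname(c(b["(Intercept)"] + b["g2"], b["g2:x"]))
same_fit <- all.equal(unname(coef(sep2)), from_nested)
print(same_fit)

# Joint F test of the pooled model against the per-group model
print(anova(pooled, nested))
```

Under R's default treatment contrasts, g2 and g3 are intercept differences relative to group 1, while each g:x term is the slope within that group. The t-test on a term like g2:x therefore tests whether the slope in group 2 is zero, not whether it differs from the slope in group 1.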

> anova(fm0, fm1)
Analysis of Variance Table

Model 1: medv ~ crim + indus + rm
Model 2: medv ~ region/(crim + indus + rm)
  Res.Df     RSS Df Sum of Sq     F    Pr(>F)    
1    502 18559.4                                 
2    494  7936.6  8     10623 82.65 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
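
The F statistic in this table can be reproduced by hand from the two residual sums of squares, which is the Chow-type comparison of a restricted (pooled) model against an unrestricted (per-region) model. The sketch below only copies the rounded numbers from the table; the variable names are hypothetical.

```r
# Values copied from the anova(fm0, fm1) table above
rss0 <- 18559.4; df0 <- 502   # pooled model: medv ~ crim + indus + rm
rss1 <- 7936.6;  df1 <- 494   # model with region-specific coefficients

q <- df0 - df1                # number of restrictions being tested (8)

# F = ((RSS0 - RSS1) / q) / (RSS1 / df1)
f_stat <- ((rss0 - rss1) / q) / (rss1 / df1)
p_val  <- pf(f_stat, q, df1, lower.tail = FALSE)

print(round(f_stat, 2))       # 82.65, matching the table
print(p_val)
```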

Best Answer

The strucchange package contains Chow and F tests for structural changes in regression models. The package comes with a vignette which shows how to use the package.
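
As a rough illustration of what this looks like in practice, here is a minimal sketch, assuming the strucchange package is installed. It uses simulated data with a break placed after observation 50; the data, the names d, ct and fs, and the break position are all assumptions made for the example.

```r
library(strucchange)

set.seed(1)

# Simulated data: the regression line changes after observation 50 (assumed)
n <- 100
x <- rnorm(n)
y <- c(1 + 2 * x[1:50], 3 - 1 * x[51:100]) + rnorm(n, sd = 0.5)
d <- data.frame(y, x)

# Chow test at a known break point: rows 1..50 versus rows 51..100
ct <- sctest(y ~ x, type = "Chow", point = 50, data = d)
print(ct)

# If the break point is unknown: F statistics over a range of candidate
# break points (trimming the first and last 15% of the sample)
fs <- Fstats(y ~ x, data = d, from = 0.15)
print(sctest(fs, type = "supF"))
print(breakpoints(fs))
```

The Chow test in sctest takes a single break point in the ordering of the rows, so data grouped by a factor would first have to be sorted by that factor. With more than two groups, comparing nested lm fits with anova, as in the question's second entry, is probably the more direct route.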
