Solved – Identifying structural breaks in regression with Chow test

chow-testrstructural-change

I have some problems in using (and finding) the Chow test for structural breaks in a regression analysis using R. I want to find out if there are some structural changes including another variable (represents 3 spatial subregions).

Namely, is the regression with the subregions better than the overall model. Therefore I need some statistical validation.

I hope my problem is clear, isn't it?

Kind regards
marco

Toy example in R:

library(mlbench) # dataset
data("BostonHousing")

# data preparation
BostonHousing$region <- ifelse(BostonHousing$medv <= 
                               quantile(BostonHousing$medv)[2], 1, 
                        ifelse(BostonHousing$medv <= 
                               quantile(BostonHousing$medv)[3], 2,
                        ifelse(BostonHousing$medv > 
                               quantile(BostonHousing$medv)[4], 3, 1)))

BostonHousing$region <- as.factor(BostonHousing$region)

# regression without any subregion 
reg1<- lm(medv ~ crim + indus + rm, data=BostonHousing)

summary(reg1)

# are there structural breaks using the factor "region" which
# indicates 3 spatial subregions
reg2<- lm(medv ~ crim + indus + rm + region, data=BostonHousing)

——- subsequent entry

I struggled with your suggested package "strucchange", not knowing how to use the "from" and "to" arguments correctly with my factor "region". Nevertheless, I found one hint to calculate it by hand (https://stat.ethz.ch/pipermail/r-help/2007-June/133540.html). This results in the following output, but now I am not sure if my interpetation is valid. The results from the example above below.

Does this mean that region 3 is significant different from region 1? Contrary, region 2 is not? Further, each parameter (eg region1:crim) represents the beta for each regime and the model for this region respectively? Finally, the ANOVA states that there is a signif. difference between these models and that the consideration of regimes leads to a better model?

Thank you for your advices!
Best Marco

fm0 <- lm(medv ~ crim + indus + rm, data=BostonHousing)
summary(fm0)
fm1 <- lm(medv  ~ region / (crim + indus + rm), data=BostonHousing)
summary(fm1)
anova(fm0, fm1)

Results:

Call:
lm(formula = medv ~ region/(crim + indus + rm), data = BostonHousing)

Residuals:
       Min         1Q     Median         3Q        Max 
-21.079383  -1.899551   0.005642   1.745593  23.588334 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    12.40774    3.07656   4.033 6.38e-05 ***
region2         6.01111    7.25917   0.828 0.408030    
region3       -34.65903    4.95836  -6.990 8.95e-12 ***
region1:crim   -0.19758    0.02415  -8.182 2.39e-15 ***
region2:crim   -0.03883    0.11787  -0.329 0.741954    
region3:crim    0.78882    0.22454   3.513 0.000484 ***
region1:indus  -0.34420    0.04314  -7.978 1.04e-14 ***
region2:indus  -0.02127    0.06172  -0.345 0.730550    
region3:indus   0.33876    0.09244   3.665 0.000275 ***
region1:rm      1.85877    0.47409   3.921 0.000101 ***
region2:rm      0.20768    1.10873   0.187 0.851491    
region3:rm      7.78018    0.53402  14.569  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 4.008 on 494 degrees of freedom
Multiple R-squared: 0.8142,     Adjusted R-squared: 0.8101 
F-statistic: 196.8 on 11 and 494 DF,  p-value: < 2.2e-16

> anova(fm0, fm1)
Analysis of Variance Table

Model 1: medv ~ crim + indus + rm
Model 2: medv ~ region/(crim + indus + rm)
  Res.Df     RSS Df Sum of Sq     F    Pr(>F)    
1    502 18559.4                                 
2    494  7936.6  8     10623 82.65 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Best Answer

The strucchange package contains Chow and F tests for structural changes in regression models. The package comes with a vignette which shows how to use the package.

Related Solutions

Solved – perform chow test on time series

(1) The Chow test is for a change in the coefficients of a regression model at a known time. If you don't know at which point in time the (hypothesized) structural change occurs, don't use a Chow test. A natural alternative is to use Andrew's sup$F$ test which formalizes your approach (conducting the Chow test for all possible timings) but appropriately adjusts the corresponding $p$-values. It rejects if the maximum of the $F$ (or Chow) statistics becomes to large. See vignette("strucchange-intro", package = "strucchange") for worked examples and more references. Also, citation("strucchange") gives you more pointers.

(2) The model Prize ~ Trend is surely not appropriate for your data. This would suggest that it is stationary around a deterministic linear trend. Even if you allow for structural breaks and relax it to a piecewise linear trend, you won't find a good model for the Prize time series. Probably it would make more sense to model the returns rather than the levels of this time series. But more context would be required for a better recommendation. I suggest you talk to your advisor and ask for more guidance and suitable references.

Solved – Different regression coefficients in R and Excel

The difference between coefficients is in the relation x versus y which is reversed in the one case.

Note that

in your R case the coefficient relates to 'suva'
and in your Excel case the coefficient relates to 'heather'.

see in the following code where R can get to both cases:

lm(suva ~ heather, data = as.data.frame(data))

Call:
lm(formula = suva ~ heather, data = as.data.frame(data))

Coefficients:
(Intercept)      heather  
      14.65       -13.60  

> lm(heather ~suva, data = as.data.frame(data))

Call:
lm(formula = heather ~ suva, data = as.data.frame(data))

Coefficients:
(Intercept)         suva  
    0.32524     -0.01276

rest of the code:

data <- c(
12.880545,   0.061156645, 0.15   , 0.525,   0,
7.098873327, 0.026878039, 0.2275,  0   ,0,
8.660688381, 0.04037841 , 0.425 ,  0.25 ,   0,
7.734546932, 0.021618446, 0.225 , 0.3875,  0,
16.70696048, 0.103626684, 0.15  ,  0.075,   0,
9.763315183, 0.013387158, 0.25  ,  0.075,   0,
12.91735434, 0.008076468, 0.22  ,  0.22 ,   0,
19.94153851, 0.150798057, 0.0375,  0.35 ,   0.225,
17.25115559, 0.052229596, 0.0625,  0.2625,  0.225,
15.38596941, 0.05429447 , 0.1125,  0.45 ,   0.025,
15.53714185, 0.05933884 , 0.1625,  0.525,   0.0625,
14.11551229, 0.064579437, 0.1875,  0.35 ,   0.1375,
14.88575569, 0.0189853  , 0.3375,  0.3, 0,
12.32229733, 0.043085602, 0.0875,  0.1375,  0,
17.23861185, 0.071705699, 0.15  ,  0.1375,  0,
11.50832463,     0.1125 , 0.0875,  0.075, 0,
14.4810484,  0.078476821, 0.0375,  0.125,   0.0625,
9.110262652, 0.077306938, 0.145 ,  0.35 ,   0.0125,
10.8571733,  0.02681341 , 0.0375,  0.525,   0,
9.589339421, 0.01892435 , 0.2275,  0  , 0,
7.260373588, 0.014538237, 0.425 ,  0.25 ,   0,
11.11099161, 0.022802578, 0.225 ,  0.3875 , 0,
10.81488848, 0.047587818, 0.15  ,  0.075  , 0,
8.224131957, 0.031126904, 0.25  ,  0.075  , 0,
8.818607863, 0.002855409, 0.22  ,  0.22   , 0,
11.53999863, 0.031465613, 0.0375,  0.35   , 0.225,
14.92784964, 0.069998663, 0.0625,  0.2625 , 0.225,
9.666480932, 0.02387741 , 0.1125,  0.45   , 0.025,
12.51000758, 0.016960259, 0.1625,  0.525  , 0.0625,
13.32611463, 0.033670382, 0.1875,  0.35   , 0.1375,
16.76535191, 0.029613698, 0.3375,  0.3 ,0,
11.24615281, 0.008440059, 0.0875,  0.1375,  0,
10.60564875, 0.003930792, 0.15  ,  0.1375,  0,
11.82909125, 0.036017582, 0.1125,  0.0875 , 0.075,
18.2337185,  0.143451512, 0.0375,  0.125  , 0.0625,
10.6226222,  0.020561242, 0.145 ,  0.35   , 0.0125
)
data <- matrix(data,36, byrow=1)
colnames(data) <- c("suva", "Std dev", "heather", "sedge",   "sphagnum")

Why then, is $R^2$ still the same?

There is a certain symmetry in the situation. The regression slope coefficient is (in simple linear regression) the correlation coefficient scaled by the variance of the $x$ and $y$ data.

$$\hat\beta_{y \sim x} = r_{xy} \frac{s_y}{s_x}$$

The regression model variance is then:

$$s_{mod} = \hat\beta_{y \sim x} s_x = r_{xy} s_y$$

and the ratio of model variance and variance of the data is:

$$R^2 = \left( \frac{s_{mod}}{s_y} \right)^2= r_{xy}^2$$

Best Answer

Related Solutions

Solved – perform chow test on time series

Solved – Different regression coefficients in R and Excel

Related Question