Solved – Estimating the effects of a structural break on multiple/panel regression coefficients and R implementation

interaction, panel-data, r, regression, structural-change

I would appreciate some help with the methodology and R implementation for a project I'm working on.

I have daily observations of $Y$ for several countries over several years, and I will use quite a few independent variables $X$.

My hypothesis is that there was a break at a specific time that has affected the determination process of $Y$ and altered the coefficients of a regression. I’m interested in knowing what the effects of $X$ on $Y$ were before and after the break and whether there was a significant change in these relationships after the break.

My idea is to use multiple regression for each country of the form:

$$Y = a + B_1 X_1 + B_2 X_2 + B_3 X_3 + \delta D + B_4 D X_1 + B_5 D X_2 + B_6 D X_3 + e $$

where $D$ is a dummy variable equal to 0 before the suspected break and 1 after it, and $\delta$ is the shift in the intercept after the break.

My test for whether the coefficients change after the break is then simply a test of the significance of the interaction coefficients $B_4, B_5, B_6$. I know how to run this regression for individual countries.
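A minimal sketch of this specification in R on simulated data (the column names, the break date and the effect sizes are all hypothetical). `D * (x1 + x2 + x3)` expands to the dummy, the three regressors and the three interactions. The t-tests on `D:x1`, `D:x2` and `D:x3` look at each coefficient separately, while the `anova()` comparison tests all break terms jointly. Because every coefficient, including the intercept, is interacted with `D`, that joint F-test gives the same statistic as a Chow test at that date.

```r
# Simulated single-country data (hypothetical): daily observations, with the
# slope on x1 changing after a known break date.
set.seed(1)
n <- 500
dat <- data.frame(
  date = seq(as.Date("2015-01-01"), by = "day", length.out = n),
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n)
)
break_date <- as.Date("2015-09-01")  # hypothetical regulation date
dat$D <- as.numeric(dat$date >= break_date)
dat$y <- 1 + 0.5 * dat$x1 - 0.3 * dat$x2 + 0.2 * dat$x3 +
  0.8 * dat$D * dat$x1 + rnorm(n)

# D * (x1 + x2 + x3) expands to D + x1 + x2 + x3 + D:x1 + D:x2 + D:x3.
full <- lm(y ~ D * (x1 + x2 + x3), data = dat)

# Individual t-tests on the interaction coefficients (B4, B5, B6).
summary(full)$coefficients[c("D:x1", "D:x2", "D:x3"), ]

# Joint F-test that the intercept shift and all interactions are zero.
restricted <- lm(y ~ x1 + x2 + x3, data = dat)
anova(restricted, full)
```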

Q1. Will this tell me what I’m looking for or should I use something else like a Wald or Chow test?

Q2. Is this called a natural experiment?

Q3. If I want to run this as a time fixed effects panel regression, how is this done in R?
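One way to sketch this in R, assuming a long-format data frame with hypothetical columns `country`, `date`, `x1`, `post` and `y`: add the date as a factor so that every day gets its own intercept (time fixed effects). Because the break dummy `post` takes the same value for every country on a given day, its main effect is absorbed by the date dummies, so only its interactions with the regressors are identified. With many days, the `plm` package (`model = "within"`, `effect = "time"`) avoids estimating all the date dummies explicitly.

```r
# Hypothetical long-format panel: 4 countries observed on the same 200 days.
set.seed(2)
panel <- expand.grid(
  country = c("A", "B", "C", "D"),
  date = seq(as.Date("2015-01-01"), by = "day", length.out = 200)
)
panel$x1 <- rnorm(nrow(panel))
panel$post <- as.numeric(panel$date >= as.Date("2015-04-01"))  # hypothetical break
panel$y <- 0.5 * panel$x1 + 0.4 * panel$post * panel$x1 + rnorm(nrow(panel))

# factor(date) gives every day its own intercept (time fixed effects).
# post is identical across countries on a given day, so its main effect is
# absorbed by the date dummies and is left out; x1:post is still identified.
fe <- lm(y ~ x1 + x1:post + factor(date), data = panel)
coef(fe)[c("x1", "x1:post")]
```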

The suspected break is the introduction date of a new financial regulation. It’s possible that there was a gradual change over a few months in anticipation of the regulation.

Q4. Could I cut out a few months of the data to remove the effects of a gradual change?
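A sketch of dropping an anticipation window in R, assuming a hypothetical regulation date and a three-month transition period: observations inside the window are removed, and `D` keeps its original coding for the remaining data. The window length is a judgement call; re-estimating with a few different lengths is one way to see how sensitive the results are to it.

```r
reg_date <- as.Date("2016-01-01")  # hypothetical regulation date
# Start of a hypothetical three-month transition window (2015-10-01).
transition_start <- seq(reg_date, by = "-3 months", length.out = 2)[2]

dat <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2016-12-31"), by = "day"))
dat$D <- as.numeric(dat$date >= reg_date)

# Keep pre-break observations before the window and all post-break observations.
trimmed <- subset(dat, date < transition_start | date >= reg_date)
max(trimmed$date[trimmed$D == 0])  # 2015-09-30
```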

Q5. Should I use some method to look for a break before the suspected break date?

Bonus question: in R, the outputs of these two regressions are exactly the same (I'm using the `lm` function):

$$\begin{array}{rcl} Y & = & a + B_1 X + \delta D + B_2 X D \\ Y & = & a + B_1 X + B_2 X D \end{array}$$

Both models return a coefficient for the dummy variable $D$, even though the second model only uses $D$ in the interaction term. Why is this, and how can I prevent it?
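Assuming the second model was written with `*` in the formula (for example `y ~ x + x*D`), the explanation is R's formula syntax: `a * b` is shorthand for `a + b + a:b`, so the main effect of the dummy enters automatically. To include only the interaction, use `:`. A small demonstration on simulated data:

```r
set.seed(3)
d <- data.frame(x = rnorm(100), D = rep(0:1, each = 50))
d$y <- 1 + 2 * d$x + 0.5 * d$D * d$x + rnorm(100)

# `*` means main effects plus interaction: x * D is x + D + x:D,
# so y ~ x + x * D is the same model as y ~ x * D and includes D.
names(coef(lm(y ~ x + x * D, data = d)))
# "(Intercept)" "x" "D" "x:D"

# `:` adds only the interaction term, which is the second specification.
names(coef(lm(y ~ x + x:D, data = d)))
# "(Intercept)" "x" "x:D"
```

Dropping the main effect of `D` forces both regimes to share the same intercept, which is only sensible if you are confident the break does not shift the level of $Y$.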

Best Answer

There is a lot going on in this question. You'll get better responses if you narrow its focus, and doing so will also help you understand the problem better yourself.

Here are some suggestions that should point you in the right direction:

Theory:

Chow Test

You're going to want to start with a simple Chow test around the suspected break date. This is a good start; however, your description suggests that you'd rather be agnostic about the exact break date, which is a good idea.
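For a single, known candidate date, the Chow test compares the pooled regression with separate regressions before and after the break. A base-R sketch (the function name `chow_test`, the break index and the simulated data are hypothetical):

```r
# Chow test at a known candidate break.
# y: response vector; X: regressor matrix including an intercept column;
# k: index of the last observation in the pre-break regime.
chow_test <- function(y, X, k) {
  n <- length(y)
  p <- ncol(X)
  rss <- function(yy, XX) sum(lm.fit(XX, yy)$residuals^2)
  rss_pooled <- rss(y, X)
  rss_split <- rss(y[1:k], X[1:k, , drop = FALSE]) +
    rss(y[(k + 1):n], X[(k + 1):n, , drop = FALSE])
  f <- ((rss_pooled - rss_split) / p) / (rss_split / (n - 2 * p))
  c(F = f, p.value = pf(f, p, n - 2 * p, lower.tail = FALSE))
}

set.seed(4)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + ifelse(seq_len(n) > 120, 1.0 * x, 0) + rnorm(n)
X <- cbind(1, x)
res <- chow_test(y, X, k = 120)
res

# Equivalent dummy-interaction form: gives the same F statistic.
D <- as.numeric(seq_len(n) > 120)
anova(lm(y ~ x), lm(y ~ x * D))
```

The agreement between the two F statistics is why the interaction regression in the question and the Chow test are two views of the same test. strucchange's `sctest(y ~ x, type = "Chow", point = 120)` should also reproduce it.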

Endogenous Testing

The problem with searching over many candidate break dates is that the test statistic no longer follows its usual distribution. To see why, imagine you have 100 candidate break dates and you test each of the resulting 100 dummy variables at the 5% significance level. Even if there is no break at all, by pure chance you would wrongly reject the null hypothesis that the dummy's coefficient is zero for about 5 of those dates on average.
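A back-of-the-envelope version of this argument, assuming (unrealistically) that the 100 tests are independent:

$$E[\text{false rejections}] = 100 \times 0.05 = 5, \qquad P(\text{at least one false rejection}) = 1 - 0.95^{100} \approx 0.994$$

In practice, tests at neighbouring dates are highly correlated, so the exact numbers differ, but the point stands: the largest of many F statistics is systematically larger than a single one, so it cannot be compared with ordinary F critical values.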

So what do you do? Fortunately, some very smart people worked out the correct distribution of the resulting test statistic. The procedure is as follows: roll a Chow test over your data, one candidate date at a time, and compare the largest of these statistics with the critical values those people derived. If that maximum is statistically significant, the corresponding date is the most likely break date.
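The rolling procedure can be sketched in base R by computing the Chow F statistic at every candidate date and taking the maximum (the sup-F statistic). The 15% trimming at each end, the function name `rolling_chow` and the simulated data are assumptions for illustration. The appropriate critical values for `sup_F` are those tabulated by Andrews (1993); comparing it with `qf()` would reproduce exactly the over-rejection described above.

```r
# Chow F statistic at every candidate break, trimming `trim` of the sample
# at each end so that both regimes have enough observations.
rolling_chow <- function(y, X, trim = 0.15) {
  n <- length(y)
  p <- ncol(X)
  rss <- function(yy, XX) sum(lm.fit(XX, yy)$residuals^2)
  rss_pooled <- rss(y, X)
  ks <- seq(ceiling(trim * n), floor((1 - trim) * n))
  f <- sapply(ks, function(k) {
    rss_split <- rss(y[1:k], X[1:k, , drop = FALSE]) +
      rss(y[(k + 1):n], X[(k + 1):n, , drop = FALSE])
    ((rss_pooled - rss_split) / p) / (rss_split / (n - 2 * p))
  })
  list(k = ks, F = f, sup_F = max(f), k_hat = ks[which.max(f)])
}

set.seed(5)
n <- 300
x <- rnorm(n)
post <- as.numeric(seq_len(n) > 180)  # hypothetical true break after obs. 180
y <- 1 + 0.5 * x + post * (1 + x) + rnorm(n)
res <- rolling_chow(y, cbind(1, x))
res$k_hat  # candidate break with the largest F statistic
res$sup_F  # compare with Andrews' sup-F critical values, not qf()
```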

Causal Inference

Suppose you find a break date. Good! One way to estimate the effect of the break is to run a model that includes a dummy variable equal to 0 before that date and 1 after it. The question then becomes: is the estimated effect of this event causal?

Consider a simple example: a country launches a stimulus package on a given date, and you want to measure the impact of the stimulus on GDP. Does the coefficient on the event dummy measure a causal relationship? Maybe, but probably not. Presumably the stimulus was introduced because the country was doing poorly, so poor GDP could have caused the stimulus rather than the other way around. This is known as an endogeneity problem, and it means the stimulus package is not truly a natural experiment.

Implementation in R

To implement this in R, use the strucchange package. Its documentation is quite good, and the package vignette is a good place to start.
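A sketch of the same workflow with strucchange (installed from CRAN) on simulated data, where the break after observation 180 is hypothetical. `Fstats()` computes the F statistic for every candidate break, `sctest()` turns the sup-F statistic into a p-value using the appropriate non-standard distribution, and `breakpoints()` dates one or more breaks by minimising the residual sum of squares:

```r
# install.packages("strucchange")
library(strucchange)

set.seed(6)
n <- 300
x <- rnorm(n)
post <- as.numeric(seq_len(n) > 180)  # hypothetical break after obs. 180
y <- 1 + 0.5 * x + post * (1 + x) + rnorm(n)
dat <- data.frame(y, x)

# F statistics for every candidate break in the middle 70% of the sample.
fs <- Fstats(y ~ x, data = dat, from = 0.15)
sctest(fs, type = "supF")  # sup-F test with its correct p-value
breakpoints(fs)            # candidate with the largest F statistic

# Dating breaks by minimising the residual sum of squares.
bp <- breakpoints(y ~ x, data = dat, h = 0.15)
summary(bp)
confint(bp, breaks = 1)    # confidence interval for a single break date
```

Once a break date has been estimated, it can be turned into a post-break dummy and used in the interaction regression from the question.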