Regression slope that increases persistently as the sample size increases

regression, regression coefficients, time series

I found a peculiar feature in some data that I am analyzing and was wondering whether there was a technical term for this type of phenomenon and whether anyone has come across it before.

I am doing an ordinary least squares linear regression of one financial time series on another, both of which appear stationary (bond yields and dividend yields for a specific sector). I am not using shrinkage. If I run the regression over small samples of the data (e.g. rolling 1-year periods), the regression slope (the estimated coefficient on the independent variable) is quite low and the intercept relatively high. However, as the sample size increases, the slope rises and the intercept falls (i.e. the regression line steepens). This increase in slope (and corresponding decrease in intercept) appears surprisingly monotonic in the sample size.

Let me illustrate what is going on more precisely. I am using weekly observations across 14 years. If I group the data by year and run the regression separately for each year, the year with the highest regression slope records a slope of 1.07. The slope for the entire data set, however, is much higher at 1.79. It seems to me that the relationship is much weaker over small data sets than over the entire data set, or in this case weaker in the short term than in the long term. That is, the one variable influences the other more in the long term than in the short term.

A colleague of mine thinks that the problem can be framed in terms of signal processing, and has posted the following question: https://dsp.stackexchange.com/questions/25323/frequency-response-of-a-rolling-linear-regression. I was wondering whether there was a purely statistical answer, and would greatly appreciate any help. Specifically: 1) is there a technical term for this type of phenomenon, 2) is there a better way to detect it and test its significance and 3) are there any places in nature or elsewhere where this typically occurs?

Here is a sample of the data illustrating the issue:
[Figure: bond yields and dividend yields]

The thick grey line is the regression over all data; the colored lines/points are for selected individual years.

Best Answer

From the plot, it's not clear how these are "stationary" time series, as the mean values of both clearly change from year to year. Both these time series seem to have significant and correlated overall trends, in this case presumably due to underlying macroeconomic phenomena that affect both dividend and bond yields.

The high regression coefficients over long time periods represent the joint responses of both variables to those underlying phenomena, which account for much of the variance in either type of yield over those periods. Over shorter time periods, where the macroeconomic influences might be relatively constant, you are instead mostly examining possible (and presumably weaker) inter-relations between the two variables.
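To make that mechanism concrete, here is a minimal simulation sketch in Python with NumPy. Everything in it is invented for illustration and is not the asker's data: the drifting `macro_levels`, the loadings 1.0 and 1.8, the within-year coefficient 0.3, and the noise scales are all assumptions. It only shows that a shared slow-moving factor can produce small within-year slopes alongside a much larger pooled slope.

```python
import numpy as np

rng = np.random.default_rng(0)

n_years, weeks = 14, 52

# ASSUMPTION: a hypothetical macro factor that is constant within each year
# and drifts steadily from year to year.
macro_levels = np.linspace(0.0, 4.0, n_years)
macro = np.repeat(macro_levels, weeks)

# Both series respond to the macro factor; within a year, x moves y only
# weakly (coefficient 0.3 on the part of x not explained by the factor).
x_idio = rng.normal(0.0, 0.5, n_years * weeks)
x = macro + x_idio                                  # stand-in for bond yields
y = 1.8 * macro + 0.3 * x_idio + rng.normal(0.0, 0.3, n_years * weeks)  # stand-in for dividend yields


def ols_slope(x, y):
    """Slope of the OLS regression of y on x (with an intercept)."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)


# One regression per year versus one regression over all 14 years.
yearly_slopes = [
    ols_slope(x[i * weeks:(i + 1) * weeks], y[i * weeks:(i + 1) * weeks])
    for i in range(n_years)
]
pooled_slope = ols_slope(x, y)

print("largest yearly slope:", round(max(yearly_slopes), 2))
print("pooled slope:        ", round(pooled_slope, 2))
```

Within any single year the factor is constant, so the yearly fits only see the weak 0.3 link. The pooled fit is dominated by the year-to-year movement that both series share, so its slope is pulled toward the ratio of the two loadings.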

Proper analysis of time series typically starts with identifying and removing these overall trends and any seasonal (or similar) components before examining the intra-series and inter-series correlations. If you are going to be analyzing econometric time series, draw on the decades of literature and take advantage of well-vetted tools like those provided in R.
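As a rough sketch of what removing a trend before regressing can look like, the Python snippet below compares the raw slope with the slope after subtracting a fitted straight line from each series, and with the slope on first differences. The data are simulated under assumed parameters (a shared linear trend and a true direct coefficient of 0.3), and `remove_linear_trend` is a hypothetical helper written for this example. Real yield series may have stochastic rather than linear trends, so treat this as an illustration of the idea and not a recommended procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 14 * 52                      # weekly observations over 14 years
t = np.arange(n)

# ASSUMPTION: both series share a linear trend; the direct link between
# them (through x's own fluctuations) has coefficient 0.3.
trend = 4.0 * t / n
x_noise = rng.normal(0.0, 0.5, n)
x = trend + x_noise
y = 1.8 * trend + 0.3 * x_noise + rng.normal(0.0, 0.3, n)


def ols_slope(x, y):
    """Slope of the OLS regression of y on x (with an intercept)."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)


def remove_linear_trend(series, t):
    """Residuals from a least-squares straight-line fit of series on time."""
    slope, intercept = np.polyfit(t, series, deg=1)
    return series - (slope * t + intercept)


raw_slope = ols_slope(x, y)
detrended_slope = ols_slope(remove_linear_trend(x, t), remove_linear_trend(y, t))
differenced_slope = ols_slope(np.diff(x), np.diff(y))

print("raw:        ", round(raw_slope, 2))
print("detrended:  ", round(detrended_slope, 2))
print("differenced:", round(differenced_slope, 2))
```

In this toy setup both adjusted regressions land near the assumed 0.3, while the raw regression is dominated by the shared trend. Whether subtracting a fitted trend or differencing is more appropriate depends on whether the trend is deterministic or stochastic, which is something to check on the actual data.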
