Solved – Estimating the CAPM Beta via OLS Regression

finance regression

I am studying econometrics from the third edition of 'Introduction to Econometrics' by James H. Stock and Mark W. Watson.

On page 166, the book digresses into the beta of a stock. It says:

Those betas typically are estimated by OLS regression of the actual excess return on the stock against the actual excess return on a broad market index.

My understanding from this wording is that the beta of the stock is the coefficient on the regressor, which is the market index's excess return. That is:

$$(R - R_{f}) = \beta_{0} + \beta(R_{m}-R_{f})+u.$$

Thus, to estimate the return of a stock:

$$\hat{R}-R_{f}=\hat{\beta}_{0}+\hat{\beta}(R_{m}-R_{f}).$$

However, for some reason the homework problems use the following equation to estimate returns:

$$\hat{R}-R_{f}=\hat{\beta}(R_{m}-R_{f}).$$

That is, they impose $\beta_{0} = 0$.

I have no doubt my understanding is incorrect. Any assistance would be greatly appreciated.

Best Answer

As Stephen mentions, the confusion is between: (1) the CAPM vs. (2) the market model.

Let $R^f$ denote the risk-free rate. We often work with excess returns, which means subtracting the risk-free rate.

Some simple models for expected returns

"Market model"
$$ R_t - R^f = \alpha + \beta\left(R^m_t - R^f \right) + \epsilon_t $$
$$ E\left[ R_t \right] - R^f = \alpha + \beta\left(E[R^m_t] - R^f \right) $$
The market model is a simple statistical model and can be justified by assuming that the joint distribution of monthly stock returns is multivariate normal.

Capital Asset Pricing Model (CAPM)
$$ E\left[ R_t\right] - R^f = \beta\left(E[R^m_t] - R^f \right) $$
The CAPM is an economic theory which says that the expected excess return of a stock is linear in the expected excess return of the market with no intercept, i.e. that $\alpha = 0$ in the market model regression.

Be aware that the CAPM doesn't hold up empirically. It's all over MBA corporate finance, but asset pricing researchers find it of little use. A more defensible choice would be the Fama-French three-factor model.

Example of how to use the CAPM (or any of these factor asset pricing models).

1. Compute excess returns: $R_{i,t} - R^f_t$
2. Regress excess returns on excess returns of the market and a constant (i.e. run the market model regression). $$ R_{i,t} - R^f_t = \alpha_i + \beta_i \left( R^m_t - R^f_t \right) + \epsilon_{i,t}$$
3. Ignore the estimated $\hat{\alpha}_i$.
4. Your estimated expected excess return according to the CAPM is $\hat{\beta}_i E[R^m_t - R^f_t]$.
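The steps above can be sketched in a few lines of Python. This is only an illustration on simulated data: the return-generating numbers (`true_beta`, the volatilities, the constant risk-free rate) are invented, the helper `ols_simple` is hypothetical, and the sample mean of the market excess return stands in for $E[R^m_t - R^f_t]$.

```python
import random


def ols_simple(x, y):
    """Return (intercept, slope) from an OLS regression of y on x and a constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Simulated monthly data; every number here is made up for illustration.
rng = random.Random(0)
n_months = 600
true_beta = 1.3
rf = [0.002] * n_months                                     # risk-free rate R^f_t
rm = [0.007 + rng.gauss(0, 0.04) for _ in range(n_months)]  # market return R^m_t
ri = [f + true_beta * (m - f) + rng.gauss(0, 0.03) for f, m in zip(rf, rm)]

# Step 1: excess returns of the stock and of the market.
ex_i = [r - f for r, f in zip(ri, rf)]
ex_m = [m - f for m, f in zip(rm, rf)]

# Step 2: market model regression (with a constant).
alpha_hat, beta_hat = ols_simple(ex_m, ex_i)

# Steps 3 and 4: ignore alpha_hat; the sample mean of the market excess
# return is used as an estimate of E[R^m - R^f] (an assumption of this sketch).
market_premium = sum(ex_m) / n_months
capm_expected_excess = beta_hat * market_premium

print("alpha_hat:", alpha_hat)
print("beta_hat:", beta_hat)
print("CAPM expected excess return:", capm_expected_excess)
```

Note that the regression itself still includes the constant; the CAPM restriction only enters in the last step, when $\hat{\alpha}_i$ is dropped from the prediction.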

Solved – Using Regression to Determine whether the CAPM holds

Basic summary:

You want to run the time series regressions $R_{it} - R^f_t = \alpha_i + \beta_i \left( R^m_t - R^f_t \right) + \epsilon_{it}$ on portfolio/security $i$ and then jointly test whether all the $\alpha_i$ are zero.
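As a rough sketch of the single-asset version of this idea, the Python snippet below runs one time-series regression and computes a t-statistic for $\alpha_i$. It assumes homoskedastic errors, uses simulated data with invented parameters, and the helper names (`alpha_t_stat`, `simulate`) are hypothetical. The joint test across many assets needs the residual covariance matrix across assets and is not shown here.

```python
import math
import random


def alpha_t_stat(ex_m, ex_i):
    """OLS of ex_i on ex_m and a constant; return (alpha_hat, se_alpha, t_stat)."""
    n = len(ex_m)
    mean_m = sum(ex_m) / n
    mean_i = sum(ex_i) / n
    smm = sum((m - mean_m) ** 2 for m in ex_m)
    beta = sum((m - mean_m) * (r - mean_i) for m, r in zip(ex_m, ex_i)) / smm
    alpha = mean_i - beta * mean_m
    resid = [r - alpha - beta * m for m, r in zip(ex_m, ex_i)]
    sigma2 = sum(e * e for e in resid) / (n - 2)  # homoskedastic error variance
    se_alpha = math.sqrt(sigma2 * (1.0 / n + mean_m ** 2 / smm))
    return alpha, se_alpha, alpha / se_alpha


def simulate(alpha, beta, n_months, rng):
    """Simulated monthly excess returns; all parameters are illustrative."""
    ex_m = [0.005 + rng.gauss(0, 0.04) for _ in range(n_months)]
    ex_i = [alpha + beta * m + rng.gauss(0, 0.03) for m in ex_m]
    return ex_m, ex_i


rng = random.Random(1)

# Asset whose true alpha is zero (consistent with the CAPM).
a0, se0, t_zero = alpha_t_stat(*simulate(0.0, 1.0, 600, rng))

# Asset with a true alpha of 1 percent per month.
a1, se1, t_big = alpha_t_stat(*simulate(0.01, 1.0, 600, rng))

print("alpha = 0   :", a0, "t =", t_zero)
print("alpha = 0.01:", a1, "t =", t_big)
```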

As the classic Box quote goes, all models are wrong but some are useful. In some sense, the question is whether the $\alpha$s are economically large, not whether they can be statistically distinguished from zero.

If you're running this on monthly data (which is standard), $\alpha_i$ will be in units of abnormal monthly returns. 1 percent per month would be absolutely huge. 0.05 percent per month is rather meh.
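To get a feel for these magnitudes, a monthly figure can be compounded to an annual one. The snippet below is just arithmetic, and treating an abnormal return as something that compounds is a simplification: 1 percent per month is roughly 12.7 percent per year, while 0.05 percent per month is roughly 0.6 percent per year.

```python
def annualize_monthly(alpha_monthly):
    """Compound a monthly return (expressed as a decimal) over 12 months."""
    return (1.0 + alpha_monthly) ** 12 - 1.0


for alpha in (0.01, 0.0005):
    print(f"{alpha:.2%} per month -> {annualize_monthly(alpha):.2%} per year")
```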

[1] The market portfolio of all stocks etc... is tradeable. GDP growth would be an example of something that is not tradeable.

For references, search for John Cochrane's material on asset pricing, time-series regressions, and cross-sectional regressions.

Solved – Conditional mean independence implies unbiasedness and consistency of the OLS estimator

It's false. As you observe, if you read Stock and Watson closely, they don't actually endorse the claim that OLS is unbiased for $\beta$ under conditional mean independence. They endorse the much weaker claim that OLS is unbiased for $\beta$ if $E(u|x,z)=z\gamma$. Then, they say something vague about non-linear least squares.

Your equation (4) contains what you need to see that the claim is false. Estimating equation (4) by OLS while omitting the variable $E(u|x,z)$ leads to omitted variables bias. As you probably recall, the bias term from omitted variables (when the omitted variable has a coefficient of 1) is controlled by the coefficients from the following auxiliary regression:
\begin{align} E(u|z) = x\alpha_1 + z\alpha_2 + \nu \end{align}
The bias in the original regression for $\beta$ is $\alpha_1$ from this regression, and the bias on $\gamma$ is $\alpha_2$. If $x$ is correlated with $E(u|z)$, after controlling linearly for $z$, then $\alpha_1$ will be non-zero and the OLS coefficient will be biased.

Here is an example to prove the point: \begin{align} \xi &\sim F(), \; \zeta \sim G(), \; \nu \sim H()\quad \text{all independent}\\ z &=\xi\\ x &= z^2 + \zeta\\ u &= z+z^2-E(z+z^2)+\nu \end{align}

Looking at the formula for $u$, it is clear that $E(u|x,z)=E(u|z)=z+z^2-E(z+z^2)$. Looking at the auxiliary regression, it is clear that (absent some fortuitous choice of $F,G,H$) $\alpha_1$ will not be zero.

Here is a very simple example in R which demonstrates the point:

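    set.seed(12344321)
    n <- 100000

    # z is the control; x contains z^2, so x is correlated with E(u|z)
    z <- runif(n, min = 0, max = 10)
    x <- z^2 + runif(n, min = 0, max = 20)

    # E(u|x,z) = E(u|z) = z + z^2 - E(z + z^2)
    u <- z + z^2 - mean(z + z^2) + rnorm(n, mean = 0, sd = 20)
    y <- x + z + u   # true coefficients on x and z are both 1

    # main regression: the coefficient on x is biased
    summary(lm(y ~ x + z))

    # auxiliary regression: the coefficient on x estimates the bias
    summary(lm(z + z^2 ~ x + z))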

Notice that the first regression gives you a coefficient on $x$ which is biased up by 0.63, reflecting the fact that $x$ "has some $z^2$ in it" as does $E(u|z)$. Notice also that the auxiliary regression gives you a bias estimate of about 0.63.

So, what are Stock and Watson (and your lecturer) talking about? Let's go back to your equation (4): \begin{align} y = x\beta + z\gamma + E(u|z) + v \end{align}

It's an important fact that the omitted variable is only a function of $z$. It seems like if we could control for $z$ really well, that would be enough to purge the bias from the regression, even though $x$ may be correlated with $u$.

Suppose we estimated the equation below using either a non-parametric method to estimate the function $f()$ or using the correct functional form $f(z)=z\gamma+E(u|z)$. If we were using the correct functional form, we would be estimating it by non-linear least squares (explaining the cryptic comment about NLS): \begin{align} y = x\beta + f(z) + v \end{align} That would give us a consistent estimator for $\beta$ because there is no longer an omitted variable problem.
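A minimal Python sketch of this point, using the same data-generating process as the R example: when $z$ enters only linearly, the coefficient on $x$ is biased, and adding $z^2$ (which here is enough to capture $f(z)$) removes the bias. The helper `ols` is a hypothetical normal-equations solver written for this illustration, and the sample size and seed are arbitrary.

```python
import random


def ols(columns, y):
    """OLS coefficients via the normal equations (caller supplies the constant column)."""
    k = len(columns)
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(ci * yi for ci, yi in zip(columns[i], y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


rng = random.Random(2)
n = 20000
z = [rng.uniform(0, 10) for _ in range(n)]
x = [zi ** 2 + rng.uniform(0, 20) for zi in z]
g = [zi + zi ** 2 for zi in z]
g_mean = sum(g) / n
u = [gi - g_mean + rng.gauss(0, 20) for gi in g]   # E(u|x,z) = E(u|z)
y = [xi + zi + ui for xi, zi, ui in zip(x, z, u)]  # true coefficient on x is 1

const = [1.0] * n
z_sq = [zi ** 2 for zi in z]

# z controlled for linearly only: the coefficient on x is biased.
b_linear = ols([const, x, z], y)[1]

# z controlled for with the correct functional form (linear and squared terms).
b_flexible = ols([const, x, z, z_sq], y)[1]

print("coefficient on x, linear control in z:", b_linear)
print("coefficient on x, z and z^2 controls :", b_flexible)
```

In this particular design $f(z)$ is linear in $z$ and $z^2$, so ordinary least squares with the extra regressor suffices; a functional form that is non-linear in its parameters would need non-linear least squares instead.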

Alternatively, if we had enough data, we could go "all the way" in controlling for $z$. We could look at a subset of the data where $z=1$, and just run the regression: \begin{align} y = x\beta + v \end{align} This would give unbiased, consistent estimators for the $\beta$ except for the intercept, of course, which would be polluted by $f(1)$. Obviously, you could also get a (different) consistent, unbiased estimator by running that regression only on data points for which $z=2$. And another one for the points where $z=3$. Etc. Then you'd have a bunch of good estimators from which you could make a great estimator by, say, averaging them all together somehow.

This latter thought is the inspiration for matching estimators. Since we don't usually have enough data to literally run the regression only for $z=1$ or even for pairs of points where $z$ is identical, we instead run the regression for points where $z$ is "close enough" to being identical.
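The "close enough" idea can be illustrated with a crude binning scheme in Python. This is not a real matching estimator, only a sketch under assumptions: the data come from the same simulated design as the R example, and the bin width, minimum bin size, and size-weighted averaging are arbitrary choices. Within each narrow bin of $z$, $y$ is regressed on $x$ alone, and the per-bin slopes are then averaged.

```python
import random


def slope(x, y):
    """Slope from an OLS regression of y on x and a constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx


rng = random.Random(3)
n = 20000
bin_width = 0.1   # arbitrary: how close z must be to count as "close enough"
min_count = 30    # arbitrary: skip bins with too few observations
mean_shift = 5 + 100 / 3  # E(z + z^2) for z uniform on [0, 10]

bins = {}
for _ in range(n):
    z = rng.uniform(0, 10)
    x = z ** 2 + rng.uniform(0, 20)
    u = z + z ** 2 - mean_shift + rng.gauss(0, 20)
    y = x + z + u  # true coefficient on x is 1
    xs, ys = bins.setdefault(int(z / bin_width), ([], []))
    xs.append(x)
    ys.append(y)

# Regress y on x within each bin, then average the slopes weighted by bin size.
weighted_sum = 0.0
total = 0
for xs, ys in bins.values():
    if len(xs) >= min_count:
        weighted_sum += len(xs) * slope(xs, ys)
        total += len(xs)

binned_estimate = weighted_sum / total
print("binned estimate of the coefficient on x:", binned_estimate)
```

Because $z$ still varies slightly within a bin, a small bias remains. Narrower bins reduce it at the cost of noisier per-bin slopes.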
