Let's say we have 25 portfolios $i=1, \ldots, 25$.
Consider the time-series regressions for each portfolio $i$.
$$R_{it} - R^f_t = \alpha_i + \beta_i \left( R^m_t - R^f_t \right) + \epsilon_{it}$$
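If all the right-hand-side variables in your time-series regression are tradeable[1], then the $\alpha_i$ from the time-series regressions are the same as the residuals (pricing errors) in the cross-sectional relation between expected excess returns and market betas. Taking expectations of the time-series regression shows that the slope is $\gamma = \mathrm{E}[R^m - R^f]$: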
$$ \mathrm{E}[R_i - R^f] = \gamma \beta_i + \alpha_i $$
Recall that the CAPM implies that expected excess returns $\mathrm{E}[R_i - R^f]$ are linear in their market betas $\beta_i$. To test the CAPM, you want to test whether all the $\alpha_i$ are jointly zero. In statistics, this is called an F-test, and in finance, its fancy name is the Gibbons-Ross-Shanken (GRS) test.
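For reference, here is a sketch of the GRS statistic in the single-factor case. The extra symbols are notation introduced for this sketch, not something defined above: $T$ is the number of time periods, $N$ the number of portfolios (25 here), $\hat{\alpha}$ the $N$-vector of estimated intercepts, $\hat{\Sigma}$ the maximum-likelihood (divide by $T$) covariance matrix of the time-series residuals, and $\hat{\mu}_m$, $\hat{\sigma}_m$ the sample mean and standard deviation of the market excess return. Assuming the residuals are i.i.d. normal:
$$ \frac{T-N-1}{N} \, \frac{\hat{\alpha}' \hat{\Sigma}^{-1} \hat{\alpha}}{1 + \hat{\mu}_m^2 / \hat{\sigma}_m^2} \sim F_{N,\, T-N-1} $$
A large value of the statistic means the intercepts are jointly too far from zero to be explained by sampling noise.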
You could also run the cross-sectional regression (estimating $\gamma_0$ and $\gamma_1$)
$$ \mathrm{E}[R_i - R^f] = \gamma_0 + \gamma_1 \beta_i + \alpha_i $$
and see whether the slope $\gamma_1$ is even positive.
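A minimal sketch of that cross-sectional regression, assuming you already have an estimated beta and an average excess return for each portfolio. The function name `cross_sectional_fit` and the five data points are hypothetical and only illustrate the mechanics; they are not Fama-French data.

```python
def cross_sectional_fit(betas, mean_excess_returns):
    """OLS of average excess returns on betas, with an intercept.

    Returns (gamma0, gamma1, alphas), where alphas are the residuals.
    """
    n = len(betas)
    b_bar = sum(betas) / n
    r_bar = sum(mean_excess_returns) / n
    sxx = sum((b - b_bar) ** 2 for b in betas)
    sxy = sum((b - b_bar) * (r - r_bar)
              for b, r in zip(betas, mean_excess_returns))
    gamma1 = sxy / sxx
    gamma0 = r_bar - gamma1 * b_bar
    alphas = [r - gamma0 - gamma1 * b
              for b, r in zip(betas, mean_excess_returns)]
    return gamma0, gamma1, alphas

# Made-up inputs: five portfolios, monthly average excess returns.
betas = [0.8, 0.9, 1.0, 1.1, 1.2]
mean_excess = [0.009, 0.007, 0.008, 0.006, 0.005]

gamma0, gamma1, alphas = cross_sectional_fit(betas, mean_excess)
print(f"gamma0 = {gamma0:.4f}, gamma1 = {gamma1:.4f}")
print("slope is positive" if gamma1 > 0 else "slope is not positive")
```

In these made-up numbers the high-beta portfolios earn less on average, so the fitted slope comes out negative, which is the kind of result that would count against the CAPM.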
Basically, the CAPM doesn't work. Check out Fama and French, "The Capital Asset Pricing Model: Theory and Evidence", if you want to go deep.
You can download the 25 Fama-French size and book-to-market portfolios and test the CAPM on them. It will do horribly at explaining the cross-sectional variation in average returns.
Basic summary:
You want to run the time series regressions $R_{it} - R^f_t = \alpha_i + \beta_i \left( R^m_t - R^f_t \right) + \epsilon_{it}$ on portfolio/security $i$ and then jointly test whether all the $\alpha_i$ are zero.
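A minimal sketch of one such time-series regression, using simulated data. The function name `market_model_fit` and every parameter value in the simulation are assumptions made for illustration. It reports a t-statistic for a single $\alpha_i$ only; the joint test across portfolios also needs the covariance of the residuals across portfolios, which this sketch does not compute.

```python
import random

def market_model_fit(excess_ret, excess_mkt):
    """OLS of a portfolio's excess return on the market's excess return.

    Returns (alpha, beta, t-statistic of alpha).
    """
    t = len(excess_ret)
    m_bar = sum(excess_mkt) / t
    r_bar = sum(excess_ret) / t
    sxx = sum((m - m_bar) ** 2 for m in excess_mkt)
    sxy = sum((m - m_bar) * (r - r_bar)
              for m, r in zip(excess_mkt, excess_ret))
    beta = sxy / sxx
    alpha = r_bar - beta * m_bar
    resid = [r - alpha - beta * m for m, r in zip(excess_mkt, excess_ret)]
    s2 = sum(e * e for e in resid) / (t - 2)          # residual variance
    se_alpha = (s2 * (1.0 / t + m_bar ** 2 / sxx)) ** 0.5
    return alpha, beta, alpha / se_alpha

# Simulated monthly excess returns; all numbers below are made up.
rng = random.Random(0)
T = 600
mkt = [rng.gauss(0.005, 0.045) for _ in range(T)]
ret = [0.002 + 1.2 * m + rng.gauss(0.0, 0.02) for m in mkt]

alpha, beta, t_alpha = market_model_fit(ret, mkt)
print(f"alpha = {alpha:.4f} per month, beta = {beta:.2f}, t(alpha) = {t_alpha:.2f}")
```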
There's the classic quote of Box that all models are wrong but some are useful. In some sense, the question is whether the $\alpha$s are large, not whether they can be statistically distinguished from zero.
If you're running this on monthly data (which is standard), $\alpha_i$ will be in units of abnormal monthly returns. 1 percent per month would be absolutely huge. 0.05 percent per month is rather meh.
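To put those magnitudes in annual terms, here is a small sketch that compounds a monthly alpha over twelve months. Compounding is one convention (an assumption here); simply multiplying by 12 gives nearly the same answer at these sizes. The helper name `annualize_monthly` is hypothetical.

```python
def annualize_monthly(monthly_alpha):
    """Compound a monthly abnormal return over 12 months."""
    return (1.0 + monthly_alpha) ** 12 - 1.0

for a in (0.01, 0.0005):
    print(f"{a:.2%} per month is about {annualize_monthly(a):.2%} per year")
```

So 1 percent per month is roughly 12.7 percent per year, while 0.05 percent per month is roughly 0.6 percent per year.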
[1] The market portfolio of all stocks etc... is tradeable. GDP growth would be an example of something that is not tradeable.
For references, search for "John Cochrane asset pricing time-series regression cross-sectional regression".
It's false. As you observe, if you read Stock and Watson closely, they don't actually endorse the claim that OLS is unbiased for $\beta$ under conditional mean independence. They endorse the much weaker claim that OLS is unbiased for $\beta$ if $E(u|x,z)=z\gamma$. Then, they say something vague about non-linear least squares.
Your equation (4) contains what you need to see that the claim is false. Estimating equation (4) by OLS while omitting the variable $E(u|x,z)$ leads to omitted variables bias. As you probably recall, the bias term from omitted variables (when the omitted variable has a coefficient of 1) is controlled by the coefficients from the following auxiliary regression:
\begin{align}
E(u|z) = x\alpha_1 + z\alpha_2 + \nu
\end{align}
The bias in the original regression for $\beta$ is $\alpha_1$ from this regression, and the bias on $\gamma$ is $\alpha_2$. If $x$ is correlated with $E(u|z)$, after controlling linearly for $z$, then $\alpha_1$ will be non-zero and the OLS coefficient will be biased.
Here is an example to prove the point:
\begin{align}
\xi &\sim F(), \; \zeta \sim G(), \; \nu \sim H()\quad \text{all independent}\\
z &=\xi\\
x &= z^2 + \zeta\\
u &= z+z^2-E(z+z^2)+\nu
\end{align}
Looking at the formula for $u$, it is clear that $E(u|x,z)=E(u|z)=z+z^2-E(z+z^2)$. Looking at the auxiliary regression, it is clear that (absent some fortuitous choice of $F,G,H$) $\alpha_1$ will not be zero.
Here is a very simple example in R which demonstrates the point:
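```r
set.seed(12344321)
n <- 100000
# z is uniform; x contains z^2 plus independent noise
z <- runif(n = n, min = 0, max = 10)
x <- z^2 + runif(n = n, min = 0, max = 20)
# u depends on z only (plus noise), but nonlinearly
u <- z + z^2 - mean(z + z^2) + rnorm(n = n, mean = 0, sd = 20)
# the true coefficients on x and z are both 1
y <- x + z + u
summary(lm(y ~ x + z))
# auxiliary regression: the coefficient on x is the bias
summary(lm(I(z + z^2) ~ x + z))
```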
Notice that the first regression gives you a coefficient on $x$ which is biased up by 0.63, reflecting the fact that $x$ "has some $z^2$ in it" as does $E(u|z)$. Notice also that the auxiliary regression gives you a bias estimate of about 0.63.
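The 0.63 can also be checked by hand. The following is a sketch using the Frisch-Waugh logic; the symbol $r$ is notation introduced here for the residual of $z^2$ after a linear projection on $z$ and a constant. Because $x = z^2 + \zeta$ with $\zeta$ independent of $z$, the coefficient on $x$ in the auxiliary regression is $\mathrm{Var}(r)/(\mathrm{Var}(r)+\mathrm{Var}(\zeta))$. With $z \sim U(0,10)$ and $\zeta \sim U(0,20)$, as in the simulation:
\begin{align}
\mathrm{Var}(z) &= \tfrac{25}{3}, \quad \mathrm{Var}(z^2) = 2000 - \left(\tfrac{100}{3}\right)^2 = \tfrac{8000}{9}, \quad \mathrm{Cov}(z,z^2) = 250 - 5\cdot\tfrac{100}{3} = \tfrac{250}{3}\\
\mathrm{Var}(r) &= \mathrm{Var}(z^2) - \frac{\mathrm{Cov}(z,z^2)^2}{\mathrm{Var}(z)} = \tfrac{8000}{9} - \tfrac{7500}{9} = \tfrac{500}{9}\\
\mathrm{Var}(\zeta) &= \tfrac{400}{12} = \tfrac{300}{9}\\
\alpha_1 &= \frac{500/9}{500/9 + 300/9} = \tfrac{5}{8} = 0.625
\end{align}
which matches the simulated bias of about 0.63.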
So, what are Stock and Watson (and your lecturer) talking about? Let's go back to your equation (4):
\begin{align}
y = x\beta + z\gamma + E(u|z) + v
\end{align}
It's an important fact that the omitted variable is only a function of $z$. It seems like if we could control for $z$ really well, that would be enough to purge the bias from the regression, even though $x$ may be correlated with $u$.
Suppose we estimated the equation below using either a non-parametric method to estimate the function $f()$ or using the correct functional form $f(z)=z\gamma+E(u|z)$. If we were using the correct functional form, we would be estimating it by non-linear least squares (explaining the cryptic comment about NLS):
\begin{align}
y = x\beta + f(z) + v
\end{align}
That would give us a consistent estimator for $\beta$ because there is no longer an omitted variable problem.
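A simulation sketch of this idea, under stated assumptions: it uses the same kind of data-generating process as the example above, where $f(z)$ happens to be quadratic, so adding $z^2$ as a regressor is the correct functional form and plain OLS suffices. The helper `ols`, the seed, the sample size, and the noise standard deviation of 5 are choices made for this sketch.

```python
import random

def ols(columns, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination.

    `columns` is a list of regressor columns; returns one coefficient each.
    """
    k = len(columns)
    # Augmented matrix [X'X | X'y]
    a = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j]))
          for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(columns[i], y))]
         for i in range(k)]
    for p in range(k):
        # Partial pivoting for numerical stability
        best = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[best] = a[best], a[p]
        for r in range(k):
            if r != p:
                f = a[r][p] / a[p][p]
                a[r] = [v - f * w for v, w in zip(a[r], a[p])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(12344321)
n = 20000
z = [rng.uniform(0, 10) for _ in range(n)]
x = [zi ** 2 + rng.uniform(0, 20) for zi in z]
center = sum(zi + zi ** 2 for zi in z) / n
u = [zi + zi ** 2 - center + rng.gauss(0, 5) for zi in z]
y = [xi + zi + ui for xi, zi, ui in zip(x, z, u)]  # true coefficient on x is 1

ones = [1.0] * n
z2 = [zi ** 2 for zi in z]

b_linear = ols([ones, x, z], y)[1]        # controls for z linearly only
b_flexible = ols([ones, x, z, z2], y)[1]  # controls for the correct f(z)

print(f"coefficient on x, linear control in z: {b_linear:.3f}")
print(f"coefficient on x, quadratic control in z: {b_flexible:.3f}")
```

With only a linear control for $z$, the coefficient on $x$ should land well above 1; once $z^2$ is included, it should be close to the true value of 1.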
Alternatively, if we had enough data, we could go "all the way" in controlling for $z$. We could look at a subset of the data where $z=1$, and just run the regression:
\begin{align}
y = x\beta + v
\end{align}
This would give unbiased, consistent estimators for the $\beta$ except for the intercept, of course, which would be polluted by $f(1)$. Obviously, you could also get a (different) consistent, unbiased estimator by running that regression only on data points for which $z=2$. And another one for the points where $z=3$. Etc. Then you'd have a bunch of good estimators from which you could make a great estimator by, say, averaging them all together somehow.
This latter thought is the inspiration for matching estimators. Since we don't usually have enough data to literally run the regression only for $z=1$ or even for pairs of points where $z$ is identical, we instead run the regression for points where $z$ is "close enough" to being identical.
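A crude sketch of the "close enough" idea, assuming a simulated data-generating process of the same kind as the R example. It stratifies on narrow bins of $z$, regresses $y$ on $x$ within each bin, and averages the slopes weighted by bin size. This is a simple stand-in for a matching estimator, not a full implementation; the bin width, seed, sample size, and noise level are all choices made for this sketch.

```python
import random

def slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs (with intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((a - x_bar) ** 2 for a in xs)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    return sxy / sxx

rng = random.Random(2024)
n = 20000
z = [rng.uniform(0, 10) for _ in range(n)]
x = [zi ** 2 + rng.uniform(0, 20) for zi in z]
center = sum(zi + zi ** 2 for zi in z) / n
u = [zi + zi ** 2 - center + rng.gauss(0, 5) for zi in z]
y = [xi + zi + ui for xi, zi, ui in zip(x, z, u)]  # true coefficient on x is 1

# Group observations whose z values fall in the same narrow bin.
width = 0.1
bins = {}
for xi, zi, yi in zip(x, z, y):
    bins.setdefault(int(zi / width), []).append((xi, yi))

# Within each bin z is nearly constant, so f(z) is absorbed by the intercept.
slopes, weights = [], []
for points in bins.values():
    if len(points) >= 30:  # skip bins too small to estimate a slope
        xs, ys = zip(*points)
        slopes.append(slope(xs, ys))
        weights.append(len(points))

b_binned = sum(s * w for s, w in zip(slopes, weights)) / sum(weights)
print(f"bin-averaged coefficient on x: {b_binned:.3f}")
```

Because $z$ still varies a little inside each bin, a small bias remains; narrower bins shrink it at the cost of fewer observations per bin.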
Best Answer
As Stephen mentions, the confusion is between: (1) the CAPM vs. (2) the market model.
Let $R^f$ denote the risk-free rate. We often work with excess returns, which means subtracting the risk-free rate.
Some simple models for expected returns
"Market model"
$$ R_t - R^f = \alpha + \beta\left(R^m_t - R^f \right) + \epsilon_t $$
$$ E\left[ R_t \right] - R^f = \alpha + \beta\left(E[R^m_t] - R^f \right) $$
The market model is a simple statistical model and can be justified by assuming that the joint distribution of monthly stock returns is multivariate normal.
Capital Asset Pricing Model (CAPM)
$$ E\left[ R_t\right] - R^f = \beta\left(E[R^m_t] - R^f \right) $$
The CAPM is an economic theory which says that the expected excess return of a stock is $\beta$ times the expected excess return of the market; equivalently, that $\alpha = 0$ in the market model regression.
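The difference between the two can be sketched in a few lines. The function names and the input numbers (a beta of 1.2, a monthly market premium of 0.5 percent, an alpha of 0.1 percent per month) are hypothetical and chosen only to show that the CAPM is the market model with $\alpha$ forced to zero.

```python
def market_model_expected_excess(alpha, beta, market_premium):
    """Expected excess return under the market model."""
    return alpha + beta * market_premium

def capm_expected_excess(beta, market_premium):
    """Expected excess return under the CAPM: the market model with alpha = 0."""
    return market_model_expected_excess(0.0, beta, market_premium)

# Made-up monthly inputs.
alpha, beta, market_premium = 0.001, 1.2, 0.005

mm = market_model_expected_excess(alpha, beta, market_premium)
capm = capm_expected_excess(beta, market_premium)
print(f"market model: {mm:.4f}, CAPM: {capm:.4f}, difference (alpha): {mm - capm:.4f}")
```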
Be aware that the CAPM doesn't work. It's all over MBA corporate finance, but asset pricing people find it useless. Something less crazy to use would be the Fama-French 3 Factor Model.
Example of how to use the CAPM (or any of these factor asset pricing models).