$\hat{Y} = X^T\hat{\beta}$ Matrix Dimension For Linear Regression Coefficients $\beta$

linear algebra, linear regression, matrices, statistics

While reading about the least squares implementation for machine learning, I came across the passage shown in the following two photos:
[photo 1]
[photo 2]

Perhaps I’m misinterpreting the meaning of $\beta$, but if $X^T$ has dimension $1 \times p$ and $\hat{\beta}$ has dimension $p \times K$, then $\hat{Y}$ would have dimension $1 \times K$, i.e., a row vector. According to the text, however, vectors are assumed to be column vectors unless otherwise noted.

Can someone provide clarification?

Edit: The matrix notation in this text is confusing me. The pages preceding the passages above state the following:

[photo 3]
[photo 4]

Should the matrix referenced not have dimensions $p \times N$, assuming a $p$-vector is a vector with $p$ elements? Or are the input vectors assumed to be row vectors?

Note: The passage is taken from “The Elements of Statistical Learning” by Hastie, Tibshirani, and Friedman.

Best Answer

If the authors are really consistent with the convention that all vectors are columns, then there is a typographical error in the text. The first paragraph of 2.3.1 should read:

... Given a vector of inputs $X=(X_1, X_2,\ldots,X_p)^T$ ...

As for equation (2.2), the preceding text assumes $\hat Y$ is a scalar, so there is no conflict (yet). But the analogue of (2.2) in which $\hat Y$ is a $K$-vector should be written $\hat{Y}^T = X^T\hat{\beta}$.
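
To check the shape bookkeeping directly, here is a minimal NumPy sketch (the sizes $p = 4$ and $K = 3$ are arbitrary, chosen only for illustration):

```python
import numpy as np

# Hypothetical sizes for illustration: p = 4 inputs, K = 3 outputs.
p, K = 4, 3
X = np.ones((p, 1))          # a single input, stored as a p x 1 column vector
beta_hat = np.ones((p, K))   # coefficient matrix, p x K

Y_hat_row = X.T @ beta_hat   # (1, p) @ (p, K) -> (1, K): a row vector
Y_hat = Y_hat_row.T          # transposing gives the K x 1 column vector

print(Y_hat_row.shape)       # (1, 3)
print(Y_hat.shape)           # (3, 1)
```

Transposing the $1 \times K$ product recovers the column-vector convention the text claims to follow.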

As for the dimension of the matrix $\mathbf{X}$: to arrive at dimension $N \times p$, it is necessary to load each $x_i$ in transposed form into the rows of $\mathbf{X}$, which is the remark made in the text's final sentence.
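
Again as a sketch (with arbitrary illustrative sizes $N = 5$, $p = 4$), stacking each transposed column vector as a row produces the $N \times p$ data matrix:

```python
import numpy as np

# Hypothetical sizes for illustration: N = 5 observations, p = 4 features.
N, p = 5, 4
xs = [np.ones((p, 1)) for _ in range(N)]   # each x_i is a p x 1 column vector

# Loading each x_i^T (a 1 x p row) into the rows of X yields an N x p matrix.
X_mat = np.vstack([x.T for x in xs])
print(X_mat.shape)                          # (5, 4)
```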
