While reading about least squares in the context of machine learning, I came across the passage shown in the following two photos:
Perhaps I’m misinterpreting the meaning of $\beta$, but if $X^T$ has dimension $1 \times p$ and $\beta$ has dimension $p \times K$, then $\hat{Y} = X^T\beta$ would have dimension $1 \times K$ and would be a row vector. According to the text, however, vectors are assumed to be column vectors unless otherwise noted.
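For concreteness, here is a minimal NumPy sketch of the dimension count above (the sizes `p = 3`, `K = 2` are arbitrary placeholders, not from the text):

```python
import numpy as np

p, K = 3, 2                   # arbitrary example sizes
X = np.random.rand(p, 1)      # input as a column p-vector, per the book's convention
beta = np.random.rand(p, K)   # coefficient matrix, p x K

Y_hat = X.T @ beta            # X^T beta
print(Y_hat.shape)            # (1, K) -- a row vector, which is the apparent conflict
```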
Can someone provide clarification?
Edit: the matrix notation in this text is confusing me. The pages preceding the above passages state the following:
Should the referenced matrix not have dimensions $p \times N$, assuming a $p$-vector is a vector with $p$ elements? Or are the input vectors assumed to be row vectors?
Note: The passage is taken from “Elements of Statistical Learning” by Hastie, Tibshirani, & Friedman.
Best Answer
If the authors are really consistent with the convention that all vectors are columns, then there is a typographical error in the text. The first paragraph of 2.3.1 should read:
As for equation (2.2), the preceding text was assuming $\hat Y$ is a scalar, so there's no conflict (yet). But the analogue of (2.2) where $\hat Y$ is a $K$-vector should be written ${\hat Y}^T=X^T\hat\beta$.
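A quick check of that corrected form, reusing the same assumed shapes as in the sketch above: $\hat{Y}^T = X^T\hat\beta$ is equivalent to $\hat{Y} = \hat\beta^T X$, which is a $K \times 1$ column vector.

```python
import numpy as np

p, K = 3, 2
X = np.random.rand(p, 1)      # column p-vector
beta_hat = np.random.rand(p, K)

Y_hat = beta_hat.T @ X        # hat{Y} = hat{beta}^T X, i.e. hat{Y}^T = X^T hat{beta}
print(Y_hat.shape)            # (K, 1) -- a column K-vector, consistent with the convention
```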
As for the dimension of the matrix $\bf X$: to arrive at dimension $N \times p$, each $x_i$ (a column $p$-vector) must be loaded in transposed form into the rows of $\bf X$, which is the remark made in the final sentence of the quoted passage.
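As an illustration of that last remark, here is a sketch of stacking $N$ column vectors $x_i$ as rows (the sizes are again made-up examples):

```python
import numpy as np

N, p = 5, 3
xs = [np.random.rand(p, 1) for _ in range(N)]   # N inputs, each a column p-vector

# Load each x_i in transposed form (as a row) into the rows of X:
X_mat = np.vstack([x.T for x in xs])
print(X_mat.shape)                              # (N, p), as stated in the text
```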