Let the spectral decomposition of $\Sigma$ be $\Sigma=PDP^T,$ where $P$ is orthogonal and $D=\operatorname{diag}(\lambda_1,\dots,\lambda_d)$ is diagonal. Then $u^T\Sigma u=\displaystyle\sum_{i=1}^d\lambda_i(p_i^Tu)^2,$ where $p_i$ is the $i^{\text{th}}$ column of $P$, in other words the $i^\text{th}$ eigenvector of $\Sigma.$
We want to find $u$ such that $\displaystyle\sum_{i=1}^d\lambda_i(p_i^Tu)^2$ is maximized. Since the $p_i$'s form an orthonormal basis and $u$ has norm $1$, $\displaystyle\sum_{i=1}^d(p_i^Tu)^2=\|u\|^2=1.$
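Before turning to the optimization problem, here is a quick numerical sanity check of this identity; a minimal sketch assuming NumPy, not part of the original argument.

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T                       # symmetric PSD, so eigh applies

lam, P = np.linalg.eigh(Sigma)        # columns of P are the eigenvectors p_i
u = rng.standard_normal(4)
u /= np.linalg.norm(u)                # unit vector

# u^T Sigma u  ==  sum_i lambda_i (p_i^T u)^2
assert np.isclose(u @ Sigma @ u, np.sum(lam * (P.T @ u) ** 2))
# the squared coefficients sum to 1 because the p_i are orthonormal
assert np.isclose(np.sum((P.T @ u) ** 2), 1.0)
```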
Consider the optimization problem:
$$\text{Maximize }\displaystyle\sum_{i=1}^d\lambda_iz_i^2\text{ subject to }\sum_{i=1}^dz_i^2=1.$$
Suppose $\lambda_1\ge\lambda_2\ge\dots\ge\lambda_d.$ Then $\displaystyle\sum_{i=1}^d\lambda_iz_i^2\le\lambda_1\sum_{i=1}^dz_i^2=\lambda_1,$ so the objective never exceeds $\lambda_1,$ and this bound is attained by taking $z_1=1$ and $z_i=0$ for the rest (up to sign, and up to ties when $\lambda_1$ is a repeated eigenvalue). That would mean
$$p_1^Tu=1,\text{ and }p_i^Tu=0\text{ for all }i\neq 1.$$
Since $|p_1^Tu|\le\|p_1\|\,\|u\|=1,$ having $p_1^Tu=1$ means equality holds in the Cauchy–Schwarz inequality, which forces $u=c\,p_1$ for some constant $c.$ By the norm-$1$ constraint (and $p_1^Tu=1$), $u=p_1.$
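As a numerical illustration of the conclusion (again assuming NumPy; this sketch is mine, not part of the original answer), the top eigenvector attains the maximum of $u^T\Sigma u$ over unit vectors, and no random unit vector beats it:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Sigma = A @ A.T                      # a symmetric PSD "covariance" matrix

eigvals, P = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
p1, lam1 = P[:, -1], eigvals[-1]     # top eigenvector and eigenvalue

# u^T Sigma u equals lambda_1 at u = p_1 ...
assert np.isclose(p1 @ Sigma @ p1, lam1)

# ... and no random unit vector does better (up to float tolerance).
for _ in range(1000):
    u = rng.standard_normal(5)
    u /= np.linalg.norm(u)
    assert u @ Sigma @ u <= lam1 + 1e-9
```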
The two values you call vector projection and scalar projection are more tightly linked than your definitions seem to imply. In fact, by definition, we have that for two vectors $\vec a,\vec b$, the cosine of the angle between the two vectors is defined to be $$\cos\theta = \frac{\vec a\cdot \vec b}{|\vec a||\vec b|}$$
which means that what you call "the scalar projection of $\vec a$ along $\vec b$", which is $|\vec a|\cos\theta$, is in fact equal to $$|\vec a|\cos\theta = |\vec a| \frac{\vec a\cdot \vec b}{|\vec a||\vec b|} = \frac{\vec a\cdot \vec b}{|\vec b|}.$$
Note that this value is precisely the length of what you call the "vector projection of $\vec a$ along $\vec b$". Also, since the length of the vector is a scalar, while the vector projection itself is a vector, it is very common to just use the word "projection" in common mathematical language, because in the vast majority of cases, it is clear from context whether we are talking about a vector (and thus a vector projection) or a scalar.
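To make the link concrete, here is a small NumPy sketch (my own illustration, not from the original discussion) showing that the scalar projection equals the length of the vector projection when the angle is acute:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 1.0])

scalar_proj = a @ b / np.linalg.norm(b)      # |a| cos(theta)
vector_proj = (a @ b) / (b @ b) * b          # orthogonal projection onto span(b)

# Here a . b > 0, so the scalar projection is exactly the length of the
# vector projection; for an obtuse angle it would be minus that length.
assert np.isclose(scalar_proj, np.linalg.norm(vector_proj))
```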
Additionally, there is the word "component". This word almost always denotes a scalar quantity, and is tightly linked to the concept of projections. In particular, in $\mathbb R^n$, the $i$-th component of a vector $x\in\mathbb R^n$ is equal to the scalar projection of $x$ along the $i$-th standard basis vector.
This idea can be greatly generalized. If you are familiar with what a basis of a vector space is, then you might remember that if $B=\{b_1,\dots,b_n\}$ is a basis for a vector space $X$, then any $x\in X$ can be uniquely represented as $x=\alpha_1b_1+\cdots+\alpha_n b_n$. The common terminology is to refer to the values $\alpha_1,\dots,\alpha_n$ (which are scalars) as the components of $x$ in the basis $B$.
Now, you might already be sensing the connection between scalar projections and components. Indeed, if $B$ is an orthonormal set, then $\alpha_i$ is precisely equal to the scalar projection of $x$ along the vector $b_i$. (For a merely orthogonal basis, $\alpha_i=\frac{x\cdot b_i}{|b_i|^2}$, which is the scalar projection divided by $|b_i|$.)
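Here is a short NumPy sketch (an illustration of mine, under the same assumptions) of this fact for an orthonormal basis of $\mathbb R^2$:

```python
import numpy as np

# An orthonormal basis of R^2: the standard basis rotated by 30 degrees.
t = np.pi / 6
b1 = np.array([np.cos(t), np.sin(t)])
b2 = np.array([-np.sin(t), np.cos(t)])

x = np.array([2.0, -1.0])
alpha1, alpha2 = x @ b1, x @ b2      # scalar projections along b1 and b2

# The scalar projections are exactly the components of x in this basis:
# they reconstruct x uniquely.
assert np.allclose(alpha1 * b1 + alpha2 * b2, x)
```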
So, with all that out of the way, you can look again at the post you link in your question. The accepted answer states that "$\dfrac{\vec a \cdot \vec b}{|b|}$ is the component of $\vec a$ along $\vec b$." What this sentence is saying is basically "In any orthonormal basis in which $\vec b/|\vec b|$ is one of the basis vectors, $\dfrac{\vec a \cdot \vec b}{|\vec b|}$ is the component of $\vec a$ corresponding to that basis vector".
In that sense, the answer you link is correct, as long as you understand the implicit facts behind it :).
Best Answer
For two vectors $a,b \in \mathbb R^n$, the orthogonal projection of $a$ onto the span of $b$ is given by $P_b(a) = \langle a ,b\rangle \frac{b}{\langle b,b\rangle}$, where the brackets denote the scalar product, i.e. $\langle a, b\rangle = a^Tb = b^Ta$. If $b$ has norm $1$, i.e. $\langle b,b\rangle =1,$ this simplifies to $P_b(a) = \langle a ,b\rangle b = (b^Ta)b$.
If you want to plot this vector, the vector part ($b$) tells you which direction to go, and the scalar factor ($\langle a,b\rangle = b^Ta$) tells you how far. If you already know which direction to go, the only interesting information is the distance.
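A quick NumPy check of the projection formula (a sketch of mine, not from the answer), showing that the general formula and the unit-norm simplification agree:

```python
import numpy as np

a = np.array([2.0, 3.0, 1.0])
b = np.array([1.0, 0.0, 1.0])

# General formula: P_b(a) = <a, b> b / <b, b>
proj = (a @ b) / (b @ b) * b

# Simplification when b is normalized: P_b(a) = (b^T a) b
b_unit = b / np.linalg.norm(b)
assert np.allclose(proj, (b_unit @ a) * b_unit)
```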
The data matrix $D$ in your example consists of a lot of vectors like $a$, written down in columns.
So if you just consider a (normed) vector $b$ and look at $b^TD,$ this will be a $1\times n$ row vector.
What does this vector tell us? The first entry is the scalar factor for the first column, the second for the second, and so on. What's missing is the vector $b$ to multiply it with.
That is: $b^TD$ is the row vector which contains the coordinates of the columns of $D$ with respect to $b$. If you want to plot this, you can forget about the actual direction of $b$, as the aim is to consider $b$ as the 'new x' and the orthogonal complement of $b$ as the 'new y'. We don't care about the direction, only the coordinates: if $c$ is orthogonal to $b$ and has norm $1$, then we can calculate the corresponding coordinates of the columns of $D$ as $c^TD$. If we plot the first entry of $b^TD$ against the first entry of $c^TD$, we will have the data point corresponding to the first column represented in the new coordinate system, and so on for all the other columns of $D$.
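As a concrete (and hedged) NumPy sketch of this construction, with made-up data and variable names following the answer:

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((2, 100))      # each column is one data point in R^2

b = np.array([1.0, 1.0]) / np.sqrt(2)  # the 'new x' direction, norm 1
c = np.array([-1.0, 1.0]) / np.sqrt(2) # orthogonal to b, the 'new y'

new_x = b @ D                          # coordinates of the columns along b
new_y = c @ D                          # coordinates of the columns along c

# Plotting new_x[i] against new_y[i] shows data point i in the new system.
# Since (b, c) is an orthonormal basis, lengths are preserved:
assert np.allclose(np.linalg.norm(D, axis=0), np.hypot(new_x, new_y))
```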