Say that instead of solving for $A$ all at once, we solve for the columns of $A$ one by one. For a single column $u$, the problem is
$\max_u\ u^TBu$
such that $u^Tu=1$.
Construct the Lagrangian $L(u,\lambda)=u^TBu-\lambda(u^Tu-1)$, then differentiate to get:
$\frac{dL}{du}=2Bu-2\lambda u$
Equate to zero and solve: you end up with the eigendecomposition problem $Bu=\lambda u$ for $B$. Note that at any such critical point the objective value is $u^TBu=\lambda u^Tu=\lambda$.
Now pick the eigenvector that corresponds to the largest eigenvalue. If you want more solutions (more columns of $A$), pick the eigenvectors that correspond to the second and third largest eigenvalues, and so on.
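Here is a minimal numpy sketch of this recipe (the symmetric matrix $B$ below is made up for illustration):

```python
import numpy as np

# Maximize u^T B u over unit vectors u via eigendecomposition.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
B = M @ M.T                             # a made-up symmetric test matrix

vals, vecs = np.linalg.eigh(B)          # eigenvalues in ascending order
vals, vecs = vals[::-1], vecs[:, ::-1]  # reorder so the largest comes first

u = vecs[:, 0]                          # eigenvector of the largest eigenvalue
print(u @ B @ u, vals[0])               # the objective equals the top eigenvalue

A = vecs[:, :3]                         # for 3 columns of A: the top 3 eigenvectors
```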
Vectors will be typeset as rows, and matrices typeset as lists of rows.
Theorem. As we vary $X$ over all skew-symmetric matrices, the smallest possible length for $E=AX-B$ is $||E||=\frac{|A\cdot B|}{||A||}$.
Proof. Use Gram-Schmidt to construct an orthonormal basis for the space, chosen so that your two given vectors $A$ and $B$ take the form $A=(a,0,0,\ldots,0)$ and $B=(b_1,b_2,0,\ldots,0)$. The structure of any skew-symmetric matrix is $X=((0,s,\ldots);(-s,0,\ldots);(\ldots);\ldots)$ for some unknown scalar $s$. (The suppressed terms in this matrix, denoted by $\ldots$, have no effect in the computation that follows.) Note that the row vector $AX=(0,as,\ldots)$, and thus the error vector $E=AX-B=(-b_1,as-b_2,\ldots)$. The length of this error vector $E$ is minimized when the suppressed terms $\ldots$ are zero and when $s=b_2/a$; with this choice the minimum length is $||E||=|b_1|=\frac{|A\cdot B|}{||A||}$.
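As a sanity check, here is a short numpy verification of the construction in the proof ($A$ and $B$ are made-up random row vectors, matching the row-vector convention above):

```python
import numpy as np

# Verify the theorem numerically by building the optimal X from the proof.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal(n)        # row vector
B = rng.standard_normal(n)

a = np.linalg.norm(A)
e1 = A / a                        # unit vector along A
b1 = B @ e1                       # component of B parallel to A
e2 = B - b1 * e1
e2 /= np.linalg.norm(e2)          # unit vector along B's orthogonal part
b2 = B @ e2

s = b2 / a                        # the scalar s from the proof
X = s * (np.outer(e1, e2) - np.outer(e2, e1))  # acts only in the (e1, e2) plane
assert np.allclose(X, -X.T)       # X is skew-symmetric

E = A @ X - B
print(np.linalg.norm(E))          # achieved error ||E||
print(abs(A @ B) / a)             # theorem's value |A.B| / ||A||; the two agree
```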
P.S. In general you can see that, since $AX\perp A$ always holds (proof below), there is no hope that $AX$ can cancel out the component of $B$ that is parallel to $A$.
Proof. If $X$ is skew-symmetric, then $\langle AX,A\rangle=\langle A,AX^T\rangle=-\langle A,AX\rangle=-\langle AX,A\rangle$, so $\langle AX,A\rangle=0$; that is, $AX\perp A$.
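A quick numerical illustration of this orthogonality fact ($M-M^T$ is a standard way to produce a skew-symmetric matrix; the data is made up):

```python
import numpy as np

# For any skew-symmetric X, the row vector AX is orthogonal to A.
rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal(n)
M = rng.standard_normal((n, n))
X = M - M.T                          # skew-symmetric by construction
print(np.isclose((A @ X) @ A, 0.0))  # True: <AX, A> = 0
```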
Best Answer
Write $\Sigma=XDX^T$ where $X$ is orthogonal and $D$ is diagonal with non-negative entries.
We want to maximize $tr(V^TXDX^TV)$. Consider the transformation $W=X^TV$ and observe that $W^TW=V^TXX^TV=V^TV=I$. Since $X^T$ is an invertible matrix, this defines an invertible transformation on the space of allowable $V$s, so the original optimization problem is equivalent to
$\max Tr(W^TDW), \quad W^TW=I_k$
On the other hand, $Tr(W^TDW)=Tr(DWW^T)=\sum_i d_i (WW^T)_{ii}$.
Lemma
$0\leq (WW^T)_{ii}\leq 1$.
Proof of lemma
The first inequality is clear, because $(WW^T)_{ii}$ is the squared norm of the $i$th row of $W$. To establish the second, observe that for any matrix $M$, the norm of any column of $M$ is bounded by the largest singular value of $M$. This follows immediately from the characterization $\sigma_1(M)=\sup_{|v|=1}|Mv|$, noting that the $i$th column is given by $Me_i$, where $e_i$ is a standard basis vector. Furthermore, it is a general fact that the singular values of $M$ are the square roots of the eigenvalues of $MM^T$. In particular, since $W^TW=I$, we conclude that all singular values of $W^T$ are equal to $1$, and consequently the norm of each column of $W^T$ (that is, of each row of $W$) is bounded by $1$.
(end proof of lemma)
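A one-line sanity check of the lemma ($W$ below is a made-up matrix with orthonormal columns, obtained from a QR factorization):

```python
import numpy as np

# For any W with orthonormal columns (W^T W = I), diag(W W^T) lies in [0, 1].
rng = np.random.default_rng(2)
d, k = 6, 3
W = np.linalg.qr(rng.standard_normal((d, k)))[0]  # orthonormal columns
diag = np.diag(W @ W.T)                           # squared row norms of W
print(np.all((diag >= 0) & (diag <= 1 + 1e-12)))  # True
```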
Given the constraints on $(WW^T)_{ii}$, it is clear that $\sum_i d_i (WW^T)_{ii}$ is maximized when $(WW^T)_{ii}=1$ for $i\leq k$ and $0$ otherwise (we assume WLOG that the entries of $D$ are ordered from largest to smallest). This can be attained by setting the $i$th column of $W$ to be $e_i$ for $i\leq k$, which makes $WW^T$ the diagonal matrix with ones in the first $k$ positions and zeros elsewhere. Finally, remembering that $W=X^TV$, where $X$ is the matrix of eigenvectors of $\Sigma$, we have $V=XW$, so $V$ consists precisely of the top $k$ eigenvectors of $\Sigma$.
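Putting it all together, here is a small numpy check of the conclusion (with a made-up $\Sigma$): the top-$k$ eigenvectors achieve the sum of the top $k$ eigenvalues, and no other $V$ with orthonormal columns does better.

```python
import numpy as np

# The top-k eigenvectors of Sigma maximize Tr(V^T Sigma V) over V^T V = I.
rng = np.random.default_rng(3)
d, k = 6, 2
M = rng.standard_normal((d, d))
Sigma = M @ M.T                     # a made-up covariance-like matrix

vals, vecs = np.linalg.eigh(Sigma)  # ascending eigenvalues
V_opt = vecs[:, ::-1][:, :k]        # top-k eigenvectors
best = np.trace(V_opt.T @ Sigma @ V_opt)
print(best, vals[::-1][:k].sum())   # equal: the sum of the top-k eigenvalues

# Random V with orthonormal columns never beats it (up to rounding).
for _ in range(1000):
    V = np.linalg.qr(rng.standard_normal((d, k)))[0]
    assert np.trace(V.T @ Sigma @ V) <= best + 1e-9
```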