What does $v*w^{T}$ represent in vector calculus?

linear-algebra, multivariable-calculus

Say we have two column vectors $v =
\left[ \begin{array}{c}
v_1 \\
v_2 \\
v_3 \end{array} \right]$
and $w =
\left[ \begin{array}{c}
w_1 \\
w_2 \\
w_3 \end{array} \right]$

then $v^T*w$ represents the dot product of $v$ and $w$, i.e.

$\left[ \begin{array}{ccc}
v_1 & v_2 & v_3 \end{array} \right]
*
\left[ \begin{array}{c}
w_1 \\
w_2 \\
w_3 \end{array} \right]
= \left[ \begin{array}{c}
v_1w_1 + v_2w_2 + v_3w_3 \end{array} \right]$

But what does $v*w^T$ represent?

$
\left[ \begin{array}{c}
v_1 \\
v_2 \\
v_3 \end{array} \right]
*
\left[ \begin{array}{ccc}
w_1 & w_2 & w_3 \end{array} \right]
=
\left[ \begin{array}{ccc}
v_1w_1 & v_1w_2 & v_1w_3 \\
v_2w_1 & v_2w_2 & v_2w_3 \\
v_3w_1 & v_3w_2 & v_3w_3 \end{array} \right]
$
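To check this concretely, here is a quick NumPy computation of both products (the particular values of $v$ and $w$ are just an example):

```python
import numpy as np

# Column vectors v and w (values chosen only for illustration)
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

# v^T * w: the dot product, a single scalar
print(np.dot(v, w))        # 1*4 + 2*5 + 3*6 = 32.0

# v * w^T: the outer product, a 3x3 matrix with entries v_i * w_j
print(np.outer(v, w))
# [[ 4.  5.  6.]
#  [ 8. 10. 12.]
#  [12. 15. 18.]]
```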

I am self-studying multivariable calculus, and I keep observing results of this form. For example, if $v$ is a vector-valued function and $w$ is the gradient operator $\nabla$, then $v*w^T$ is essentially the Jacobian of $v$. Or, if $v$ is the gradient of a scalar-valued function, then $v*w^T$ is the Hessian of that function. The material I am reading simply asks me to calculate the derivatives and fill in the entries of the Jacobian or Hessian matrix, but I can't help noticing these patterns, so I wanted to understand what significance this operation has.

Best Answer

This is called an outer product or dyad. An outer product always has rank at most one (exactly one when both vectors are nonzero). Note that, in general, the Jacobian of a vector-valued function and the Hessian of a scalar-valued function have rank higher than one, so you cannot always express these matrices as the outer product of two vectors.
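A minimal numerical sketch of this rank observation (the vectors and the second matrix are arbitrary examples):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

# The outer product v w^T has rank 1 (both vectors are nonzero here)
A = np.outer(v, w)
print(np.linalg.matrix_rank(A))   # 1

# A generic symmetric 3x3 matrix (e.g. a typical Hessian) has full rank,
# so it cannot be written as a single outer product of two vectors.
B = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
print(np.linalg.matrix_rank(B))   # 3
```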

One significance of this operation is in the singular value decomposition of a matrix. In particular, suppose the singular value decomposition of a matrix $X\in\mathbb{R}^{m\times n}$ is $X=U\Sigma V^\top$, where $U=\begin{bmatrix}u_1 & u_2 & \cdots & u_m\end{bmatrix}\in\mathbb{R}^{m\times m}$, $V=\begin{bmatrix}v_1 & v_2 & \cdots & v_n\end{bmatrix}\in\mathbb{R}^{n\times n}$, and $\Sigma\in\mathbb{R}^{m\times n}$ is the matrix of singular values on the diagonal of its upper left hand block. Then you can rewrite the SVD as the following dyadic expansion: \begin{equation*} X = \sum_{i=1}^{r} \sigma_i u_i v_i^\top, \end{equation*} where $r=\text{rank}(X)$. Hence, each term in the expansion consists of a scaled outer product of a left singular vector with a right singular vector. The sum of these rank-1 components results in the generation of the full rank-$r$ matrix $X$.
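As a concrete illustration of the dyadic expansion, here is a short NumPy sketch (the matrix is a made-up example):

```python
import numpy as np

# A made-up 4x3 matrix to decompose
X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# Thin SVD: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Rebuild X as a sum of rank-1 dyads: sigma_i * u_i v_i^T
X_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(X, X_rebuilt))   # True
```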

A second example of where you find outer products is in the computation and estimation of the covariance matrix of a random variable. In particular, suppose $X$ is a random vector taking values in $\mathbb{R}^n$, and it has expectation $\mu=\mathbb{E}[X]$. Then the covariance matrix of $X$ is \begin{equation*} \text{cov}(X) = \mathbb{E}[(X-\mu)(X-\mu)^\top]. \end{equation*} Thus, the covariance is the expectation of the outer product of the zero-mean random vector $X-\mathbb{E}[X]$ with itself. These outer products also show up in the sample covariance matrix, used to estimate $\text{cov}(X)$. In particular, suppose $\{x_1,x_2,\dots,x_m\}$ is a sample of $m$ data points from the distribution of $X$. Then the sample covariance matrix is \begin{equation*} \hat{\Sigma} = \frac{1}{m}\sum_{i=1}^m(x_i-\bar{x})(x_i-\bar{x})^\top, \end{equation*} where $\bar{x}$ is the sample mean. Therefore, the sample covariance matrix is generated by the sum of outer products of the data.
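The same identity is easy to verify numerically; a minimal sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)

# m samples of an n-dimensional random vector (synthetic data)
m, n = 1000, 3
data = rng.normal(size=(m, n))

# Sample covariance as a sum of outer products of the centered samples
x_bar = data.mean(axis=0)
Sigma_hat = sum(np.outer(x - x_bar, x - x_bar) for x in data) / m

# Agrees with NumPy's covariance (np.cov normalizes by m-1 by default, so use bias=True for 1/m)
print(np.allclose(Sigma_hat, np.cov(data.T, bias=True)))   # True
```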

As a final example of the use of outer products, consider the problem of extracting the moving objects in the "foreground" of a grayscale video, and recovering the original background that was obscured by those moving objects. Mathematically speaking, suppose $X^{(k)}$ is a matrix representation of the $k$th frame of the video. You can vectorize each frame and stack the results into a data matrix \begin{equation*} X = \begin{bmatrix} \text{vec}X^{(1)} & \text{vec}X^{(2)} & \cdots & \text{vec}X^{(d_f)} \end{bmatrix}, \end{equation*} where $d_f$ is the number of frames in the video. This data matrix captures all of the information in the video: the columns represent the frames/time-steps, and the rows represent the values of particular pixels.

The goal of extracting the moving objects can then be modeled as computing the decomposition \begin{equation*} X \approx L+S, \end{equation*} where $L$ is a low-rank matrix and $S$ is a sparse matrix. The low-rank matrix models the relatively static information of the background, whereas the sparse matrix captures the dynamic information of the foreground (the moving objects). It has been shown that, under certain conditions, this decomposition can be computed uniquely and optimally (with respect to an $\ell_1$ cost) when we restrict $L$ to be a rank-1 matrix. In this case, the optimization essentially boils down to \begin{equation*} \text{minimize}_{u\in\mathbb{R}^m,v\in\mathbb{R}^n} ~ \|X-uv^\top\|_1. \end{equation*} In other words, we are trying to find vectors $u$ and $v$ such that their rank-1 outer product is as close as possible to the data matrix $X$ with respect to the sparsity-promoting norm $\|\cdot\|_1$. This results in $S=X-uv^\top$ being sparse, as desired, and in $L=uv^\top$ being rank-1.

The interpretation of the vectors $u$ and $v$ is quite interesting in this example. The vector $u$ can be interpreted as a "nominal" background pattern, and the vector $v$ as different scalings of that nominal background over time as the video progresses through its frames. The product $uv^\top$ therefore represents the scaled values of each background pixel throughout the duration of the video. When you add the foreground information $S$ to this rank-1 background matrix, you approximately recover the original video $X$. For more information on this video segmentation problem and the math behind it, see here and here.
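Here is a toy sketch of this idea in NumPy, with a synthetic rank-1 background plus a sparse foreground. It uses a plain SVD, which minimizes a Frobenius rather than an $\ell_1$ cost, as a simple stand-in for the $\ell_1$ problem above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "video": m pixels per frame, n frames
m, n = 200, 60
u_true = rng.uniform(0.2, 1.0, size=m)   # nominal background pattern
v_true = rng.uniform(0.9, 1.1, size=n)   # per-frame scaling of the background
L_true = np.outer(u_true, v_true)        # rank-1 background

# Sparse foreground: a few bright pixels scattered across the frames
S_true = np.zeros((m, n))
mask = rng.random((m, n)) < 0.02
S_true[mask] = rng.uniform(0.5, 1.0, size=mask.sum())

X = L_true + S_true

# Stand-in for the l1 problem: take the leading SVD dyad as the rank-1 background estimate
U, s, Vt = np.linalg.svd(X, full_matrices=False)
L_hat = s[0] * np.outer(U[:, 0], Vt[0, :])
S_hat = X - L_hat

print(np.linalg.matrix_rank(L_hat))    # 1
print(np.abs(L_hat - L_true).mean())   # small: the background is recovered approximately
```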
