Is it wrong to view convolution as template matching?

conv-neural-network, convolution, cosine-similarity, linear-algebra

I am reading about the convolution operation, but I can't see how it can be interpreted as template matching.

Suppose that we convolve the input $\mathbf{X}$:
$$
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 \\
0 & 0 & 1 & 1 \\
\end{bmatrix}
$$

with the kernel $\mathbf{K}$ (stride equals 1):
$$
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
\end{bmatrix}
$$

Then the output is:
$$
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 \\
0 & 0 & 1 & 1 \\
\end{bmatrix}
*
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
\end{bmatrix}
=
\begin{bmatrix}
\color{green}{3} & 1 \\
1 & \color{red}{3} \\
\end{bmatrix}
$$

where $*$ denotes the convolution operator.
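As a quick numerical check, here is a minimal NumPy sketch of this computation. (Strictly speaking, what CNNs call convolution is cross-correlation, i.e., the kernel is not flipped; for this kernel, flipping leaves it unchanged, so the two coincide.)

```python
import numpy as np

X = np.array([[1, 0, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
K = np.eye(3, dtype=int)  # the 3x3 identity kernel

# "Valid" cross-correlation with stride 1: slide K over X and take
# the sum of the element-wise product at each offset.
out = np.array([[np.sum(X[i:i+3, j:j+3] * K) for j in range(2)]
                for i in range(2)])
print(out)
# [[3 1]
#  [1 3]]
```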

The problem I see is that the output obtained from the sum of the element-wise multiplication between the kernel and this local patch:
$$
\begin{bmatrix}
\color{green}{1} & \color{green}{0} & \color{green}{0} & 0 \\
\color{green}{0} & \color{green}{1} & \color{green}{0} & 1 \\
\color{green}{0} & \color{green}{0} & \color{green}{1} & 1 \\
0 & 0 & 1 & 1 \\
\end{bmatrix}
$$

is the same as the output obtained from the sum of the element-wise multiplication between the kernel and this other local patch:
$$
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \color{red}{1} & \color{red}{0} & \color{red}{1} \\
0 & \color{red}{0} & \color{red}{1} & \color{red}{1} \\
0 & \color{red}{0} & \color{red}{1} & \color{red}{1} \\
\end{bmatrix}
$$

That is, writing $\color{green}{\mathbf{P}}$ and $\color{red}{\mathbf{P}}$ for the green and red $3 \times 3$ patches, $\sum_{ij} (\color{green}{\mathbf{P}} \odot \mathbf{K})_{ij} = \sum_{ij} (\color{red}{\mathbf{P}} \odot \mathbf{K})_{ij} = 3$, where $\odot$ denotes element-wise multiplication.

Does the above observation mean that we can't view convolution as template matching, since different patterns in the input result in the same output?
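This observation is easy to check numerically, e.g. with a sketch like the following:

```python
import numpy as np

X = np.array([[1, 0, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
K = np.eye(3, dtype=int)

green = X[0:3, 0:3]  # upper-left (green) patch
red   = X[1:4, 1:4]  # lower-right (red) patch

print(np.array_equal(green, red))          # False: the patterns differ
print((green * K).sum(), (red * K).sum())  # 3 3: yet the responses are equal
```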

Best Answer

Yes, convolution alone is not equivalent to template matching based on cosine similarity. You still need to normalize.

The cross-correlation does indeed assign the same value, 3, to the upper-left and the lower-right submatrices.

$$
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \color{red}{1} & \color{red}{0} & \color{red}{1} \\
0 & \color{red}{0} & \color{red}{1} & \color{red}{1} \\
0 & \color{red}{0} & \color{red}{1} & \color{red}{1} \\
\end{bmatrix}
*
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
\end{bmatrix}
=
\begin{bmatrix}
\color{green}{3} & 1 \\
1 & \color{red}{3} \\
\end{bmatrix}
$$

This is because the raw cross-correlation only counts matches of 1s between the data matrix and the kernel matrix. To adjust for mismatches, this count needs to be normalized: divide by the square root of the product of the counts of 1s in both matrices, in much the same way that the covariance between two random variables is divided by the square root of the product of their variances to obtain their correlation.

The normalized correlation between the lower-right submatrix and the kernel becomes

$$
\frac{
\begin{bmatrix}
\color{red}{1} & \color{red}{0} & \color{red}{1} \\
\color{red}{0} & \color{red}{1} & \color{red}{1} \\
\color{red}{0} & \color{red}{1} & \color{red}{1} \\
\end{bmatrix}
*
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
\end{bmatrix}
}{
\sqrt{
\left\lVert
\begin{bmatrix}
\color{red}{1} & \color{red}{0} & \color{red}{1} \\
\color{red}{0} & \color{red}{1} & \color{red}{1} \\
\color{red}{0} & \color{red}{1} & \color{red}{1} \\
\end{bmatrix}
\right\rVert^2
\cdot
\left\lVert
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
\end{bmatrix}
\right\rVert^2
}}
= \frac{\color{red}{3}}{\sqrt{\color{red}{6} \cdot 3}}
= \frac{\sqrt{2}}{2},
$$

where $\lVert \cdot \rVert$ denotes the Frobenius norm, whose square for a binary matrix is simply the count of 1s. For the whole matrix, the result is

$$
\begin{bmatrix}
\color{green}{1} & \sqrt{3}/6 \\
1/3 & \color{red}{\sqrt{2}/2} \\
\end{bmatrix}.
$$
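For completeness, a minimal NumPy sketch of this normalized cross-correlation, i.e., the cosine similarity between each patch and the kernel (`np.linalg.norm` computes the Frobenius norm of a matrix by default):

```python
import numpy as np

X = np.array([[1, 0, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
K = np.eye(3)

ncc = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        patch = X[i:i+3, j:j+3]
        # cosine similarity: inner product divided by the product of
        # Frobenius norms (square roots of the counts of 1s here)
        ncc[i, j] = (patch * K).sum() / (np.linalg.norm(patch) * np.linalg.norm(K))

print(np.round(ncc, 4))
# [[1.     0.2887]
#  [0.3333 0.7071]]
```

Only the patch that exactly matches the diagonal template attains the maximal similarity of 1, so the ambiguity from the raw counts disappears after normalization.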