Why is the convolution output size different when using a Toeplitz matrix?

Tags: convolution, toeplitz

From CNNs, the 2D convolution output size is (Input height – Kernel height + 1) * (Input width – Kernel width + 1), where padding = 0 and stride = 1. I was trying to find out how to compute a convolution as a matrix multiplication, and the solution was to use a doubly block circulant matrix. Reading a few posts about how to implement a doubly block circulant matrix, I realized that they compute the convolution output size a different way: (Input height + Kernel height – 1) * (Input width + Kernel width – 1). Can you give me some insight into why they use a different computation for the convolution output size?
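For concreteness, here is how the two sizes show up numerically. This sketch uses scipy.signal.convolve2d, whose `mode` argument happens to select between the two conventions (my choice of library, just for illustration):

```python
import numpy as np
from scipy.signal import convolve2d

image = np.ones((5, 5))   # input: 5 x 5
kernel = np.ones((3, 3))  # kernel: 3 x 3

# CNN-style size: (5 - 3 + 1) x (5 - 3 + 1)
print(convolve2d(image, kernel, mode='valid').shape)  # (3, 3)

# Size used in the circulant-matrix posts: (5 + 3 - 1) x (5 + 3 - 1)
print(convolve2d(image, kernel, mode='full').shape)   # (7, 7)
```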

Source: Source1, Source2

Best Answer

To quote Wikipedia:

In mathematics, convolution is a mathematical operation on two functions ($f$ and $g$) that produces a third function ($f \ast g$).

If you look carefully, you'll find that there are no real boundaries in the mathematical definition of a convolution — i.e., theoretically there is an output for every pixel index (cf. infinite zero-padding). However, since most of these outputs would be zero anyway (the kernel does not overlap with the actual image), these are generally ignored in the context of signal processing.

Now, in a general signal-processing context, the output of the convolution is taken to be every output that can be non-zero. This corresponds to padding the input with $K-1$ zeros on each side, so an input of length $N$ produces $N + K - 1$ outputs per dimension, which is exactly the (Input + Kernel – 1) formula from your question. In the context of neural networks, on the other hand, many non-zero outputs are effectively ignored by using no (or little) input padding.
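As a quick sanity check (again a sketch with scipy.signal.convolve2d; the padding widths are the $K-1$ from above), zero-padding the input by $K-1$ on each side and then keeping only the fully overlapping outputs reproduces the "full" convolution:

```python
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(5, 5)
kernel = np.random.rand(3, 3)
kh, kw = kernel.shape

# Pad with K-1 zeros on every side, then keep only fully overlapping outputs.
padded = np.pad(image, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
assert np.allclose(convolve2d(padded, kernel, mode='valid'),
                   convolve2d(image, kernel, mode='full'))
```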

The sources you linked to talk about convolutions in the context of signal processing, not in the context of neural networks, which explains the discrepancy.
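To tie this back to your original question, below is a minimal sketch (not taken from your sources) of computing a full 2D convolution as a single matrix multiplication, using a doubly block Toeplitz matrix, the non-circular variant of the doubly block circulant construction you mention. The matrix has $(H + k_h - 1)(W + k_w - 1)$ rows, which is exactly where the (Input + Kernel – 1) output size comes from:

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import convolve2d

def conv2d_as_matmul(image, kernel):
    """Full 2D convolution done as one matrix-vector product."""
    H, W = image.shape
    kh, kw = kernel.shape
    out_h, out_w = H + kh - 1, W + kw - 1  # the "full" output size

    # One Toeplitz block per kernel row; each block maps one input row
    # (length W) to one partial output row (length W + kw - 1).
    blocks = [toeplitz(np.r_[kernel[m], np.zeros(W - 1)],     # first column
                       np.r_[kernel[m, 0], np.zeros(W - 1)])  # first row
              for m in range(kh)]
    zero = np.zeros((out_w, W))

    # Doubly block Toeplitz layout: block (i, j) holds T_{i-j} when defined.
    M = np.block([[blocks[i - j] if 0 <= i - j < kh else zero
                   for j in range(H)]
                  for i in range(out_h)])
    return (M @ image.ravel()).reshape(out_h, out_w)

image = np.arange(16.0).reshape(4, 4)
kernel = np.array([[1.0, -1.0],
                   [2.0, 0.5]])
assert np.allclose(conv2d_as_matmul(image, kernel),
                   convolve2d(image, kernel, mode='full'))
```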