Your first method is correct.
Your second method does indeed count each distribution multiple times.
The possible distributions are $(4, 1, 1)$, $(3, 2, 1)$, $(3, 1, 2)$, and $(2, 2, 2)$.
There are four distributions of the form $(4, 1, 1)$, depending on whether the left or right square is left empty in the second and third rows. Your second method counts such distributions four times, once for each way you could have designated one of the four objects in the first row as the object in the first row.
There are eight distributions of the form $(3, 2, 1)$, depending on which of the four squares is left empty in the first row and which of the two squares is left empty in the third row. Your second method counts such distributions $3 \cdot 2 = 6$ times, once for each way you could have designated one of the objects in the first row as the object in the first row and once for each way you could have designated one of the objects in the second row as the object in the second row.
By symmetry, there are eight distributions of the form $(3, 1, 2)$, each of which you have counted six times.
There are six distributions of the form $(2, 2, 2)$, depending on which two entries have been left empty in the first row. Your second method counts each such distribution $2 \cdot 2 \cdot 2 = 8$ times, once for each way you could have designated one of the two objects in each row as the object in that row.
Notice that
$$\color{red}{4} \cdot 4 + 2 \cdot \color{red}{6} \cdot 8 + \color{red}{8} \cdot 6 = \color{red}{16} + \color{red}{96} + \color{red}{48} = \color{red}{160}$$
Best Answer
Index the squares on the $40\times 40$ grid by a pair of integers $(x,y)$ with $1 \le x, y \le 40$. Let us call the square at $(x,y)$ as $S_{x,y}$. Consider following subset of squares on the grid:
$$\mathcal{S} = \big\{\; S_{x,y} : x \equiv 1 \pmod 3, y \equiv 1 \pmod 3\;\big\}$$
It is clear $\mathcal{S}$ contains $\lceil \frac{40}{3} \rceil \times \lceil \frac{40}{3} \rceil = 14 \times 14 = 196$ elements. Pick any two squares $S_{x_1, y_1}$, $S_{x_2,y_2}$ from $\mathcal{S}$, we have either
$$x_1 \ne x_2 \implies |x_1 - x_2 | \ge 3 \quad\text{ OR }\quad y_1 \ne y_2 \implies |y_1 - y_2 | \ge 3$$
This implies we cannot find a $3\times 3$ square that cover $S_{x_1,y_1}$ and $S_{x_2,y_2}$ at the same time.
If we cover the grid by some numbers of $3\times 3$ squares. The $3\times 3$ squares covering distinct elements from $\mathcal{S}$ are distinct. This implies we need at least $196$ $3\times 3$ squares to cover the whole grid.
It is clear how to cover the $40 \times 40$ grid by 196 copies of $3\times 3$ squares. So the desired answer is indeed $196$.
The basic idea is no different from a much simpler problem of covering a $3\times 3$ black and white chessboard having 5 whites squares and 4 black squares using $1 \times 2$ dominoes. Since each domino cover one white square and one black square. If the chessboard has 5 white squares, you need at least $5$ dominoes to cover it.