Assuming you're not including obstacles, the shortest distance is:
$$dist = \max(\text{horizontal distance}, \text{vertical distance})$$
This is because the most distance is covered to a destination by going diagonally. So, the king moves diagonally as much as possible before it would put them past the row or file the target square is on. Then, the king moves toward the square along that row or file. This is the shortest distance, so all shortest paths have this distance.
A shortest path cannot consist of a horizontal move and a vertical move, otherwise there is a shorter path consisting of one less move in which it used a diagonal move instead.
A shortest path will only have a horizontal move or a vertical move if the horizontal distance is not equal to the vertical distance.
A shortest path has a minimum number of diagonal moves $D$ equal to:
$$D_{min} = \min(\text{horizontal distance}, \text{vertical distance})$$
Two vertical moves can be substituted with two diagonal moves instead (zig-zagging out-and-into the file). The same can be done for two horizontal moves. The number of such substitutions $S$ is equal to:
$$S = \left\lfloor{\frac{dist - D_{min}}{2}}\right\rfloor$$
There are a number of mandatory unsubstituteable vertical or horizontal moves $M$ equal to (0 or 1):
$$M = dist - D_{min} - 2S$$
So, there are $S$ substitutions to make $S+1$ if we count no substitutions, which we will. We can make $0, 1, 2, ... S$ substitutions. With this, we have enough to decide how many paths there are.
$$ N = \sum_{s = 0}^S \binom{dist}{D_{min}+s, 2S - 2s + M, s} = \sum_{s = 0}^S \frac{dist!}{(D_{min}+s)!(2S - 2s + M)! s!}$$
Edit:
There was a mistake in the formula. For better understanding of the formula in case there's still a mistake:
$$\binom{dist}{\text{forward diagonals, straight moves, back diagonals}}$$
You have to sum over every possible combination of substitutions at the end.
The following is a generalization of the "visual proof" referred to in the question.
If $p_i\geq0$ $(1\leq i\leq d)$ and $\sum_{i=1}^d p_i=n$ then $$\ell({\bf p}):=\sqrt{\sum_{i=1}^d p_i^2}$$ is minimal when all $p_i$ are equal. When the $p_i$ are forced to be integers then we should make them as equal as possible. To this end assume
$$n= q d+r, \qquad q\geq1,\quad 0\leq r<d$$
and put
$$p_i:=\left\{\eqalign{&q+1\qquad(1\leq i\leq r)\cr &q\quad\qquad(r+1\leq i\leq d)\ .\cr}\right.$$
I claim that
$$s_{n,d}=\ell({\bf p})=\sqrt{dq^2+r(2q+1)}\ .$$
I shall only prove that $s_{n,d}\leq\ell({\bf p})$. Nevertheless I can't see how one could do better.
In the ${\bf y}$-space ${\mathbb R}^d$ we consider a virtual box $$V:=[0,p_1]\times[0,p_2]\times\cdots\times[0,p_d]\subset{\mathbb R}^d\ .$$ This box is built up from $d$-dimensional unit cubes, called chambers. The main diagonal $\sigma$ of this box connects ${\bf 0}$ with ${\bf p}=(p_1,\ldots ,p_d)$ and has length $\ell({\bf p})$. This diagonal passes a certain number of chambers in turn. We now interpret these chambers as $d$-dimensional faces of the original cube $C:=[0,1]^n\subset{\mathbb R}^n$, so that $\sigma$ can be viewed as an unfolding of an admissible path from ${\bf 0}$ to ${\bf 1}$ in $C$.
The intended "backfolding" of $\sigma$ is accomplished as follows: Assign each interval $[k-1,k]$ $(1\leq k\leq p_1)$ on the $y_1$-axis the number $k$, then assign each such interval on the $y_2$-axis one of the next $p_2$ numbers, and so on along the other $y_i$-axes, until all numbers from $1$ to $n$ have so been distributed. We now identify each interval $0\leq x_k\leq 1$ on one of the axes of ${\mathbb R}^n$ with the corresponding $y_i$-interval on the proper axis of ${\mathbb R}^d$.
By projection onto the $y_i$-axes each piece $\Delta \sigma\subset \sigma$ lying in some chamber gets therewith assigned $d$ different numbers from $[n]$, and these numbers indicate which coordinates $x_k$ shall be "active" along $\Delta \sigma$, in other words: on which $d$-dimensional face of $C$ the "backfolded" piece $\Delta \sigma$ shall lie. It follows that $\sigma$ "backfolds" isometrically to an admissible path of length $\ell({\bf p})$ in $[0,1]^n$.
Best Answer
Since you are going from corner to corner and therefore from vertex to vertex, the number of steps in each direction is $8$ rather than $7$ and you get the advertised answer by your method.