If you were considering an infinite chess board, you might get a solution as a (slightly complicated) formula, but on a finite board the restriction of edges (and especially corners) does affect things. Obviously you can draw maps:
And there you can see that the corner position of the knight means that it is quite slow to reach its diagonally neighbouring square, whereas with an open-board knight:
diagonally-adjacent squares are reached more quickly.
Additional thought:
Outside the $5\times 5$ square centred on the knight, the move-distance pattern becomes simpler. Take the unsigned differences $\Delta x, \Delta y$, compute the maximum of $\left ( \frac{\Delta x}{2},\frac{\Delta y}{2},\frac{\Delta x+\Delta y}{3} \right )$, and round up to the nearest integer; call this $m'$. Now calculate the move count $m$ as follows:
$$
m=m'+((m'+\Delta x+\Delta y) \bmod 2)
$$
To handle close-in squares (on a board of at least $5\times 5$) we can list the exceptions:
$$ \begin{align}
\Delta x =\Delta y =2 &\implies m=4 \\
\Delta x+\Delta y =1 &\implies m=3 \\
\text{(for a knight in a corner only) } \Delta x=\Delta y =1 &\implies m=4
\end{align}$$
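The formula and its exceptions can be sketched as a small function (the name `knight_moves` and the `corner` flag are my own; this assumes the target is far enough from every board edge for the open-board pattern to apply):

```rust
// Knight move-count for unsigned offsets (dx, dy), using the formula above.
// `corner` marks a knight starting on a corner square, where only the
// (1, 1) exception changes. A sketch, not an exhaustive edge-case handler.
fn knight_moves(dx: u32, dy: u32, corner: bool) -> u32 {
    // Close-in exceptions listed above.
    if dx == 2 && dy == 2 { return 4; }
    if dx + dy == 1 { return 3; }
    if corner && dx == 1 && dy == 1 { return 4; }
    // m' = ceil(max(dx/2, dy/2, (dx+dy)/3)); the ceiling commutes with
    // the maximum, so we can take ceiling divisions term by term.
    let mp = ((dx + 1) / 2).max((dy + 1) / 2).max((dx + dy + 2) / 3);
    // Parity correction: m = m' + ((m' + dx + dy) mod 2).
    mp + (mp + dx + dy) % 2
}

fn main() {
    assert_eq!(knight_moves(0, 0, false), 0);
    assert_eq!(knight_moves(1, 2, false), 1);
    assert_eq!(knight_moves(1, 1, false), 2); // open board: two moves suffice
    assert_eq!(knight_moves(1, 1, true), 4);  // corner exception
    assert_eq!(knight_moves(2, 2, false), 4);
    assert_eq!(knight_moves(1, 0, false), 3);
    assert_eq!(knight_moves(4, 4, false), 4);
    println!("all checks passed");
}
```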
An exact solution can be obtained by computing $d_{x, y}(n)$, the number of distinct paths reaching $(x, y)$ for the first time after $n$ steps. The total number $p_{x,y}(n)$ of $n$-step paths that visit $(x, y)$ at some point then satisfies the relationship:
$$p_{x, y}(n) = 8p_{x, y}(n-1) + d_{x, y}(n)$$
Using this we can express the expected number of distinct squares per path after $n$ steps: summing the number of distinct squares visited over all paths gives the same total as summing, over all squares, the number of paths that visit that square.
Then, to save some computation time, we note by four-fold rotational symmetry that the expected number of distinct squares per path after $n$ steps equals:
$$1 + \frac{4}{8^{n}}\sum_{(x > 0, y \geq 0)} p_{x, y}(n)$$
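Before setting up the full machinery, the target quantity can be sanity-checked on tiny cases by enumerating all $8^n$ paths directly (this brute force is my own check, not part of the method):

```rust
use std::collections::HashSet;

// The eight knight moves.
const MOVES: [(i32, i32); 8] = [(-2, -1), (-2, 1), (-1, -2), (-1, 2),
                                (1, -2), (1, 2), (2, -1), (2, 1)];

// Expected number of distinct squares visited, averaged over all 8^n
// equally likely n-step paths, by direct enumeration.
fn expected_distinct(n: u32) -> f64 {
    let paths = 8u64.pow(n);
    let mut total: u64 = 0;
    for code in 0..paths {
        let (mut x, mut y, mut c) = (0i32, 0i32, code);
        let mut seen: HashSet<(i32, i32)> = HashSet::new();
        seen.insert((0, 0));
        for _ in 0..n {
            let (dx, dy) = MOVES[(c % 8) as usize];
            c /= 8;
            x += dx;
            y += dy;
            seen.insert((x, y));
        }
        total += seen.len() as u64;
    }
    total as f64 / paths as f64
}

fn main() {
    // One step always lands on a new square: 2 distinct squares.
    assert!((expected_distinct(1) - 2.0).abs() < 1e-12);
    // Two steps: only the reversing move revisits a square, so the
    // expectation is (56*3 + 8*2)/64 = 2.875.
    assert!((expected_distinct(2) - 2.875).abs() < 1e-12);
    println!("ok");
}
```

For $n=1$ the symmetry formula agrees: the quadrant $x>0, y\geq 0$ contains two reachable squares, $(1,2)$ and $(2,1)$, each with $p=1$, giving $1 + \frac{4}{8}\cdot 2 = 2$.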
How do we compute $d_{x, y}(n)$? We do it using matrices $M_{x, y}(n)$ which are initially zero everywhere except at $(0, 0)$ where they have value $1$. Then we apply a kernel convolution encoding the possible knight moves:
$$M'_{x, y}(n)=\begin{bmatrix}0&1&0&1&0\\
1&0&0&0&1\\
0&0&0&0&0\\
1&0&0&0&1\\
0&1&0&1&0\\\end{bmatrix}\star M_{x, y}(n-1)$$
Then $d_{x, y}(n)$ is the element at $(x, y)$ in $M'_{x, y}(n)$, and we set the element at $(x, y)$ to zero to obtain $M_{x, y}(n)$ for the next iteration. Because we repeatedly zero that coefficient after each iteration, we count only the distinct paths reaching $(x, y)$ for the first time, rather than all the paths that reach $(x, y)$ after $n$ steps. Unfortunately this does mean that we must compute these matrices $n$ times for each possible target $(x, y)$.
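Before scaling up, the zeroing trick can be checked against a brute-force count of first-visit paths on a tiny case (the board size, step limit, and targets here are my own choices; the board is large enough that no 4-step path can leave it):

```rust
const MOVES: [(i32, i32); 8] = [(-2, -1), (-2, 1), (-1, -2), (-1, 2),
                                (1, -2), (1, 2), (2, -1), (2, 1)];
const N: u32 = 4;               // number of steps to check
const W: i32 = (4 * N + 1) as i32; // board side; no path escapes in N steps
const Z: i32 = W / 2;           // origin at the centre

// One application of the knight-move kernel to the path-count grid.
fn step(d: &[u64]) -> Vec<u64> {
    let mut nd = vec![0u64; (W * W) as usize];
    for x in 0..W {
        for y in 0..W {
            for &(dx, dy) in &MOVES {
                let (cx, cy) = (x + dx, y + dy);
                if (0..W).contains(&cx) && (0..W).contains(&cy) {
                    nd[(x * W + y) as usize] += d[(cx * W + cy) as usize];
                }
            }
        }
    }
    nd
}

// d_{tx,ty}(1..=n): read off the target cell after each step,
// then zero it so later steps count only first visits.
fn first_visits(tx: i32, ty: i32, n: u32) -> Vec<u64> {
    let mut d = vec![0u64; (W * W) as usize];
    d[(Z * W + Z) as usize] = 1;
    let mut out = Vec::new();
    for _ in 0..n {
        d = step(&d);
        let idx = ((Z + tx) * W + Z + ty) as usize;
        out.push(d[idx]);
        d[idx] = 0;
    }
    out
}

// Brute force: paths whose FIRST visit to (tx, ty) is exactly at step n.
fn brute(tx: i32, ty: i32, n: u32) -> u64 {
    let mut count = 0;
    for code in 0..8u64.pow(n) {
        let (mut x, mut y, mut c) = (0i32, 0i32, code);
        let mut first = 0u32;
        for s in 1..=n {
            let (dx, dy) = MOVES[(c % 8) as usize];
            c /= 8;
            x += dx;
            y += dy;
            if first == 0 && x == tx && y == ty { first = s; }
        }
        if first == n { count += 1; }
    }
    count
}

fn main() {
    for &(tx, ty) in &[(1, 2), (3, 3), (0, 0)] {
        let d = first_visits(tx, ty, N);
        for n in 1..=N {
            assert_eq!(d[(n - 1) as usize], brute(tx, ty, n));
        }
    }
    println!("first-visit counts agree");
}
```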
I've implemented the above in Rust:
```rust
use gmp::mpz::Mpz;
use gmp::mpq::Mpq;
use rayon::prelude::*;

// Apply the knight-move kernel once: each cell receives the path counts
// of the eight cells a knight's move away.
fn knight_step(d: Vec<Mpz>, w: i32) -> Vec<Mpz> {
    let mut nd = vec![Mpz::new(); (w * w) as usize];
    for &(dx, dy) in &[(-2, -1), (-2, 1), (-1, -2), (-1, 2),
                       (1, -2), (1, 2), (2, -1), (2, 1)] {
        let rx = (-dx).max(0)..(w - dx).min(w);
        let ry = (-dy).max(0)..(w - dy).min(w);
        for x in rx {
            for y in ry.clone() {
                let cx = x + dx;
                let cy = y + dy;
                nd[(x * w + y) as usize] += &d[(cx * w + cy) as usize];
            }
        }
    }
    nd
}

// Sum p_{x,y}(n) over the quadrant x > 0, y >= 0, scaled by 4 for the
// four-fold rotational symmetry.
fn count_distinct(n: usize) -> Mpz {
    let w = 4 * n + 1; // Size of a side of the board.
    let z = 2 * n; // Origin is at (z, z).
    let mut total_distinct = Mpz::new();
    for x in z + 1..w {
        println!("{}", x); // Progress indicator.
        let subproblems: Vec<_> = (z..w).collect::<Vec<usize>>().par_iter().map(|y| {
            let mut d = vec![Mpz::new(); w * w];
            d[z * w + z] = Mpz::from(1);
            let mut p = Mpz::new();
            for _ in 0..n {
                d = knight_step(d, w as i32);
                // p(n) = 8 p(n-1) + d(n); zero the target cell afterwards
                // so only first visits are counted in later steps.
                p = 8i64 * p + &d[x * w + y];
                d[x * w + y] = 0.into();
            }
            p
        }).collect();
        for s in &subproblems {
            total_distinct += 4i64 * s;
        }
    }
    total_distinct
}

fn main() {
    let n = 50;
    let q = Mpq::ratio(&count_distinct(n), &Mpz::from(8).pow(n as u32));
    println!("{}", q + &1.into());
}
```
Giving the exact result:
$$\frac{455207943209697100946821497094086408107332219}{11150372599265311570767859136324180752990208} \approx 40.82446$$
Best Answer
Added later
There is a tiny probability that you end up back where you started (a distance of $0$) after $2016$ steps. Usually you will end up some positive distance away and so the expected square of the distance away will be positive. As you take more steps, this expectation of the square increases.
You can see what happens with just one step: while the expected position is where you started because of the symmetry of the steps, you must be exactly $\sqrt{5}$ away and so the expectation of the square of the distance is $5$.
You can also see what happens with two steps: the expected position is still where you started because of the symmetry of the steps, but of the $64$ pairs of moves you can take: $8$ lead you back to a distance $0$; $8$ to a distance of $\sqrt{2}$; $8$ to a distance of $2$; $16$ to a distance of $\sqrt{10}$; $8$ to a distance of $4$; $8$ to a distance of $\sqrt{18}$; and $8$ to a distance of $\sqrt{20}$. So the expected square of distance is $\tfrac{8\cdot 0 +8\cdot 2+8\cdot 4+16\cdot10 +8\cdot 16+8\cdot 18+8\cdot 20}{64}=\frac{640}{64}=10$.
The two step expected square of distance answer is double the one step expected square of distance, and in general this extends to any number of independent steps - see my original answer. This seems to surprise you.
But it does not mean that the expected distance (not squared) is a linear function of the number of steps: for two steps it is $\tfrac{8\cdot 0 +8\cdot \sqrt 2+8\cdot 2+16\cdot\sqrt{10} +8\cdot 4+8\cdot \sqrt{18}+8\cdot \sqrt{20}}{64} \approx 2.8067$, which is a lot less than double the one-step expected distance of $\sqrt{5}\approx 2.2361$.
Nor is the expected square of distance in the two step case what you would get if you always took the same step twice: that would have been $(\sqrt{5} +\sqrt{5})^2=20$, a quadrupling of $5$ rather than the doubling of the one step case that $10$ represents.
Original answer
Let's generalise this. Suppose you have two independent $n$-dimensional random variables $\mathbf{A}$ and $\mathbf{B}$, each with zero mean and whose components $A_i$ and $B_j$ have variances $\sigma^2_{A_i}$ and $\sigma^2_{B_j}$. (There may be dependencies between the components within each random variable, but these do not affect the result, by linearity of expectation.)
We have $||\mathbf{A}||^2=\sum_i A_i^2$ and $||\mathbf{B}||^2=\sum_j B_j^2$. Then
$$E\left[||\mathbf{A}+\mathbf{B}||^2\right] = \sum_i E\left[(A_i+B_i)^2\right] = \sum_i \left(E[A_i^2] + 2\,E[A_i]E[B_i] + E[B_i^2]\right) = \sum_i \sigma^2_{A_i} + \sum_j \sigma^2_{B_j},$$
using independence to split the cross terms, which vanish because of the zero means, while the zero means also give $E[A_i^2]=\sigma^2_{A_i}$.
This easily extends to the sum of any finite number of independent $n$-dimensional random variables with zero means: the expected squared length of the sum is the sum of the variances of all the components.
In your particular case $n=2$, and all the random variables (the knight's moves) are independent and identically distributed, with each component (the $x$ and $y$ displacements) having zero mean and variance $\frac{(+1)^2+(+2)^2+(-1)^2+(-2)^2}{4} = \frac52$. So the expected square of the distance after $2016$ moves is $$2016\cdot\frac52 +2016\cdot\frac52 = 10080$$
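The linear growth of the expected squared distance can be confirmed by exhaustive enumeration for small step counts (the claim for $2016$ steps rests on the variance argument above, not on this check):

```rust
// The eight knight moves.
const MOVES: [(i32, i32); 8] = [(-2, -1), (-2, 1), (-1, -2), (-1, 2),
                                (1, -2), (1, 2), (2, -1), (2, 1)];

// Expected squared distance from the origin after n knight steps,
// averaging over all 8^n equally likely paths.
fn expected_sq(n: u32) -> f64 {
    let paths = 8u64.pow(n);
    let mut total: i64 = 0;
    for code in 0..paths {
        let (mut x, mut y, mut c) = (0i64, 0i64, code);
        for _ in 0..n {
            let (dx, dy) = MOVES[(c % 8) as usize];
            c /= 8;
            x += dx as i64;
            y += dy as i64;
        }
        total += x * x + y * y;
    }
    total as f64 / paths as f64
}

fn main() {
    // E[d^2] = n*(5/2) + n*(5/2) = 5n, per the variance argument.
    for n in 1..=5u32 {
        assert!((expected_sq(n) - 5.0 * n as f64).abs() < 1e-9);
    }
    println!("E[d^2] grows as 5n");
}
```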