You can approach this by duality, using the fact that finite speed of propagation holds for classical solutions. The argument is similar in spirit to this answer of mine on low-regularity uniqueness for the linear wave equation.
1
First write $w = u-v$. Then you see that $w$ solves a linear wave equation of the form
$$ \Box w = G(u,v) w $$
where the potential satisfies $|G(u,v)| \lesssim |u|^{p-1} + |v|^{p-1}$. Pretend for now that $G$ is a smooth function of $(t,x)$.
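For concreteness, here is where the bound on $G$ comes from. This assumes the equation is $\Box u = F(u)$ with the power nonlinearity $F(u) = \mu |u|^{p-1}u$, $\mu \in \{\pm 1\}$; the exact nonlinearity is not restated here, so treat this as an assumption. Since $F'(s) = \mu p |s|^{p-1}$ for $p > 1$, the fundamental theorem of calculus gives

$$ F(u) - F(v) = G(u,v)\,(u-v), \qquad G(u,v) = \int_0^1 F'\big(v + \theta(u-v)\big)\,d\theta, $$

$$ |G(u,v)| \le p \max(|u|,|v|)^{p-1} \le p\,\big(|u|^{p-1} + |v|^{p-1}\big). $$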
2
Suppose for convenience that $x_0 = 0$. You want to prove that $w(t,\cdot) |_{B(0,t_0 - t)} = 0$ for $t\in (0,t_0)$. So it is enough to show that $\int w(t,x) f(x) ~dx = 0$ for every $f\in C^\infty_c(B(0,t_0-t))$. We will try to do so using a duality argument.
Fix $T\in (0,t_0)$, and take $f\in C^\infty_c(B(0,t_0-T))$. Solve the wave equation
$$ \Box \varpi = G \varpi $$
with data prescribed at time $T$, namely $\varpi(T,\cdot) = 0$ and $\partial_t\varpi(T,\cdot) = f$, and solve backwards in time. Under the assumption that $G$ is smooth, $\varpi$ is a smooth function with compact support at every time. In fact, since $\mathrm{supp}~f \subset B(0,t_0-T)$, finite speed of propagation for classical solutions gives, for $t\in (0,T)$, $\mathrm{supp}~\varpi(t,\cdot) \subset B(0,(t_0-T)+(T-t)) = B(0,t_0 - t)$.
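Finite speed of propagation is the only property of $\varpi$ used here. As a sanity check (an illustration, not part of the proof), here is a minimal 1D numerical sketch: a leapfrog scheme for $\partial_s^2\varpi = \partial_x^2\varpi - G\varpi$ in the reversed time $s = T - t$. This assumes the convention $\Box = -\partial_t^2 + \Delta$; the equation is invariant under time reversal, and the data become $\varpi = 0$, $\partial_s\varpi = -f$ at $s = 0$. With the time step equal to the grid spacing, each step spreads the support by at most one cell, so the discrete solution vanishes identically outside the cone $|x| \le a + s$. The functions `bump`, `potential`, `evolve_backward`, `max_support_radius` and all grid parameters are hypothetical choices made for this demo.

```python
import math


def bump(x, a):
    """Smooth function supported in (-a, a): exp(-1/(1 - (x/a)^2)) inside, 0 outside."""
    r = x / a
    return math.exp(-1.0 / (1.0 - r * r)) if abs(r) < 1.0 else 0.0


def potential(t, x):
    """Hypothetical smooth potential G(t, x); any smooth choice works for this demo."""
    return 1.0 + 0.5 * math.sin(t) * math.cos(3.0 * x)


def evolve_backward(a=0.5, T=1.0, L=3.0, h=0.01):
    """Leapfrog for varpi_ss = varpi_xx - G varpi in s = T - t on [-L, L].

    Data at s = 0 (t = T): varpi = 0, varpi_s = -f with f = bump(., a).
    The time step k equals h (CFL number 1). Returns the grid, the snapshots
    and k, where snapshots[n] approximates varpi at s = n*k.
    """
    k = h
    n_x = int(round(L / h))
    xs = [j * h for j in range(-n_x, n_x + 1)]
    n_t = int(round(T / k))
    f = [bump(x, a) for x in xs]
    prev = [0.0] * len(xs)                    # varpi at s = 0
    # Taylor step: varpi(k) = k*varpi_s(0) + O(k^3), since varpi(0) = 0 forces varpi_ss(0) = 0
    curr = [-k * fj for fj in f]
    snapshots = [prev, curr]
    for n in range(1, n_t):
        t = T - n * k                         # physical time of the current level
        nxt = [0.0] * len(xs)                 # endpoints stay 0 (the support never reaches them)
        for i in range(1, len(xs) - 1):
            lap = curr[i + 1] - 2.0 * curr[i] + curr[i - 1]   # coefficient (k/h)^2 = 1
            nxt[i] = 2.0 * curr[i] - prev[i] + lap - k * k * potential(t, xs[i]) * curr[i]
        prev, curr = curr, nxt
        snapshots.append(curr)
    return xs, snapshots, k


def max_support_radius(xs, values):
    """Largest |x| at which the grid function is nonzero (0.0 if it vanishes identically)."""
    return max((abs(x) for x, v in zip(xs, values) if v != 0.0), default=0.0)
```

For example, after `xs, snaps, k = evolve_backward()`, the check `max_support_radius(xs, snaps[n]) <= 0.5 + n * k` holds for every `n`. The choice $k = h$ matters: with a smaller time step, the numerical domain of dependence is wider than the light cone, and the discrete support spreads faster than unit speed (with tiny values), even though the continuous solution does not.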
3
That $w$ is an energy solution implies that the following identity holds for any test function $\varpi$:
$$ \int_0^T \int_{\mathbb{R}^d} \Box w \varpi - w \Box \varpi = \int_{\mathbb{R}^d} w \partial_t\varpi - \partial_t w \varpi \Big|_{t = 0}^T $$
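The identity is Green's formula for $\Box$. Assuming the sign convention $\Box = -\partial_t^2 + \Delta$, which is the one for which the boundary terms come out exactly as written, it follows by integrating the pointwise identity

$$ (\Box w)\varpi - w\,\Box\varpi = \partial_t\big(w\,\partial_t\varpi - \partial_t w\,\varpi\big) + \nabla\cdot\big(\varpi\nabla w - w\nabla\varpi\big) $$

over $[0,T]\times\mathbb{R}^d$. The divergence term integrates to zero because $\varpi(t,\cdot)$ has compact support, and the time derivative produces the boundary terms at $t = 0$ and $t = T$.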
Using the support property of $\varpi$ derived above, the boundary integral on the right-hand side vanishes at $t = 0$, since $w$ and $\partial_t w$ vanish on $B(0,t_0)$ at $t = 0$. At $t = T$ only one term survives, because $\varpi(T,\cdot) = 0$, and that term is $\int w(T,x) f(x)~dx$.
The left-hand side, however, vanishes: since $w$ and $\varpi$ solve the same (linear) equation, $LHS = \int_0^T\int_{\mathbb{R}^d} Gw~\varpi - w~G\varpi = 0$. Hence $\int w(T,x) f(x)~dx = 0$.
4
The above works assuming that $G$ is smooth, which in general it is not. Replace $G$ by a mollification $G_\epsilon$, and replace $\varpi$ correspondingly by $\varpi_\epsilon$. Then it suffices to show that
$$ \int_0^T \int_{\mathbb{R}^d} (G - G_\epsilon) w \varpi_\epsilon \to 0 $$
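To see why this suffices, let $\varpi_\epsilon$ solve $\Box\varpi_\epsilon = G_\epsilon\varpi_\epsilon$ with the same data $\varpi_\epsilon(T,\cdot) = 0$, $\partial_t\varpi_\epsilon(T,\cdot) = f$. Finite speed of propagation does not depend on the potential, so $\varpi_\epsilon$ has the same support property and the boundary terms are unchanged. Only the bulk term no longer cancels exactly:

$$ \int_{\mathbb{R}^d} w(T,x) f(x)\,dx = \int_0^T \int_{\mathbb{R}^d} (\Box w)\varpi_\epsilon - w\,\Box\varpi_\epsilon = \int_0^T \int_{\mathbb{R}^d} (G - G_\epsilon)\, w\, \varpi_\epsilon. $$

The left-hand side does not depend on $\epsilon$, so it vanishes once the right-hand side tends to zero.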
Let me do the critical case for convenience; the argument should be similar for the subcritical cases with some adjustment of the exponents.
Noting that $p-1 = \frac{4}{d-2}$, the Strichartz inequality shows that $G$ belongs to the space-time Lebesgue space $L^1([0,T]; L^q_x)$ for every $q\in [\frac{d}{2},d]$, and the mollifications converge to $G$ in that norm. Since $w$ is in the energy class, Sobolev embedding shows that it is uniformly bounded in $L^\infty_t L^{2d/(d-2)}_x$.
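The exponent bookkeeping behind this, written for $u$ (the same holds for $v$) and using $|G| \lesssim |u|^{p-1} + |v|^{p-1}$ with $p - 1 = \frac{4}{d-2}$, is

$$ \big\| |u|^{\frac{4}{d-2}} \big\|_{L^{d/2}_x} = \|u\|_{L^{2d/(d-2)}_x}^{\frac{4}{d-2}} \lesssim \|u\|_{\dot H^1_x}^{\frac{4}{d-2}}, \qquad \big\| |u|^{\frac{4}{d-2}} \big\|_{L^1_t L^{d}_x} = \|u\|_{L^{4/(d-2)}_t L^{4d/(d-2)}_x}^{\frac{4}{d-2}}. $$

So the endpoint $q = d/2$ follows from the energy bound and Sobolev embedding (bounded in time, hence in $L^1([0,T])$), the endpoint $q = d$ is where a Strichartz norm of $u$ enters, and the intermediate $q$ follow by Hölder interpolation.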
The energy estimate (the trivial endpoint of the Strichartz estimates) together with Grönwall's inequality can be used to show that if $G_\epsilon \in L^1_t L^d_x$, then for $t \in [0,T]$
$$ \|\varpi_\epsilon(t)\|_{\dot{H}^1} \lesssim \exp\Big( C \|G_\epsilon\|_{L^1([t,T]; L^d_x)} \Big) \|f\|_{L^2} $$
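A sketch of the Grönwall step, writing $E(t) = \|\nabla_{t,x}\varpi_\epsilon(t)\|_{L^2_x}$ for the energy norm. Run the energy inequality for the inhomogeneous wave equation backwards from $t = T$, where $E(T) = \|f\|_{L^2}$ because $\varpi_\epsilon(T,\cdot) = 0$. Then Hölder's inequality with $\frac{1}{d} + \frac{d-2}{2d} = \frac{1}{2}$ and Sobolev embedding give

$$ E(t) \le \|f\|_{L^2} + \int_t^T \|G_\epsilon \varpi_\epsilon(s)\|_{L^2_x}\,ds \le \|f\|_{L^2} + C \int_t^T \|G_\epsilon(s)\|_{L^d_x}\, E(s)\,ds, $$

and Grönwall's inequality yields

$$ E(t) \le \|f\|_{L^2} \exp\Big( C \int_t^T \|G_\epsilon(s)\|_{L^d_x}\,ds \Big). $$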
By Sobolev embedding, this shows that $\|\varpi_\epsilon\|_{L^\infty([0,T]; L^{2d/(d-2)}_x)}$ is bounded uniformly in $\epsilon$, because mollification does not increase the $L^1_t L^d_x$ norm, so $\|G_\epsilon\|_{L^1_t L^d_x} \le \|G\|_{L^1_t L^d_x}$. Hence the fact that $G-G_\epsilon$ converges to zero in $L^1_t L^{d/2}_x$ gives $\int w(T,x) f(x) ~dx = 0$ after taking the limit.
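Spelled out, the limit step combines the identity $\int w(T,x)f(x)\,dx = \int_0^T\int (G - G_\epsilon)\,w\,\varpi_\epsilon$ (step 3 with $\varpi_\epsilon$ in place of $\varpi$) with Hölder's inequality. It uses exponents $(1,\infty,\infty)$ in time and $(\frac{d}{2}, \frac{2d}{d-2}, \frac{2d}{d-2})$ in space, which works since $\frac{2}{d} + \frac{d-2}{2d} + \frac{d-2}{2d} = 1$:

$$ \Big|\int_{\mathbb{R}^d} w(T,x) f(x)\,dx\Big| \le \|G - G_\epsilon\|_{L^1_t L^{d/2}_x}\, \|w\|_{L^\infty_t L^{2d/(d-2)}_x}\, \|\varpi_\epsilon\|_{L^\infty_t L^{2d/(d-2)}_x} \xrightarrow[\epsilon \to 0]{} 0, $$

with all norms taken over $[0,T]$.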
This is not an answer but rather an extended comment.
The derivative in its ordinary sense is local: for any neighborhood of $x$, if $f \equiv g$ on that neighborhood and $f^{(n)}(x)$ exists then $g^{(n)}(x)$ exists and $f^{(n)}(x) = g^{(n)}(x)$. So, equations of the form $$ x^{(n)}(t) = F(t, x(t), x'(t), \dots, x^{(n-1)}(t)) $$ or $$ \frac{\partial u}{\partial t}(t, x) = \frac{\partial^2 u}{\partial x^2}(t,x) + F(t, x, u(t, x)) $$ are referred to as local.
Generally, "nonlocal" indicates that the equation in question contains something that does not belong to the above category. For instance, we can replace the derivative in an ODE by a derivative of fractional order, which does not have the locality property described in the first paragraph.
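To make the contrast concrete, here is a minimal sketch using the Grünwald–Letnikov approximation $D^\alpha f(x) \approx h^{-\alpha}\sum_{k=0}^{\lfloor (x-a)/h \rfloor} (-1)^k \binom{\alpha}{k} f(x - kh)$ with lower terminal $a = 0$ (an assumption made for this example). For $\alpha = 1$ the coefficients vanish for $k \ge 2$, so only $f(x)$ and $f(x-h)$ enter, which is the locality property. For non-integer $\alpha$ every coefficient is nonzero, so the value at $x$ depends on $f$ over all of $[a, x]$. The functions `gl_derivative`, `bump`, `f` and `g` are hypothetical: `f` and `g` agree on $[0.4, 1]$ but differ on $(0.1, 0.4)$.

```python
import math


def gl_derivative(func, x, alpha, h=1e-3, a=0.0):
    """Grunwald-Letnikov approximation of the order-alpha derivative at x, lower terminal a."""
    n = int(math.floor((x - a) / h + 1e-9))
    coeff = 1.0            # (-1)^k * binom(alpha, k) for k = 0
    total = 0.0
    for k in range(n + 1):
        total += coeff * func(x - k * h)
        coeff *= 1.0 - (alpha + 1.0) / (k + 1)   # recurrence for the next coefficient
    return total / h ** alpha


def bump(x, center=0.25, radius=0.15):
    """Smooth bump supported in (center - radius, center + radius)."""
    r = (x - center) / radius
    return math.exp(-1.0 / (1.0 - r * r)) if abs(r) < 1.0 else 0.0


def f(x):
    return x * x


def g(x):
    # equal to f outside (0.1, 0.4), in particular on a neighborhood of x = 1
    return x * x + bump(x)


for alpha in (1.0, 0.5):
    print(alpha, gl_derivative(f, 1.0, alpha), gl_derivative(g, 1.0, alpha))
```

With $\alpha = 1$ the two printed values coincide exactly, because the sum stops after two terms. With $\alpha = 0.5$ they differ, because the far-away samples in $(0.1, 0.4)$ enter with nonzero weights.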
Equations of the form $$ x'(t) = F(t, x(t), x(a(t))), $$ where $a$ is a given function, are, to the best of my knowledge, (almost) never called nonlocal: the standard name appears to be (functional) (ordinary) differential equations with deviating argument (retarded or delayed if $a(t) < t$, and advanced if $a(t) > t$).
The boundary conditions considered in Byszewski's paper look like "usual" multipoint boundary conditions.
It seems that the author simply chose to call the problems he considered functional-differential nonlocal problems. I think it would be proper to ask him directly: ludwik.byszewski@pk.edu.pl.