The wave equation can be written as $\nabla_i\nabla^i\varphi = 0$ where $\nabla$ is the Levi-Civita connection on Minkowski space, $\varphi$ does not need to be Lorentz invariant. $\partial_i\partial^i\varphi = 0$ is the Laplace equation on both Minkowski and Euclidean space since ordinary partial derivatives do not respect the metric.
In (pseudo)Riemannian geometry the covariant derivative $\nabla_i$ replaces partials $\partial_i$.
The Laplace-Beltrami operator
$$\Delta \triangleq \nabla^i\nabla_i$$
is a common generalization of both the D'Alembertian and the Laplacian,
$$\Delta \varphi = 0$$
is the Laplace-Beltrami equation.
Your observation then generalizes to say that, for scalar field, a vanishing Laplace-Beltrami operator is the same as a divergence free contravariant gradient. This is universally valid since
$$\Delta \varphi = \nabla^i\nabla_i\varphi = g^{ij}\nabla_i\nabla_j\varphi= \nabla_i\nabla^i\varphi$$
It does not depend on the metric or coordinate system.
However, the precise geometric meaning of this depends on the metric.
In Euclidean spaces it means a function is harmonic iff it has a divergence free gradient since the Laplace-Beltrami operator is just the Laplacian.
In Minkowski space the Laplace-Beltrami operator is the D'Alembertian,
and the Laplace-Beltrami equation becomes the wave equation. In Minkowski space covariant derivatives are just ordinary partial derivatives, as in Euclidean space, since there is no curvature, making Christoffel symbols vanish.
Furthermore for scalars
$$\nabla_i \varphi = (d\varphi)_i = \partial_i\varphi$$
is true in all metrics.
However the contravariant derivative $\nabla^i\varphi$ is not the usual gradient vector since index raising depends on the the metric.
The gradient is therefore more naturally thought of as a covector or a 1-form.
Geometrically the Minkowski gradient vectors are the time reflections of what the Euclidean gradient vectors would be.
I do not have a good intuitive picture of contravariant derivatives and their divergences in general non Euclidean spaces though.
Note that for
$$\varphi \triangleq t^2 + x^2$$
in Euclidean space
$$\nabla_i\nabla^i\varphi = 4$$
while in Minkowski space
$$\nabla_i\nabla^i\varphi = 0$$
making it a solution to the equation even though it is not Lorentz invariant.
A general vector field $J^i$ may satisfy $\nabla_i J^i = 0$, but still have non zero curl or in more generally $\nabla_{[i} J_{j]} \neq 0$. Such a field is not a gradient of a scalar field. So at least $\nabla_{[i} J_{j]} = 0$ is required. The Poincare lemma says this is sufficient for fields on contractible subsets of Euclidean space.
Mathematically speaking they are the same operator. Usually we reserve the d'Alembertian for 3+1 dimensional spacetime (so in absence of curvature it takes the form $\partial_0^2 - \nabla^2$), while the Laplace-Beltrami operator is defined for an aribtrary dimensional manifold with arbitrary signature. The only possible difference is that sometimes (not always, though), $\Box$ is defined as $\partial_0^2 - \partial_1^2 - \partial_2^2 - \partial_3^2$ independently of signature, so if your metric is $(-+++)$ then you will have $\Box = -\nabla^2$, where $\nabla^2$ here means the LB operator.
Best Answer
There isn't any difference here between using curved spacetime and curvilinear coordinates. While one could tell "true" curvature in spacetime by, say, calculating scalars like $R^{\alpha\beta\gamma\delta} R_{\alpha\beta\gamma\delta}$ or $R^\mu{}_\mu$ and seeing if they vanish, the fact is your differential operator doesn't care. Put another way, all you care about is whether or not the connection coefficients $\Gamma^\sigma_{\mu\nu}$ vanish, not whether some special coordinate-independent quantity does.
Because of this, Minkowski spacetime often but not always implicitly involves Cartesian coordinates, otherwise one could probably just call it "flat." If your spacetime is flat but your coordinates are not Cartesian, there will be terms you omitted from your third equation. In any spacetime and for any scalar1 $\varphi$, \begin{align} \square\varphi & = g^{\mu\nu} \nabla_\mu \nabla_\nu \varphi \\ & = g^{\mu\nu} \nabla_\mu (\partial_\nu \varphi) \\ & = g^{\mu\nu} (\partial_\mu \partial_\nu \varphi - \Gamma^\sigma_{\mu\nu} \partial_\sigma \varphi) \\ & = (\partial_\mu \partial^\mu - g^{\mu\nu} \Gamma^\sigma_{\mu\nu} \partial_\sigma) \varphi. \tag{1} \end{align}
So if in coordinates $(t,x,y,z)$ we have $g_{\mu\nu} = \mathrm{diag}(-1,1,1,1)$, then sure, $\Gamma^\sigma_{\mu\nu} = 0$ and $\square\varphi = \partial_\mu \partial^\mu \varphi$. The same flat spacetime in spherical coordinates $(t,r,\theta,\phi)$ -- $g_{\mu\nu} = \mathrm{diag}(-1,1,r^2,r^2\sin^2\theta)$ -- will however have some nonzero connection coefficients. In fact, you should recognize the second term in (1) as being necessary, given that there are single derivatives in the spherical Laplacian: $$ \nabla^2\varphi = \left(\partial_r^2 + \frac{2}{r} \partial_r + \frac{1}{r^2} \partial_\theta^2 + \frac{1}{r^2\tan\theta} \partial_\theta + \frac{1}{r^2\sin^2\theta} \partial_\phi^2\right) \varphi. $$
The thing to remember is covariant derivatives reduce to partial derivatives generally only in the following two circumstances:
Other than those two times, you have to resort to the general rule $$ \nabla_\mu T^\alpha{}_\beta = \partial_\mu T^\alpha{}_\beta + \Gamma^\alpha_{\mu\sigma} T^\sigma{}_\beta - \Gamma^\rho_{\mu\beta} T^\alpha{}_\rho, $$ where extra indices follow the same pattern as $\alpha$ or $\beta$ depending on if they are up or down.
1 Note that if $\varphi$ is a higher-rank tensor, the simplification $\nabla\varphi \to \partial\varphi$ no longer holds. Also, in the context of wave equations, often the differential operator you want is not the d'Alambertian but rather the de Rham operator, which adds a coupling between the Ricci tensor and your tensor being differentiated (a "curvature coupling"). This reduces to the d'Alambertian on scalars.