1) OP wrote (v1):
[...] and thus this leads me to believe that ${\bf A}$ should be somehow connected to momentum, [...].
Yes, in fact the magnetic vector potential ${\bf A}$ (times the electric charge) is the difference between the canonical and the kinetic momentum, cf. e.g. this Phys.SE answer.
2) Another argument is that the scalar electric potential $\phi$ times the charge
$$\tag{1} q\phi$$
does not constitute a Lorentz invariant potential energy. If one recalls the Lorentz transformations for the $\phi$ and ${\bf A}$ potentials, and one goes to a boosted coordinate frame, it is not difficult to deduce the correct Lorentz invariant generalization
$$\tag{2} U ~=~ q(\phi - {\bf v}\cdot {\bf A})$$
that replaces $q\phi$. The caveat of eq. (2) is that $U$ is a velocity-dependent potential, so that the force is not merely (minus) a gradient, but rather on the form of (minus) a Euler-Lagrange derivative
$$\tag{3}{\bf F}~=~\frac{d}{dt} \frac{\partial U}{\partial {\bf v}} - \frac{\partial U}{\partial {\bf r}}. $$
One may show that eq. (3) reproduces the Lorentz force
$$\tag{4}{\bf F}~=~q({\bf E}+{\bf v}\times {\bf B}), $$
see e.g. Ref. 1.
References:
- Herbert Goldstein, Classical Mechanics, Chapter 1.
Below I'll use Planck units, for which, in particular, $c = \epsilon_{0} = =1$.
In fact, the full system of Maxwell's equations provides the statement that the only two vector components of the EM field $\mathbf E, \mathbf B$ are independent (in general, due to a deep symmetry reason, namely that a massless particle has only two polarizations). Next, if we write EM field in terms of 4-potential
$$
A_{\mu} \equiv \left( V, \mathbf A\right)_{\mu}
$$
(for example, $A_{0} \equiv V$),
$$
\mathbf E = -\frac{\partial \mathbf A}{\partial t} - \nabla V, \quad \mathbf B = \nabla \times \mathbf A,
$$
we add new unphysical gauge symmetry transformation
$$
\tag 1 V \to V + \frac{\partial \varphi}{\partial t}, \quad \mathbf A \to \mathbf A - \nabla \varphi ,
$$
under which $\mathbf E, \mathbf B$ is unchanged:
$$
\mathbf E \to \mathbf E, \quad \mathbf B \to \mathbf B
$$
I.e., $A_{\mu}$ is not unique. Also, it seems that without imposing some conditions 4-potential has 4 components, while this number must be 2, in correspondence to the number of independent components of $\mathbf E, \mathbf B$. Is this a problem? No.
We need to constuct the precise scheme of reduction of number of components, which includes the problem of uniqueness of 4-potential. The gauge symmetry is unphysical, so first we may fix $\varphi$ in $(1)$ by imposing so-called gauge condition, which leaves three independent components of $A_{\mu}$ instead four. It may be, for example, Coulomb gauge condition,
$$
\nabla \cdot \mathbf A = 0,
$$
or lorentz invariant condition, such as
$$
\partial_{\mu}A^{\mu} = \frac{\partial V}{\partial t} + \nabla \cdot \mathbf A = 0
$$
Next, we reduce the number of components from three to two by using first Maxwell equation, which is called Gauss law, since in fact it is the bound, because it doesn't contain first time derivatives of $\mathbf E$ or, correspondingly, second time derivatives of $A_{\mu}$ (it doesn't describe propagated degrees of freedom). For example, for Coulomb gauge this equation takes the form
$$
\nabla \cdot \mathbf E = -\Delta V - \frac{\partial (\nabla \cdot \mathbf A )}{\partial t} = -\Delta V = 4 \pi \rho ,
$$
from which we have the scalar potential as function of charge density. For $\rho = 0$, which is true in the case of magnetostatics, we may simply set $V$ to zero.
So that, we reduce the number of $A_{\mu}$ from four to two, as it must be, and solve the uniqueness problem.
Let summarize: if we have used the Gauss law as the bound, i.e., if we have reduced the number of components of 4-potential from four to three, the only thing that we need to fix it uniquely is to impose such gauge condition, since without doing that 4-potential is determined only up to unphysical transformation.
Best Answer
So intuitively $\nabla \cdot B = 0$ is best understood by appeal to the continuity equation $\dot \rho = -\nabla \cdot J,$ in other words if you imagine $\vec B$ as a flow field $\vec v$ for some fluid of constant density, this describes a flow of stuff which does not accumulate at any particular points: it is not "flowing into" or "flowing out of" any of those places.
The expression $\nabla\times B = 0$ is a little harder to understand when you imagine $\vec B$ as a flow field $\vec v$, but if you imagine that you insert a little pinwheel into the flow field at a point, the curl says something about how much torque there will be on that pinwheel due to the fluid trying to drag its tines, with the direction of the curl being the direction that makes the pinwheel spin the most. So $\nabla\times B=0$ means literally "this flow field does not make a pinwheel spin." That helps to understand the strange flow field $\vec v = \alpha ~ \hat \theta / r,$ which seems to "curl" around the origin but has $\nabla \times \vec v = 0$ at all points that are not the origin. If you think about a tiny little area $dr~r~d\theta,$ it has a slightly longer "outside edge" than its "inside edge" and so the flow along that outside edge drags the pinwheel slightly more; but if the flow drops like $1/r$ this slight lengthening is exactly compensated by the slight attenuation in the flow field, meaning that the pinwheel is not rotated at all.
Helmholtz's theorem says that any "nice" vector field $X$ can be "integrated" into a sum of two fields, a curl and a gradient: $$X = \nabla \times Y - \nabla \zeta.$$Since $\nabla \cdot X = -\nabla^2 \zeta$ and $\nabla \times X = \nabla \times (\nabla \times Y)$ due to standard vector identities, whenever we know that $\nabla \cdot X = 0$ we can solve this trivially with $\zeta = 0$ and we know that $X = \nabla \times Y$ for this "vector potential" $Y.$ Similarly when we know that $\nabla\times X = 0$ we know that $Y=0$ works and that $X = -\nabla \zeta$ for this "scalar potential" $\zeta.$
How this plays out in classical E&M
The full Maxwell Equations are (SI units) $$\begin{array}{ll}\nabla \cdot E = \rho/\epsilon_0&~~\nabla \times E = -\dot B\\ \nabla \cdot B = 0&~~\nabla \times B = \mu_0 J + \mu_0\epsilon_0 \dot E. \end{array}$$ Clearly we always have $B = \nabla \times A$ for some vector potential $\vec A$ by the third equation. More subtly, the second equation then works out to $\nabla \times (E + \dot A) = 0$ and therefore $E = -\dot A - \nabla \phi$ for some scalar potential $\phi.$ The choices that you make for these are not unique as you can add any $\nabla \psi$ to $A$ and still preserve $B,$ because the curl of a gradient is zero. If you work out the effect of this on $E$ you'll see that you have to also subtract $\dot \psi$ from $\phi$ to preserve $E$ as well. This is called the "gauge freedom" but it really reflects this freedom to choose your "constants of integration" in the Helmholtz decomposition.
The remaining equations work out to $$\begin{array}{rl}-\nabla\cdot \dot A -\nabla^2 \phi ~=& \rho/\epsilon_0\\ \nabla(\nabla \cdot A) - \nabla^2 A ~=& \mu_0 J - \mu_0\epsilon_0\ddot A - \mu_0\epsilon_0\nabla \dot \phi,\end{array} $$ once you use this identity that $\nabla \times (\nabla \times X) = \nabla(\nabla\cdot X) - \nabla^2 X.$ These can be put into a very elegant form by recognizing the wave operator $\square X = \mu_0\epsilon_0 \ddot X - \nabla^2 X$ and then incorporating the other spare terms in the second equation into a scalar field $\lambda = \mu_0\epsilon_0\dot \phi + \nabla\cdot A.$
Note that your gauge freedom maps $\lambda \mapsto \lambda - \square \psi$ so you can basically choose any functional form you want for $\lambda$ by solving a wave equation, in theory. Note also that $\lambda$ enters into the first equation via replacing $\nabla\cdot \dot A$ with $\dot \lambda - \mu_0 \epsilon_0 \ddot \phi,$ so you get $$\begin{array}{rl}\square \phi ~=& \rho/\epsilon_0 + \dot \lambda\\ \square A ~=& \mu_0 J - \nabla\lambda.\end{array}$$The "Lorentz gauge" sets $\lambda = 0$ because $(c\rho, J)$ is a 4-vector in relativity where $c = 1/\sqrt{\mu_0\epsilon_0}$ and hence this gives you an equation $\Box(\phi/c, \vec A) = \mu_0 (c\rho, \vec J),$ and since $\Box$ is relativistically covariant you get that $(\phi/c, \vec A)$ is also a 4-vector in relativity. If that doesn't make much sense to you just accept "hey, it makes these equations look exactly alike" and that's good enough.
The "Coulomb gauge" sets $\lambda = \mu_0\epsilon_0 \dot\phi$ so that the first equation reduces to just $-\nabla^2\phi = \rho/\epsilon_0,$ and we can basically assume that the scalar potential $\phi$ updates instantaneously to all of the charges, which makes it super-simple to calculate. If you look up at the definition of $\lambda$ you find out that it also sets $\nabla\cdot A = 0$ so $A$ can be integrated in terms of a vector potential potential, $A = \nabla \times W.$