I think the point that was confusing me/missing link was that spherical harmonics functions are the solution of the Laplace's differential equation:
$$\frac{\partial^2u}{\partial x^2}+\frac{\partial^2u}{\partial y^2}+\frac{\partial^2u}{\partial z^2}=0$$
Orthogonal means the functions "pull in different directions". Like in linear algebra, orthogonal vectors "pull" in completely "distinct" directions in n-space, it turns out that orthogonal functions "help you reach completely distinct values", where the resultant value (sum of functions) is again a function.
SH are based on the associated Legendre polynomials, (which are a tad more funky than Legendre polynomials, namely each band has more distinct functions defined for it for the associated ones.)
The Legendre polynomials themselves, like SH, are orthogonal functions. So if you take any 2 functions from the Legendre polynomial set, they're going to be orthogonal to each other (integral on $[-1,1]$ is $0$), and if you add scaled copies of one to the other, you're going to be able to reach an entirely distinct set of functions/values than you could with just one of those basis functions alone.
Now the sphere comes from the idea that, SH functions, use the Legendre polynomials (but Legendre polynomials are 1D functions), and the specification of spherical harmonics is a function value for every $\phi \theta$. There is no "sphere" per se.. it's like if you say "there is a value for every point on the unit circle", it means you trace a circle around the origin and give each point a value.
What is meant is every point on a unit sphere has a numeric value. If we associate a color to every point on the sphere, you get a visualization like this:
This page shows a visualization where the values of the SH function are used to MORPH THE SPHERE (which is part of what was confusing me earlier). But just because a function has values for every point on the sphere doesn't mean there is a sphere.
I think the most accessible argument for completeness of spherical harmonics is to start with the Weierstrass approximation theorem, which asserts that on compact subsets $E$ of $R^n$ polynomials are dense in _sup_norm_, meaning the metric on functions $d(f,g)=\sup_{x\in E} |f(x)-g(x)|$. This applies to the sphere $E=S^{n-1}$. At some point in the development (as in the corresponding chapter in Stein-Weiss, for example), it is shown that every homogeneous polynomial $f$ can be written as $f=f_d + r^2f_{d-2} + r^4f_{d-4}+\ldots$ where $f_i$ is harmonic, and $r$ is radius. Thus, restricted to the sphere, every polynomial is pointwise-equal to a harmonic polynomial. Combining these two points, harmonic polynomials are dense in continuous functions on the sphere, with respect to sup-norm.
Depending what definition of "integral" one uses, one shows that continuous functions on the sphere are dense in $L^2$, so harmonic polynomials are dense in $L^2$. Then it is a small exercise to infer completeness.
A somewhat more sophisticated viewpoint has us note that the sphere is a homogeneous space $S^{n-1}=SO(n)/O(n-1)$ of orthogonal groups, so (thinking in terms of Frobenius reciprocity) the representations of $SO(n)$ appear which restrict to the trivial repn on $O(n-1)$, with multiplicity equal to the dimension of $O(n-1)$-fixed vectors, which is found to be $1$ (or $0$). The completeness in this version of the story is part of the general decomposition of $L^2(SO(n))$ (under the compact (Hilbert-Schmidt) operators coming from compactly-supported continuous functions on the group), and, instead, the issue becomes identifying irreducibles as corresponding to spherical harmonics. This is probably best done by using the fact that every irreducible has a highest weight, and, conversely, isomorphism classes are uniquely determined by highest weights.
This fancier viewpoint applies more broadly.
Best Answer
You can literally define the spherical harmonics on a sphere:
In your article, the following formula is given: $$Y_l^m(\theta,\phi) = \sqrt{\frac{(2l+1)}{4\pi}\frac{(l-m)!}{(l+m)!}} P_l^m(\cos \theta) e^{im\phi}$$ Clearly, $Y(\theta,\phi)$ is well defined for all $(\theta,\phi)\in\mathbb R\times\mathbb R$. On the other hand\begin{align}F\colon[0,\pi]\times\mathbb R&\to S\\(\theta,\phi)&\mapsto\begin{pmatrix}\sin\theta\cos\phi\\\sin\theta\sin\phi\\\cos\theta\end{pmatrix}\end{align} is a surjective function to the sphere. Thus, if a spherical harmonic $Y$ is constant on the level sets of $F$, we can define $Y$ on the sphere by requiring that $$(Y\circ F)(\theta,\phi)=Y(\theta,\phi)$$for all $(\theta,\phi)\in[0,\pi]\times\mathbb R$ (note the slight abuse of notation).
Well, it turns out that spherical harmonics are indeed constant on level sets of $F$: For the north pole and the south pole, you can find the explanation here and for the other points on the sphere - i.e. the points with $0<\theta<\pi$ - this is easy to prove.