Indeed, that normalizing constant is often presented as though it were a magical insight. In fact, the $2\pi$ can be inserted in various ways into the "Fourier transform" or the "cosine transform" on even functions. Once we discover that a normalizing constant must occur, and what it is, the only question is where to put it.
To discover the necessary constant when we have fixed other choices (for example, $\cos(xy)$ rather than $\cos(2\pi xy)$), apply a non-normalized transform and inverse transform to something whose transform is easy to understand, such as a Gaussian $f(x)=e^{-x^2}$, and discover the constant needed to make things match up.
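To make that concrete, here is a short check with the un-normalized kernel $\cos(xy)$, using only the standard Gaussian integral $\int_{-\infty}^{\infty} e^{-ay^2}\cos(yz)\,dy=\sqrt{\pi/a}\;e^{-z^2/(4a)}$: applying the transform twice to $f(x)=e^{-x^2}$ gives
$$
\int_{-\infty}^{\infty} e^{-x^2}\cos(xy)\,dx \;=\; \sqrt{\pi}\,e^{-y^2/4},
\qquad
\int_{-\infty}^{\infty} \sqrt{\pi}\,e^{-y^2/4}\cos(yz)\,dy \;=\; 2\pi\,e^{-z^2},
$$
so two applications return $2\pi$ times the original Gaussian. The missing constant is an overall $1/(2\pi)$, which one may split as $1/\sqrt{2\pi}$ on the transform and on its inverse, put entirely on one of them, or absorb into the kernel as $\cos(2\pi xy)$.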
A slightly different issue arises, namely, that if we (reasonably enough, as a heuristic) interchange the order of integration, we seem to find that
$\int_{-\infty}^\infty \cos(mx)\,\cos(nx)\;dx$ is some multiple of Dirac's $\delta$. This is entirely correct, if interpreted properly: not as a numerical integral, but as an integral/cosine transform in an extended sense (e.g., on tempered distributions, extended by continuity).
Again, the $2\pi$ is an artifact that must appear somewhere in these formulas, as we discover. No, it is not at all obvious that your (extended-sense) integral for $\delta$ needs the $2\pi$ to be correct. That is really only discovered by looking at Fourier inversion for nicer functions (e.g., Schwartz functions).
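If one does carry out the heuristic computation in that extended sense, the multiple comes out explicitly: using $\int_{-\infty}^{\infty} e^{ikx}\,dx = 2\pi\,\delta(k)$ (as tempered distributions) together with the product-to-sum identity,
$$
\int_{-\infty}^{\infty} \cos(mx)\cos(nx)\,dx
\;=\; \tfrac12\int_{-\infty}^{\infty}\bigl(\cos((m-n)x)+\cos((m+n)x)\bigr)\,dx
\;=\; \pi\,\delta(m-n) + \pi\,\delta(m+n),
$$
which is exactly where the $2\pi$ (here split into two $\pi$'s) shows up.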
The Fourier basis functions $e^{i \omega x}$ are eigenfunctions of the shift operator $S_h$ that maps a function $f(x)$ to the function $f(x - h)$:
$$
e^{i \omega (x-h)} = e^{-i\omega h} e^{i \omega x}
$$
for all $x \in \mathbb R$.
All of the incarnations of the Fourier transform (such as Fourier series and the discrete Fourier transform) can be understood as changing basis to a basis of eigenvectors for a shift operator.
It is possible to consider other operators, which have different eigenfunctions leading to different transforms. But this shift operator is so simple and fundamental that it's not surprising the Fourier transform turns out to be particularly useful.
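The same picture can be checked in the discrete setting. The following is a small NumPy sketch (the length $N=8$ and frequency index $k=3$ are arbitrary choices): the DFT basis vector of frequency $k$ is an eigenvector of the cyclic shift by one sample, with eigenvalue $e^{-2\pi i k/N}$, matching $e^{-i\omega h}$ above for $\omega = 2\pi k/N$ and $h=1$.

```python
import numpy as np

N, k = 8, 3  # arbitrary signal length and frequency index

# DFT basis vector: e_k[n] = exp(2*pi*i*k*n/N)
n = np.arange(N)
e_k = np.exp(2j * np.pi * k * n / N)

# Cyclic shift by one sample: (S f)[n] = f[(n - 1) mod N],
# the discrete analogue of f(x) -> f(x - h) with h = 1.
shifted = np.roll(e_k, 1)

# Shifting only multiplies e_k by a phase -- it is an eigenvector.
eigenvalue = np.exp(-2j * np.pi * k / N)
print(np.allclose(shifted, eigenvalue * e_k))  # True
```

Changing to this basis (i.e., taking the DFT) therefore turns the shift into multiplication by a diagonal matrix of phases.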
As was said in comments, cosines and sines differ only by a phase shift. The difference in their performance arises from their boundary behavior. On an interval $[0,\ell]$, the sine system $\left\{\sin \frac{\pi n x}{\ell}\right\}$ satisfies the Dirichlet boundary condition, vanishing at $0$ and $\ell$.
A generic function, e.g., one describing the brightness of an image, need not take zero values on the boundary (or satisfy any other boundary condition we may be thinking of). Thus, it cannot be uniformly approximated by a linear combination of sines.
On the other hand, the cosine system $\left\{\cos \frac{\pi n x}{\ell}\right\}$ satisfies the Neumann boundary condition, having zero derivative at $0$ and $\ell$. The boundary values are not pinned down as they are for sines. This makes it possible to uniformly approximate (reasonable*) continuous functions by a cosine Fourier series; the fact that we can't uniformly approximate the derivative is not nearly as damaging.
For illustration, here is $e^x$ approximated by sines and by cosines (after shifting the interval from $[-1,1]$ to $[0,2]$), using the same number of Fourier-series terms in each case (taken from my blog):
Sines: [plot of the sine-series partial sum against $e^x$]

Cosines: [plot of the cosine-series partial sum against $e^x$]
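For a quick numerical version of the same comparison (a sketch only; the choices $\ell = 2$, ten terms, and a trapezoidal rule for the coefficients are mine, not taken from the plots above):

```python
import numpy as np

# Sine- vs cosine-series partial sums of f(x) = exp(x) on [0, L].
L, n_terms = 2.0, 10
x = np.linspace(0.0, L, 4001)
dx = x[1] - x[0]
f = np.exp(x)

def integrate(y):
    """Trapezoidal rule on the fixed grid x."""
    return np.sum(y[1:] + y[:-1]) * dx / 2

sine_sum = np.zeros_like(x)
cosine_sum = np.full_like(x, integrate(f) / L)  # constant (a_0/2) term of the cosine series
for n in range(1, n_terms + 1):
    s = np.sin(np.pi * n * x / L)
    c = np.cos(np.pi * n * x / L)
    sine_sum += (2.0 / L) * integrate(f * s) * s
    cosine_sum += (2.0 / L) * integrate(f * c) * c

# The sine sum is pinned to 0 at the endpoints, so its uniform error cannot
# drop below f(0) = 1; the cosine sum follows f all the way to the boundary.
print("max |f - sine sum|   =", np.abs(f - sine_sum).max())
print("max |f - cosine sum| =", np.abs(f - cosine_sum).max())
```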
* It's hard to construct a continuous function for which the cosine Fourier series fails to converge to it uniformly on the interval of approximation. It's safe to say that a "naturally occurring" continuous function won't be such a counterexample.