First off, there is no "formal difference" between a theorem and a lemma. Formally, if you view mathematics from the perspective of set theory (ZFC), you must conclude that anything commonly called a "lemma" in the literature is by definition "a theorem of ZFC," i.e. a formula of ZFC that has a proof: a finite sequence of formulas, each of which is an axiom or follows logically from earlier ones, ending on the formula representing the statement of the theorem.
So calling something a lemma is a matter of literary freedom: it is understood that lemmas really are theorems, but somehow "little ones". But why bother?
A lemma typically comes in one of two forms: (i) a useful trick or (ii) a technical step in a proof. Let me demonstrate with some examples.
A useful trick in real analysis is called "Fatou's Lemma," which helps us interchange limit operations and integrals. Very roughly, it states that
"if the $f_n$ are nonnegative measurable functions and $\displaystyle\lim_{n \rightarrow \infty} f_n(x) = f(x)$ for all $x$, then
$$\int \lim_{n \rightarrow \infty} f_n(x)\, dx = \int f(x)\, dx \leq \liminf_{n \rightarrow \infty} \int f_n(x)\, dx ,"$$
which, it turns out, becomes "half of the work" in proving a lot of very useful and frequently used results like the Monotone Convergence Theorem and Lebesgue's Dominated Convergence Theorem. On its own, Fatou's Lemma is not so remarkable, and it quickly becomes a minor routine step in very major and fundamental theorems in real analysis. This is why it is itself a lemma, not a theorem.
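A standard example (not from the discussion above, but easy to check by hand) shows why Fatou's Lemma gives only an inequality. Take, on the real line with Lebesgue measure,
$$ f_n(x) = n\,\mathbf{1}_{(0,1/n)}(x), \qquad \lim_{n \rightarrow \infty} f_n(x) = 0 \ \text{ for every } x, \qquad \int f_n(x)\, dx = 1 \ \text{ for every } n. $$
Then
$$ \int \lim_{n \rightarrow \infty} f_n(x)\, dx = 0 < 1 = \lim_{n \rightarrow \infty} \int f_n(x)\, dx , $$
so the limit and the integral cannot be interchanged freely, and the inequality in the lemma can be strict.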
Another good example of a theorem of the (i) type is "Zorn's lemma". Zorn's lemma is a technical statement about partially ordered sets but it is invoked frequently in proofs studying ideals in ring theory (I'm sure it has many more uses but I'm unfamiliar with them).
The strange thing about Zorn's lemma is that it is logically equivalent to the Axiom of Choice, i.e. from Zorn's lemma you can prove the Axiom of Choice and from the Axiom of Choice you can prove Zorn's lemma. In other words, if you studied the axioms of set theory but instead of assuming the axiom of choice you assumed Zorn's Lemma as an axiom (let's call this Zorn's Axiom for now), then you could eventually deduce the Axiom of Choice (perhaps Lemma of Choice?) as a consequence of Zorn's Axiom. So Zorn's lemma is a lemma ONLY BECAUSE we assume the Axiom of Choice rather than Zorn's lemma as an axiom of standard set theory: it is a lemma only because of how we choose to organize mathematics.
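For reference, here is the usual statement of Zorn's lemma, written out with quantifiers (the symbols $P$, $C$, $u$, $m$ are just my notation). Let $(P, \leq)$ be a nonempty partially ordered set. Then
$$ \bigl(\text{for every chain } C \subseteq P \ \ \exists\, u \in P \ \ \forall c \in C : \ c \leq u \bigr) \ \Longrightarrow \ \exists\, m \in P \ \ \forall p \in P : \ (m \leq p \Rightarrow p = m), $$
where a chain is a totally ordered subset of $P$. A typical ring-theoretic use: let $P$ be the set of proper ideals of a commutative ring $R$ with $1 \neq 0$, ordered by inclusion. The union of a nonempty chain of proper ideals is again an ideal and does not contain $1$, so it is an upper bound in $P$. Zorn's lemma then gives a maximal element of $P$, i.e. a maximal ideal of $R$.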
A type (ii) lemma is something highly technical: if you proved it in the middle of the proof of the theorem you are really after, the detour would take so long that you might have difficulty getting back on track. This happens ALL THE TIME in mathematics. Here is an example I came across recently from the proof of Dirichlet's theorem on arithmetic progressions in Tom Apostol's "Introduction to Analytic Number Theory":
Theorem (Dirichlet's Theorem): If $h$ and $k$ are relatively prime integers with $k > 0$, then there are infinitely many primes in the arithmetic progression $\{kn+h \colon n = 1,2,3,\ldots\}$.
To prove this theorem, he proves a number of lemmas, such as
Lemma 7.4: If $x > 1$ we have
$$\sum_{\substack{p \leq x \\ p \equiv h \pmod{k}}} \frac{\log p}{p} = \frac{1}{\phi(k)} \log x + \frac{1}{\phi(k)} \sum_{r=2}^{\phi(k)} \overline{\chi_r(h)} \sum_{p \leq x} \frac{\chi_r(p)\log p}{p} + \mathscr{O}(1),$$
and
Lemma 7.5: For $x > 1$ and $\chi \neq \chi_1$, we have
$$\sum_{p \leq x} \frac{\chi(p)\log p}{p} = -L_{\chi}'(1) \sum_{n \leq x} \frac{\mu(n)\chi(n)}{n} + \mathscr{O}(1),$$
and so forth. He has, in total, about 5 or 6 such lemmas which are steps in the proof of the theorem stated above. The reason these things, while complicated and substantial (far more than Fatou's lemma!), are called lemmas, is that if you began proving Dirichlet's Theorem and proved these in the middle of that proof, you would easily get lost.
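To get a feel for what these lemmas are estimating, here is a minimal numerical sketch in Python. It assumes the end result that, as far as I recall, this chain of lemmas builds toward: the left-hand side of Lemma 7.4 equals $\frac{1}{\phi(k)} \log x + \mathscr{O}(1)$. The function names (`primes_up_to`, `progression_sum`, `euler_phi`) and the choice $k = 4$, $x = 10^5$ are mine and purely illustrative; none of this is taken from Apostol's text.

```python
import math


def primes_up_to(limit):
    """Return all primes <= limit using the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            # Cross out the multiples of i, starting at i*i.
            count = len(range(i * i, limit + 1, i))
            sieve[i * i :: i] = bytearray(count)
    return [i for i, flag in enumerate(sieve) if flag]


def progression_sum(x, h, k):
    """Sum of log(p)/p over primes p <= x with p congruent to h mod k."""
    return sum(math.log(p) / p for p in primes_up_to(x) if p % k == h % k)


def euler_phi(k):
    """Euler's totient, computed naively (fine for small k)."""
    return sum(1 for a in range(1, k + 1) if math.gcd(a, k) == 1)


if __name__ == "__main__":
    x, k = 10**5, 4
    main_term = math.log(x) / euler_phi(k)
    for h in (1, 3):
        s = progression_sum(x, h, k)
        # The difference is the O(1) part; it should not grow like log x.
        print(f"h={h}: sum={s:.4f}, log(x)/phi(k)={main_term:.4f}, "
              f"difference={s - main_term:.4f}")
```

Rerunning with larger `x` should show the two sums growing like $\frac{1}{2}\log x$ while the differences stay bounded. That is only a sanity check of the main term, of course, not a substitute for the lemmas.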
So really, what a lemma is to you is whatever you want it to be. The word is either part of the proper name of a result, like Zorn's lemma, or simply a device to make an exposition more readable.
Ad hoc never means anything much like a priori. An ad hoc method is one devised for the specific case at hand; as such it contrasts with a general method. It may be an impromptu method, one devised on the fly, though that’s less likely for something being reported in writing.
A priori estimate has a specific technical meaning. In general, however, the term simply means something like based on theory rather than evidence or assumed without evidence, though the exact sense depends on context.
I’ve no problem with the use of common Latin expressions, and I don’t even think of a priori and ad hoc as particularly foreign: they’ve been rather thoroughly naturalized into the language. Phrases like ceteris paribus (‘all else being equal’) and mutatis mutandis (‘making the necessary changes’) are more problematic: they’re useful, but a significant fraction of most audiences probably won’t understand them.
Consider an integral of the sort $$ \int_{\mathbb R^n}K(x,y)f(y)\,dy $$ for some smooth function $f$ and a kernel $K$. To allow the integral to converge at infinity we'll assume that $f$ has compact support. Integrals of that sort appear in the integral equations to which, in particular, boundary value problems for PDEs are reduced (not always in $\mathbb R^n$, but on surfaces too).
If $|K(x,y)|\sim |x-y|^{-\alpha}$ when $y\to x$, $\alpha>0$, the kernel is called singular, because it tends to infinity as $y\to x$. If $\alpha<n$ the kernel (and the integral) is called weakly singular, since the integrand is absolutely integrable. For $\alpha\ge n$ the estimate on $|K|$ is not enough to guarantee convergence (for $\alpha>n$ the kernel is called hypersingular). Such integrals are often taken in the sense of the principal value. This can be done under some additional assumptions on $K$ such as $$ \int_{|x-y|=\varepsilon}K(x,y)\,ds_y=0, \quad \varepsilon>0. $$
For example, the value of the simple-layer potential for the Laplace equation ($n=3$) $$ \frac{1}{4\pi}\int_S\frac{f(y)}{|x-y|}ds_y, \quad x\in S, $$ is a weakly singular integral on a smooth bounded surface $S$, and the double-layer potential $$ \frac{1}{4\pi}\int_Sf(y)\frac{\partial }{\partial n_y}\frac1{|x-y|}ds_y, \quad x\in S, \qquad (*) $$ is singular, but not weakly singular, because $\left|\frac{\partial }{\partial n_y}\frac1{|x-y|}\right|\sim|x-y|^{-2}$, $S$ is a 2-dimensional surface and $$ \frac{1}{4\pi}\int_S\left|\frac{\partial }{\partial n_y}\frac1{|x-y|}\right|ds_y=+\infty $$ for some points $x\in S$. Nevertheless, for smooth $f$ the integral $(*)$ converges in the sense of the principal value.
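The role of the threshold $\alpha = n$ can be checked directly on the model kernel $|y|^{-\alpha}$ (taking $x = 0$ and integrating over the unit ball in polar coordinates; $\omega_{n-1}$, the surface area of the unit sphere in $\mathbb R^n$, is my notation):
$$ \int_{|y|<1}\frac{dy}{|y|^{\alpha}} = \omega_{n-1}\int_0^1 r^{\,n-1-\alpha}\,dr = \begin{cases} \dfrac{\omega_{n-1}}{n-\alpha}, & \alpha<n,\\[2mm] +\infty, & \alpha\ge n. \end{cases} $$
The simplest illustration of the principal value is the one-dimensional kernel $K(x,y) = 1/(y-x)$ with $\alpha = n = 1$. It is not absolutely integrable near $y = x$, but it is odd about that point, which is the one-dimensional analogue of the cancellation condition above, so for $x = 0$
$$ \mathrm{p.v.}\int_{-1}^{1}\frac{dy}{y} = \lim_{\varepsilon\to 0^+}\left(\int_{-1}^{-\varepsilon}\frac{dy}{y}+\int_{\varepsilon}^{1}\frac{dy}{y}\right) = \lim_{\varepsilon\to 0^+}\bigl(\log\varepsilon-\log\varepsilon\bigr) = 0. $$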