It is not the correlation length of the system that you should look at, but the correlation of the fluctuations. If T>>Tc the spins are randomly oriented and the lenghtscale of fluctuations is very small. As you get closer to Tc, the fluctuations become more correlated, and lenghtscale increases toward infinity. Similarly for the ferromagnet at temperatures much less than Tc, all spins are aligned. The fluctuations at 0 < T << Tc have short correlation lengths. As you heat the system, it is still mostly ordered, but the number of spins pointing in the opposite direction increases, and so does the correlation length of these fluctuations
First note that, as you say, the 2-point function $\langle \sigma_i \sigma_j\rangle$ does not tend to zero as $\|j-i\|\to\infty$ when $T<T_{\rm c}$; namely,
$$
\lim_{\|j-i\|\to\infty} \langle \sigma_i \sigma_j\rangle^+ = (m^*(T))^2,
$$
where $m^*(T)=\langle\sigma_0\rangle^+$ is the spontaneous magnetization (the $+$ superscript indicates that the expectation is taken in the $+$ state).
So, when one says that correlations decay exponentially when $T<T_{\rm c}$, one is actually speaking of the truncated correlation functions. The truncated 2-point function is defined as
$$
\langle \sigma_i; \sigma_j\rangle^+ =
\bigl\langle (\sigma_i - \langle \sigma_i\rangle^+) (\sigma_j - \langle \sigma_j\rangle^+) \bigr\rangle^+ =
\langle \sigma_i \sigma_j\rangle^+ - \langle \sigma_i\rangle^+ \langle \sigma_j\rangle^+.
$$
It measures how correlated the fluctuations at $i$ and $j$ are. (In probabilistic terms, this is simply the covariance between the random variables $\sigma_i$ and $\sigma_j$.)
The truncated 2-point function converges to $0$ at all temperatures $T$. This is a general fact, true in the pure phases of any models, at any temperature.
Now, turning to your question, the truncated correlations indeed decay exponentially fast in the 2d Ising model for all $T\neq T_{\rm c}$.
When $T<T_{\rm c}$, you should think about it as follows: at low temperatures, spins typically take the same values (say $+1$ in the $+$ phase), with only rare fluctuations. One useful way to look at these fluctuations is by taking a geometrical viewpoint on the configurations: draw a unit length segment separating each pair of nearest-neighbor vertices at which the spins take opposite values. The union of these segments form the Peierls contours of the configuration.
It is easy to check that the energetic cost associated to each such contour is proportional to its length. Note also, that the contours provide a complete description of the configuration (if you know that you are in the $+$ phase).
Now, what has this to do with the exponential decay of correlations? As mentioned above, the truncated 2-point function $\langle \sigma_i; \sigma_j\rangle^+$ measures how correlated the fluctuations at $i$ and $j$ are. What is the event that will lead to a simultaneous flip of both the spin at $i$ and $j$? It is not difficult to convince oneself that this should occur when a large contour surrounds simultaneously the vertices $i$ and $j$. Namely, using correlation inequalities, one can check that
$$
0\leq \langle \sigma_i; \sigma_j\rangle^+ \leq {\rm Prob} (\text{there exists a contour surrounding both $i$ and $j$}),
$$
where the $+$ superscript indicates that the computation is done in the $+$ phase. But the event in the right-hand side has a probability that is exponentially small in the distance $\|j-i\|$.
All this can be made rigorous. At sufficiently low temperatures, you can use cluster expansion techniques (this works in any dimension $d\geq 2$). You can find a detailed argument, for example, in Theorem 5.27 in this book. In dimension $2$, this can also be proved non-perturbatively, either through explicit computations, or by relating the cost of a large contour to the surface tension in the model; see, for example this paper.
As a final remark : the above argument suggests, and you can make this rigorous (see for example the cluster expansion proof I referred to above) that the correlation length tends to zero when $T\downarrow 0$. This is because it becomes extremely unlikely to have a contour surrounding two distant vertices and fluctuations at very low temperatures become essentially purely local events, thus occurring (approximately) independently at different vertices.
Best Answer
The exponent is not in general an integer, and therefore the divergence is not really polynomial. That being said, let's agree to call a structure of the form $x^a$ a polynomial, for any (real) $a$.
The general case.
In general terms, you cannot really prove that the divergence is always polynomial. If $f(x)$ has a singularity at $x_0$, you may define $$ k\equiv-\lim_{x\to x_0}\frac{\log|f(x)|}{\log |x-x_0|} $$
If $k$ is finite, then the singularity is of the form $$ f(x)\sim (x-x_0)^{-k} $$ which proves, a posteriori, that the divergence is indeed polynomial.
It may very well happen that $k$ does not exist. Some examples are $$ \begin{aligned} f(x)&=\mathrm e^{-1/(x-x_0)^a}\\ f(x)&=a\log|x-x_0| \end{aligned} $$ where $k=-\infty$ and $k=0$ respectively. In neither of these cases is the divergence polynomial.
Of course, these examples of $f$ are rather unphysical. In general, you have several physical principles that allow you to restrict the possible functions $f$, and you may sometimes even succeed in proving that $k$ is actually finite (cf. a CFT below). In general terms, it is better to say that the critical exponent $k$ has been observed to be finite for a wide class of theories, and this is confirmed by explicit calculation (numerical or otherwise) in many well-studied examples.
The case of a CFT.
If $\phi_1,\phi_2$ is a pair of primary fields of weight $\Delta_1,\Delta_2$, then conformal invariance implies that the corresponding correlation function is given by $$ \langle\phi_1(x)\phi_2(0)\rangle=\frac{a}{|x_1-x_2|^{\Delta_1+\Delta_2}} $$ in which case the divergence is indeed polynomial. For a proof of this formula, see e.g. 1511.04074, §2.6.
A general QFT, at a critical point, is described by a CFT, and therefore this result is valid for essentially every healthy QFT.