We don't get to choose here. The "normalizing" factor is, in essence, a variance-stabilizing factor: it is chosen so that the expression neither collapses to zero nor blows up to infinity as the sample size goes to infinity, but instead maintains a non-degenerate distribution in the limit.
So it has to be whatever it has to be in each case. Of course it is interesting that in many cases it emerges that it has to be $\sqrt n$ (but see also @whuber's comment below).
A standard example where the normalizing factor has to be $n$, rather than $\sqrt n$ is when we have a model
$$y_t = \beta y_{t-1} + u_t, \;\; y_0 = 0,\; t=1,...,T$$
with $u_t$ white noise, and we estimate the unknown $\beta$ by Ordinary Least Squares.
If it so happens that the true value of the coefficient is $|\beta|<1$, then the OLS estimator is consistent and converges at the usual $\sqrt n$ rate.
But if instead the true value is $\beta=1$ (i.e., we have in reality a pure random walk), then the OLS estimator is still consistent but converges "faster", at rate $n$. Such an estimator is sometimes called "superconsistent", presumably because so many estimators converge at the slower rate $\sqrt n$.
In this case, to obtain its (non-normal) asymptotic distribution, we have to scale $(\hat \beta - \beta)$ by $n$: if we scale only by $\sqrt n$, the expression goes to zero in probability. Hamilton, ch. 17, has the details.
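A quick Monte Carlo sketch can make the two rates visible (this simulation is my illustration, not part of Hamilton's treatment): under the unit root $\beta=1$, the spread of $n(\hat\beta-1)$ stabilizes as $n$ grows, while the spread of $\sqrt n(\hat\beta-1)$ shrinks toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit_root_deviation(n):
    """Simulate a pure random walk y_t = y_{t-1} + u_t (y_0 = 0) and
    return the OLS estimate of beta minus the true value 1."""
    y = np.concatenate(([0.0], np.cumsum(rng.standard_normal(n))))
    ylag, ycur = y[:-1], y[1:]
    return ycur @ ylag / (ylag @ ylag) - 1.0

# n*(beta_hat - 1) keeps a stable (non-normal) spread across sample sizes,
# while sqrt(n)*(beta_hat - 1) collapses toward zero.
for n in (100, 1000, 10000):
    devs = np.array([unit_root_deviation(n) for _ in range(1000)])
    print(n, round(np.std(np.sqrt(n) * devs), 3), round(np.std(n * devs), 3))
```

The choice of 1000 replications is arbitrary; any moderately large number shows the same pattern.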
There are several instances of (2), namely cases where the variance of a UMVU estimator exceeds the Cramer-Rao lower bound. Here are some common examples:
- Estimation of $e^{-\theta}$ when $X_1,\ldots,X_n$ are i.i.d $\mathsf{Poisson}(\theta)$:
Consider the case $n=1$ separately. Here we are to estimate the parametric function $e^{-\theta}=\delta$ (say) based on $X\sim\mathsf{Poisson}(\theta) $.
Suppose $T(X)$ is unbiased for $\delta$.
Therefore, $$E_{\theta}[T(X)]=\delta\quad,\forall\,\theta$$
Or, $$\sum_{j=0}^\infty T(j)\frac{\delta(\ln (\frac{1}{\delta}))^j}{j!}=\delta\quad,\forall\,\theta$$
That is, $$T(0)\delta+T(1)\delta\cdot\ln\left(\frac{1}{\delta}\right)+T(2)\delta\cdot\frac{\left(\ln\left(\frac{1}{\delta}\right)\right)^2}{2!}+\cdots=\delta\quad,\forall\,\theta$$
Matching the coefficients of the powers of $\ln\left(\frac{1}{\delta}\right)$ on both sides forces $T(0)=1$ and $T(j)=0$ for every $j\ge 1$. So we have the unique unbiased estimator (hence also the UMVUE) of $\delta(\theta)$:
$$T(X)=\begin{cases}1&,\text{ if }X=0 \\ 0&,\text{ otherwise }\end{cases}$$
Clearly,
\begin{align}
\operatorname{Var}_{\theta}(T(X))&=P_{\theta}(X=0)(1-P_{\theta}(X=0))
\\&=e^{-\theta}(1-e^{-\theta})
\end{align}
The Cramer-Rao bound for $\delta$ is $$\text{CRLB}(\delta)=\frac{\left(\frac{d}{d\theta}\delta(\theta)\right)^2}{I(\theta)}\,,$$
where $I(\theta)=E_{\theta}\left[\left(\frac{\partial}{\partial\theta}\ln f_{\theta}(X)\right)^2\right]=\frac1{\theta}$ is the Fisher information, $f_{\theta}$ being the pmf of $X$.
This eventually reduces to $$\text{CRLB}(\delta)=\theta e^{-2\theta}$$
Now take the ratio of variance of $T$ and the Cramer-Rao bound:
\begin{align}
\frac{\operatorname{Var}_{\theta}(T(X))}{\text{CRLB}(\delta)}&=\frac{e^{-\theta}(1-e^{-\theta})}{\theta e^{-2\theta}}
\\&=\frac{e^{\theta}-1}{\theta}
\\&=\frac{1}{\theta}\left[\left(1+\theta+\frac{\theta^2}{2}+\cdots\right)-1\right]
\\&=1+\frac{\theta}{2}+\cdots
\\&>1
\end{align}
Exactly the same calculation shows that the conclusion holds for a sample of $n>1$ observations. In that case the UMVUE of $\delta$ is $\left(1-\frac1n\right)^{\sum_{i=1}^n X_i}$, with variance $e^{-2\theta}(e^{\theta/n}-1)$ exceeding the corresponding bound $\frac{\theta}{n}e^{-2\theta}$ by the same series argument.
- Estimation of $\theta$ when $X_1,\ldots,X_n$ ( $n>1$) are i.i.d $\mathsf{Exp}$ with mean $1/\theta$:
Here the UMVUE of $\theta$ is $\hat\theta=\frac{n-1}{\sum_{i=1}^n X_i}$, as shown here.
Using the Gamma distribution of $\sum\limits_{i=1}^n X_i$, a straightforward calculation shows $$\operatorname{Var}_{\theta}(\hat\theta)=\frac{\theta^2}{n-2}>\frac{\theta^2}{n}=\text{CRLB}(\theta)\quad,\,n>2$$
Since several distributions can be transformed into this exponential distribution, this example in fact generates many more examples.
- Estimation of $\theta^2$ when $X_1,\ldots,X_n$ are i.i.d $N(\theta,1)$:
The UMVUE of $\theta^2$ is $\overline X^2-\frac1n$, where $\overline X$ is the sample mean. Among other drawbacks, this estimator can be shown not to attain the lower bound. See page 4 of this note for details.
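As a sanity check on the three examples above, here is a small numeric sketch (an illustration on a parameter grid, not a proof) comparing the closed-form UMVUE variances to the corresponding Cramer-Rao bounds; the grid values and sample size $n=5$ are arbitrary choices of mine:

```python
import math

n = 5  # sample size used for the exponential and normal examples

for theta in (0.1, 0.5, 1.0, 2.0, 5.0):
    # Poisson, n = 1: T(X) = 1{X = 0} estimates delta = exp(-theta)
    var_poisson = math.exp(-theta) * (1 - math.exp(-theta))
    crlb_poisson = theta * math.exp(-2 * theta)

    # Exponential with mean 1/theta: theta_hat = (n-1)/sum(X_i), valid for n > 2
    var_exp = theta**2 / (n - 2)
    crlb_exp = theta**2 / n

    # Normal N(theta, 1): Xbar^2 - 1/n estimates theta^2; from the moments
    # of Xbar ~ N(theta, 1/n) its variance is 4*theta^2/n + 2/n^2
    var_norm = 4 * theta**2 / n + 2 / n**2
    crlb_norm = 4 * theta**2 / n

    # Each comparison prints True: the variance strictly exceeds the bound
    print(theta, var_poisson > crlb_poisson, var_exp > crlb_exp, var_norm > crlb_norm)
```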
Best Answer
Convergence of a sequence of random variables in probability does not imply convergence of their variances, nor even that their variances get anywhere near $0.$ In fact, their means may converge to a constant yet their variances can still diverge.
Examples and counterexamples
Construct counterexamples by creating ever more rare events that are increasingly far from the mean: the squared distance from the mean can overwhelm the decreasing probability and cause the variance to do anything (as I will proceed to show).
For instance, scale a Bernoulli$(1/n)$ variate by $n^{p}$ for some power $p$ to be determined. That is, define the sequence of random variables $X_n$ by
$$\begin{aligned} &\Pr(X_n=n^{p})=1/n \\ &\Pr(X_n=0)= 1 - 1/n. \end{aligned}$$
As $n\to \infty$, because $\Pr(X_n=0)\to 1$ this converges in probability to $0;$ its expectation $n^{p-1}$ even converges to $0$ provided $p\lt 1;$ but for $p\gt 1/2$ its variance $n^{2p-1}(1-1/n)$ diverges.
Comments
Many other behaviors are possible:
- Because negative powers $2p-1$ of $n$ converge to $0,$ the variance converges to $0$ for $p\lt 1/2:$ the variables "squeeze down" to $0$ in some sense.
- An interesting edge case is $p=1/2,$ for which the variance converges to $1.$
- By varying $p$ above and below $1/2$ depending on $n$ you can even make the variance not converge at all. For instance, let $p(n)=0$ for even $n$ and $p(n)=1$ for odd $n.$
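These regimes can be tabulated directly from the closed-form moments of $X_n$, with no simulation needed; the particular powers and sample sizes below are arbitrary illustrative choices:

```python
def variance(n, p):
    # Var(X_n) = n^(2p) * (1/n) * (1 - 1/n) = n^(2p-1) * (1 - 1/n)
    return n ** (2 * p - 1) * (1 - 1 / n)

# p < 1/2: variance -> 0;  p = 1/2: variance -> 1;  p > 1/2: variance diverges
for p in (0.25, 0.5, 1.0):
    print(p, [round(variance(n, p), 4) for n in (10, 10**3, 10**5)])
```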
A direct connection with estimation
Finally, a reasonable objection is that abstract sequences of random variables are not really "estimators" of anything. But they can nevertheless be involved in estimation. For instance, let $t_n$ be a sequence of statistics intended to estimate some numerical property $\theta(F)$ of the common distribution $F$ of an (arbitrarily large) iid random sample $(Y_1,Y_2,\ldots,Y_n,\ldots)$ from $F.$ This induces a sequence of random variables
$$T_n = t_n(Y_1,Y_2,\ldots,Y_n).$$
Modify this sequence by choosing any value of $p$ (as above) you like and set
$$T^\prime_n = T_n + (X_n - n^{p-1}).$$
The parenthesized term makes a zero-mean adjustment to $T_n,$ so that if $T_n$ is a reasonable estimator of $\theta(F),$ then so is $T^\prime_n.$ (With some imagination we can conceive of situations where $T_n^\prime$ could yield better estimates than $T_n$ with probability close to $1.$) However, if you make the $X_n$ independent of $Y_1,\ldots, Y_n,$ the variance of $T^\prime_n$ will be the sum of the variances of $T_n$ and $X_n,$ which you thereby can cause to diverge.
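A minimal numeric sketch of this construction, assuming (hypothetically) that $T_n$ is the sample mean of iid $Y_i$ with variance $\sigma^2=1$ and taking $p=1$ for $X_n$:

```python
sigma2 = 1.0  # hypothetical variance of each Y_i

def var_T_prime(n, p=1.0):
    var_T_n = sigma2 / n                      # variance of the sample mean
    var_X_n = n ** (2 * p - 1) * (1 - 1 / n)  # variance of X_n (as derived above)
    # X_n independent of the Y's => variances add; the zero-mean shift
    # by n^(p-1) does not change the variance
    return var_T_n + var_X_n

# Var(T'_n) diverges even though T'_n still converges in probability to E[Y]
print([round(var_T_prime(n), 3) for n in (10, 100, 1000)])
```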