Convergence of a sequence of random variables in probability does not imply convergence of their variances, nor even that their variances get anywhere near $0.$ In fact, their means may converge to a constant yet their variances can still diverge.
Examples and counterexamples
Construct counterexamples by creating ever rarer events that lie increasingly far from the mean: the squared distance from the mean can overwhelm the shrinking probability and make the variance do anything (as I will proceed to show).
For instance, scale a Bernoulli$(1/n)$ variate by $n^{p}$ for some power $p$ to be determined. That is, define the sequence of random variables $X_n$ by
$$\begin{aligned}
&\Pr(X_n=n^{p})=1/n \\
&\Pr(X_n=0)= 1 - 1/n.
\end{aligned}$$
As $n\to \infty$, $\Pr(X_n=0)\to 1$, so $X_n$ converges in probability to $0;$ its expectation $E[X_n]=n^{p}\cdot\tfrac{1}{n}=n^{p-1}$ even converges to $0$ provided $p\lt 1;$ but since $E[X_n^2]=n^{2p}\cdot\tfrac{1}{n}=n^{2p-1},$ its variance $n^{2p-1}-n^{2p-2}=n^{2p-1}(1-1/n)$ diverges for $p\gt 1/2.$
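Here is a minimal numerical check of this behavior (my own sketch, not part of the original argument). It uses NumPy, fixes $p=3/4$ as an illustrative exponent in $(1/2,1)$, and compares simulated frequencies, means, and variances of $X_n$ with the exact values above.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.75  # illustrative exponent in (1/2, 1): X_n -> 0 in probability, E[X_n] -> 0, Var(X_n) -> infinity

for n in [100, 1_000, 10_000]:
    reps = 200 * n                             # enough replications to observe the rare event
    hits = rng.random(reps) < 1.0 / n          # the event {X_n = n^p}, which has probability 1/n
    x = np.where(hits, float(n) ** p, 0.0)
    print(f"n={n:>6}  P(X_n=0)~{1 - hits.mean():.4f}  "
          f"mean~{x.mean():.4f} (exact {n ** (p - 1):.4f})  "
          f"var~{x.var():.1f} (exact {n ** (2 * p - 1) * (1 - 1 / n):.1f})")
```

The empirical frequency of $X_n=0$ approaches $1$ and the mean shrinks, while the variance grows like $\sqrt n$.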
Comments
Many other behaviors are possible:
- Because negative powers $2p-1$ of $n$ converge to $0,$ the variance converges to $0$ for $p\lt 1/2:$ the variables "squeeze down" to $0$ in some sense.
- An interesting edge case is $p=1/2,$ for which the variance converges to $1.$
- By varying $p$ above and below $1/2$ depending on $n$ you can even make the variance not converge at all. For instance, let $p(n)=0$ for even $n$ and $p(n)=1$ for odd $n$ (a quick numerical check follows this list).
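As a quick sketch (my addition, just evaluating the exact variance formula $n^{2p-1}(1-1/n)$ from above), the oscillation under the alternating choice of $p(n)$ is easy to see:

```python
# Exact variance n^(2p-1) * (1 - 1/n) under the alternating choice p(n) = 0 (n even), p(n) = 1 (n odd)
for n in range(10, 16):
    p = 0 if n % 2 == 0 else 1
    print(f"n={n}  p={p}  Var(X_n)={n ** (2 * p - 1) * (1 - 1 / n):.3f}")
```

The variance hops between values near $0$ (even $n$) and values near $n-1$ (odd $n$), so it has no limit.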
A direct connection with estimation
Finally, a reasonable objection is that abstract sequences of random variables are not really "estimators" of anything. They can nevertheless be involved in estimation. For instance, let $t_n$ be a sequence of statistics intended to estimate some numerical property $\theta(F)$ of the common distribution $F$ of an (arbitrarily large) iid random sample $(Y_1,Y_2,\ldots,Y_n,\ldots).$ This induces a sequence of random variables
$$T_n = t_n(Y_1,Y_2,\ldots,Y_n).$$
Modify this sequence by choosing any value of $p$ (as above) you like and set
$$T^\prime_n = T_n + (X_n - n^{p-1}).$$
The parenthesized term makes a zero-mean adjustment to $T_n$ (since $E[X_n]=n^{p-1}$), so that if $T_n$ is a reasonable estimator of $\theta(F),$ then so is $T^\prime_n.$ (With some imagination we can conceive of situations where $T_n^\prime$ could yield better estimates than $T_n$ with probability close to $1.$) However, if you make the $X_n$ independent of $Y_1,\ldots, Y_n,$ the variance of $T^\prime_n$ is the sum of the variances of $T_n$ and $X_n,$ which you can thereby cause to diverge.
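To make this concrete, here is a small simulation sketch (my own illustration, not from the original answer): it takes $T_n$ to be the sample mean of $n$ draws from $N(\theta,1)$ with $\theta=5$ and uses $p=3/4$, so the adjustment vanishes in probability while its variance grows like $\sqrt n$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, p = 5.0, 0.75   # true parameter and an exponent in (1/2, 1); both are illustrative choices

for n in [100, 1_000, 10_000]:
    reps = 200 * n
    # For N(theta, 1) data the sample mean T_n is exactly N(theta, 1/n); draw it directly to keep things light.
    T = rng.normal(theta, 1.0 / np.sqrt(n), size=reps)
    hits = rng.random(reps) < 1.0 / n                  # the rare event {X_n = n^p}, independent of the Y's
    X = np.where(hits, float(n) ** p, 0.0)
    T_prime = T + (X - n ** (p - 1.0))                 # zero-mean adjustment
    print(f"n={n:>6}  Var(T_n)~{T.var():.5f}  Var(T'_n)~{T_prime.var():.2f}  mean(T'_n)~{T_prime.mean():.3f}")
```

$T^\prime_n$ stays centered near $\theta$ and coincides with $T_n$ up to a vanishing shift with probability $1-1/n$, yet its variance diverges.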
Efficiency is a "per se" concept in the sense that it measures how variable (and biased) the estimator is relative to the "true" parameter. There is an actual numeric value for efficiency associated with a given estimator at a given sample size for a given loss function. That number depends on the estimator AND the sample size AND the loss function.
Asymptotic efficiency looks at how efficient the estimator becomes as the sample size increases. More important is how rapidly the estimator becomes efficient, but this can be more difficult to determine.
Relative efficiency looks at how efficient the estimator is relative to an alternative estimator (typically at a GIVEN sample size).
Efficiency requires the specification of some loss function. Originally this was variance, when only unbiased estimators were considered. These days it is most often MSE (mean squared error, which accounts for both bias and variability). Other loss functions can be used. The classical Cramér-Rao bound was for unbiased estimators only but has been extended to many of these other loss functions (most especially MSE loss).
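As an illustration of MSE trading off bias against variability (my own example, not part of this answer): compare the usual unbiased sample variance (divide by $n-1$) with the divide-by-$n$ version on normal data. The biased version typically has the smaller MSE, and the ratio of MSEs is a relative efficiency at that sample size.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, sigma2 = 20, 200_000, 1.0                # illustrative sample size and true variance

Y = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_unbiased = Y.var(axis=1, ddof=1)               # divide by n-1: unbiased for sigma^2
s2_mle = Y.var(axis=1, ddof=0)                    # divide by n: biased, but less variable

for name, est in [("divide by n-1", s2_unbiased), ("divide by n", s2_mle)]:
    bias = est.mean() - sigma2
    mse = ((est - sigma2) ** 2).mean()            # MSE = variance + bias^2
    print(f"{name:>13}: bias={bias:+.4f}  var={est.var():.4f}  MSE={mse:.4f}")

ratio = ((s2_unbiased - sigma2) ** 2).mean() / ((s2_mle - sigma2) ** 2).mean()
print(f"MSE ratio (n-1 version / n version): {ratio:.3f}")
```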
An important adjunct concept is admissibility and domination of estimators.
The Wikipedia entry has many links.
Best Answer
We don't get to choose here. The "normalizing" factor is, in essence, a "variance-stabilizing to something finite" factor: it is chosen so that the expression neither goes to zero nor to infinity as the sample size goes to infinity, but maintains a nondegenerate distribution at the limit.
So it has to be whatever it has to be in each case. Of course it is interesting that in many cases it emerges that it has to be $\sqrt n$ (but see also @whuber's comment below).
A standard example where the normalizing factor has to be $n$, rather than $\sqrt n$, is when we have a model
$$y_t = \beta y_{t-1} + u_t, \;\; y_0 = 0,\; t=1,...,T$$
with $u_t$ white noise, and we estimate the unknown $\beta$ by Ordinary Least Squares.
If it so happens that the true value of the coefficient is $|\beta|<1$, then the OLS estimator is consistent and converges at the usual $\sqrt n$ rate.
But if instead the true value is $\beta=1$ (i.e., we have in reality a pure random walk), then the OLS estimator is consistent but will converge "faster", at rate $n$ (this is sometimes called a "superconsistent" estimator, since, I guess, so many estimators converge at rate $\sqrt n$).
In this case, to obtain its (non-normal) asymptotic distribution, we have to scale $(\hat \beta - \beta)$ by $n$ (if we scale only by $\sqrt n$ the expression will go to zero). Hamilton ch 17 has the details.
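A small simulation sketch (my addition; the choices $\beta=0.5$, $u_t\sim N(0,1)$, and the grid of sample sizes are illustrative) shows that $\sqrt T(\hat\beta-\beta)$ is the quantity that stabilizes in the stationary case, while $T(\hat\beta-1)$ stabilizes in the unit-root case:

```python
import numpy as np

rng = np.random.default_rng(2)

def ols_ar1(beta, T, reps):
    """OLS slope in y_t = beta*y_{t-1} + u_t with y_0 = 0 and u_t ~ N(0, 1), replicated reps times."""
    est = np.empty(reps)
    for r in range(reps):
        u = rng.normal(size=T)
        y = np.zeros(T + 1)
        for t in range(1, T + 1):
            y[t] = beta * y[t - 1] + u[t - 1]
        ylag, ycur = y[:-1], y[1:]
        est[r] = (ylag @ ycur) / (ylag @ ylag)   # OLS estimate of beta
    return est

for T in [50, 200, 800]:
    b_stat = ols_ar1(0.5, T, 2000)   # stationary case, |beta| < 1
    b_unit = ols_ar1(1.0, T, 2000)   # unit-root case, beta = 1
    print(f"T={T:>3}  sd of sqrt(T)*(bhat-0.5): {(np.sqrt(T) * (b_stat - 0.5)).std():.3f}   "
          f"sd of T*(bhat-1): {(T * (b_unit - 1.0)).std():.3f}")
```

Both scaled quantities settle to a nondegenerate spread as $T$ grows; scaling the unit-root estimator by only $\sqrt T$ instead would send it to zero.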