It is possible to show that $\mathrm {E}\,d(P,\hat{P}_n)\sim \frac{C}{\sqrt{n}}$ and to specify the value of $C$.
Let
$$D_n^2 =\sum_{x \in \mathcal{X}} \left( \sqrt{P(x)} - \sqrt{\hat{P}_n(x)} \right)^2 = 2d^2(P,\hat{P}_n). $$
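To make this concrete, here is a minimal numerical sketch (mine, not part of the original answer) of $d$ and $D_n$ for a finite alphabet encoded as $\{0,\dots,k-1\}$; numpy and the particular pmf and sample size are my choices:

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance d(P, Q): d^2 = (1/2) * sum_x (sqrt P(x) - sqrt Q(x))^2."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])                    # true pmf P on a 3-letter alphabet
sample = rng.choice(len(p), size=1000, p=p)      # i.i.d. draws from P
p_hat = np.bincount(sample, minlength=len(p)) / len(sample)  # empirical pmf
D_n = np.sqrt(2) * hellinger(p, p_hat)           # since D_n^2 = 2 d^2(P, \hat P_n)
print(D_n)
```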
$4nD_n^2$ is known in statistics [for reasons unclear to me] as the Freeman-Tukey goodness-of-fit (GOF) statistic for testing the null hypothesis that $X\sim P$. Like the better-known Pearson chi-squared GOF statistic, it also has [under the null hypothesis] an asymptotic chi-squared distribution with $k-1$ degrees of freedom, where $k=|\mathcal{X}|$.
The statistic $D_n^2$ seems to have been first considered by Matusita in *On the Estimation by the Minimum Distance Method*. In *Decision Rules, Based on the Distance, for Problems of Fit, Two Samples, and Estimation*, Matusita develops some asymptotic [and other] properties of $D_n^2$, including the fact that, under the null hypothesis, as $n\to\infty$,
$$4nD_n^2\ \buildrel{\mathcal L}\over{\to}\ \chi^2_{k-1}.\tag{1}$$
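As a quick Monte Carlo illustration of (1) (my sketch; the pmf, $n$, and replication count are arbitrary choices), the simulated statistic's mean and variance should be near those of $\chi^2_{k-1}$, namely $k-1$ and $2(k-1)$:

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.4, 0.3, 0.2, 0.1])               # k = 4, so k - 1 = 3 df
k, n, reps = len(p), 5000, 20000

p_hat = rng.multinomial(n, p, size=reps) / n     # each row is one \hat P_n
D2 = np.sum((np.sqrt(p) - np.sqrt(p_hat)) ** 2, axis=1)
ft = 4 * n * D2                                  # Freeman-Tukey statistic

# chi^2_{k-1} has mean k - 1 = 3 and variance 2(k - 1) = 6
print(ft.mean(), ft.var())
```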
It is also shown there that
$$4nD_n^2\ \le\ \mathbb{X}^2_n\ :=\ n\sum_{x \in \mathcal{X}} \frac{\left(\hat{P}_n(x)-P(x)\right)^2}{P(x)}.\tag{2}$$
$\mathbb{X}^2_n$ is, of course, the Pearson chi-squared GOF statistic, and it is well known that under the null hypothesis $X\sim P$, as $n\to\infty$,
$$\mathbb{X}^2_n\ \buildrel{\mathcal L}\over{\to}\ \chi^2_{k-1}.\tag{3}$$
It is also easily seen that $\mathrm {E}\,\mathbb{X}^2_n\ =\ k-1$ for all $n\ge 1$, since $\mathrm{E}\big(\hat{P}_n(x)-P(x)\big)^2=\mathrm{Var}\,\hat{P}_n(x)=P(x)(1-P(x))/n$.
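A small simulation (mine; the pmf and $n$ are arbitrary choices) makes the exactness of this identity visible even at $n=10$:

```python
import numpy as np

rng = np.random.default_rng(2)
p = np.array([0.4, 0.3, 0.2, 0.1])               # k = 4
n, reps = 10, 200000                             # deliberately small n

p_hat = rng.multinomial(n, p, size=reps) / n
pearson = n * np.sum((p_hat - p) ** 2 / p, axis=1)
print(pearson.mean())                            # should be near k - 1 = 3
```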
Together with (3) [and non-negativity], this entails that $\mathbb{X}^2_n$ is uniformly integrable. In view of (2), so is $4nD_n^2$, so it follows from (1) that
$$\mathrm {E}4nD_n^2\to \mathrm {E}\chi^2_{k-1}\ =\ k-1\ \mathrm{as}\ n\to\infty$$
and
$$\mathrm {E}\,2\sqrt{n}\,D_n\to \mathrm {E}\,\chi_{k-1}\ \mathrm{as}\ n\to\infty.$$
Since $d(P,\hat{P}_n)=D_n/\sqrt{2}$ and $\mathrm{E}\,\chi_{k-1}=\sqrt{2}\,\Gamma(k/2)/\Gamma((k-1)/2)$, this identifies the constant in the opening claim:
$$\mathrm{E}\,d(P,\hat{P}_n)\ \sim\ \frac{C}{\sqrt{n}},\qquad C\ =\ \frac{\mathrm{E}\,\chi_{k-1}}{2\sqrt{2}}\ =\ \frac{\Gamma(k/2)}{2\,\Gamma((k-1)/2)}.$$
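A Monte Carlo check of this constant (my sketch; scipy's gamma function and the particular pmf are my choices): $\sqrt{n}\,\mathrm{E}\,d(P,\hat P_n)$ should approach $C$ as $n$ grows.

```python
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(3)
p = np.array([0.4, 0.3, 0.2, 0.1])               # k = 4
k = len(p)
C = gamma(k / 2) / (2 * gamma((k - 1) / 2))      # = E chi_{k-1} / (2 sqrt 2)

for n in (100, 1000, 10000):
    p_hat = rng.multinomial(n, p, size=20000) / n
    d = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(p_hat)) ** 2, axis=1))
    print(n, np.sqrt(n) * d.mean(), "vs C =", C)  # should approach C
```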
[For more details on the connections between convergence in law, uniform integrability, and convergence of expectations, see Billingsley, 1st ed., p. 32, Theorem 5.4, or 2nd ed., pp. 31-32, Theorems 3.4 and 3.5.]
I have something that may or may not be useful...
Diaconis notes an interpretation of variation distance due to Paul Switzer. Consider $\mu,\nu\in M_p(S)$. Given a single observation $o$ from $S$, sampled from $\mu$ or $\nu$ with probability $1/2$ each, guess whether $o$ was sampled from $\mu$ or from $\nu$. The classical strategy below gives probability $\frac12(1+\|\mu-\nu\|)$ of being correct (a simulation sketch appears after the derivation):
- Evaluate $\mu(o)$ and $\nu(o)$.
- If $\mu(o)\geq\nu(o)$, choose $\mu$.
- If $\nu(o)>\mu(o)$, choose $\nu$.
To see this is true, let $\{\mu>\nu\}$ denote the set $\{t\in S:\mu(t)>\nu(t)\}$, and define $\{\mu=\nu\}$ and $\{\mu<\nu\}$ similarly.
Suppose $o$ is sampled from $\mu$. Then the strategy is correct if $o\in\{\mu=\nu\}$ or $o\in\{\mu>\nu\}$:
$$\mathbb{P}[\text{guessing correctly}\,|\,\mu]=\mathbb{P}[o\in\{\mu=\nu\}\,|\,\mu]+\mathbb{P}[o\in\{\mu>\nu\}\,|\,\mu]$$
with a similar expression for $\mathbb{P}[\text{guessing correctly}\,|\,\nu]$; note that, since ties are guessed as $\mu$, the strategy is correct given $\nu$ only when $o\in\{\mu<\nu\}$.
Note that $\mathbb{P}[o\in\{\mu=\nu\}\,|\,\mu]=\mu(\{\mu=\nu\})=\nu(\{\mu=\nu\})$ and also $\mathbb{P}[o\in\{\mu>\nu\}\,|\,\mu]=\mu(\{\mu>\nu\})$ (and similarly $\mathbb{P}[o\in\{\mu<\nu\}\,|\,\nu]=\nu(\{\mu<\nu\})$). Thus
\begin{align*}
\mathbb{P}[\text{guessing correctly}] &=\frac12\mathbb{P}[\text{guessing correctly}\,|\,\mu]+\frac12\mathbb{P}[\text{guessing correctly}\,|\,\nu]
\\&=\frac12\left(\nu(\{\mu=\nu\})+\mu(\{\mu>\nu\})\right)+\frac12\left(\nu(\{\mu<\nu\})\right)
\end{align*}
It is easily shown that
$$\|\mu-\nu\|=\mu\left(\{\mu>\nu\}\right)-\nu\left(\{\mu>\nu\}\right).$$
Hence
$$
\mathbb{P}[\text{guessing correctly}]=\frac12\left(\underbrace{\nu(\{\mu=\nu\})+\nu(\{\mu>\nu\})+\nu(\{\mu<\nu\})}_{=1}+\|\mu-\nu\|\right).$$
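Here is the promised simulation sketch of Switzer's game (mine, not part of the original answer) for two arbitrary pmfs on a three-point set; the empirical success rate of the rule above should match $\frac12(1+\|\mu-\nu\|)$, with the variation distance computed via the identity just used:

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.3, 0.5])
tv = mu[mu > nu].sum() - nu[mu > nu].sum()       # ||mu - nu|| = mu({mu>nu}) - nu({mu>nu})

reps = 200_000
truth = rng.integers(0, 2, size=reps)            # 0 -> sample from mu, 1 -> from nu
obs = np.where(truth == 0,
               rng.choice(len(mu), size=reps, p=mu),
               rng.choice(len(nu), size=reps, p=nu))
guess = (nu[obs] > mu[obs]).astype(int)          # choose nu only on strict inequality
print((guess == truth).mean(), "vs", 0.5 * (1 + tv))
```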
Best Answer
See, e.g., section 2.1 of http://arxiv.org/pdf/1209.1077.pdf; also http://dx.doi.org/10.1007/BF02213456, which is cited in Villani's book as having particularly precise estimates.