Kurtosis is really pretty simple ... and useful. It is simply a measure of outliers, or tails. It has nothing to do with the peak whatsoever - that definition must be abandoned.
Here is a data set:
0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0, 2, 2, 3, 2, 5, 2, 3, 999
Notice that '999' is an outlier.
Here are the $z^4$ values from the data set, where $z = (x - \bar{x})/s$ and $s$ is the divide-by-$n$ standard deviation:
0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 360.98
Notice that only the outlier gives a $z^4$ that is noticeably different from 0.
The average of these $z^4$ values is the kurtosis of the empirical distribution (subtract 3 if you like, it doesn't matter for the point I am making): 18.05
It should be obvious from this calculation that the data near the "peak" (the non-outlier data) contribute almost nothing to the kurtosis statistic.
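The calculation above can be reproduced in a few lines of Python. This is only a sketch: it assumes the standard deviation divides by $n$ (not $n-1$), which is the convention that reproduces the quoted figures, and the variable names are mine.

```python
import math

data = [0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0, 2, 2, 3, 2, 5, 2, 3, 999]

n = len(data)
mean = sum(data) / n
# Assumption: population (divide-by-n) standard deviation.
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)

# Fourth power of each standardized value.
z4 = [((x - mean) / sd) ** 4 for x in data]

# Kurtosis of the empirical distribution is the average z^4.
kurtosis = sum(z4) / n

# Fraction of the z^4 total that comes from the single largest term (the outlier).
outlier_share = max(z4) / sum(z4)

print([round(v, 2) for v in z4])
print(round(kurtosis, 2))
print(round(outlier_share, 4))
```

The last number shows how much of the statistic is due to the one outlier: nearly all of it.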
Kurtosis is useful as a measure of outliers. Outliers matter even to students in elementary statistics courses, and therefore kurtosis should be taught. But kurtosis has virtually nothing to do with the peak, whether it is pointy, flat, bimodal or infinite. You can have all of the above with small kurtosis and all of the above with large kurtosis. So it should NEVER be presented as having anything to do with the peak, because that would be teaching incorrect information. It also makes the material needlessly confusing, and seemingly less useful.
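To make the "same peak, different kurtosis" point concrete, here is a small sketch using a hypothetical family of distributions that I made up for illustration: with probability $1-p$ the variable is uniform on $[-1, 1]$ (a perfectly flat top), and with probability $p$ it equals $\pm a$ with equal chance. The mean is zero, so the kurtosis is just $E(X^4)/E(X^2)^2$, and both moments are available in closed form.

```python
def kurtosis_flat_top(p, a):
    """Kurtosis E[X^4] / E[X^2]^2 of a hypothetical flat-topped mixture.

    With probability 1 - p, X is uniform on [-1, 1];
    with probability p, X is +a or -a (each sign equally likely).
    """
    m2 = (1 - p) / 3 + p * a**2   # E[X^2]; uniform part contributes 1/3
    m4 = (1 - p) / 5 + p * a**4   # E[X^4]; uniform part contributes 1/5
    return m4 / m2**2


# Pure uniform: flat peak, no outliers.
print(kurtosis_flat_top(0.0, 100.0))

# Same flat peak, but one observation in a thousand sits far out in the tails.
print(kurtosis_flat_top(0.001, 100.0))
```

The centre of the distribution is flat in both cases; only the rare extreme values change, and the kurtosis moves from below 2 to several hundred.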
Summary:
- kurtosis is useful as a measure of tails (outliers).
- kurtosis has nothing to do with the peak.
- kurtosis is practically useful and should be taught, but only as a measure of outliers. Do not mention peak when teaching kurtosis.
This article explains clearly why the "Peakedness" definition is now officially dead.
Westfall, P.H. (2014). "Kurtosis as Peakedness, 1905 – 2014. R.I.P." The American Statistician, 68(3), 191–195.
The general form of the covariance depends on the first three moments of the distribution. To facilitate our analysis, we suppose that $X$ has mean $\mu$, variance $\sigma^2$ and skewness $\gamma$. The covariance of interest exists if the third moment is finite (i.e., $|\gamma| < \infty$) and does not exist otherwise. Using the relationship between the raw moments and the cumulants, you have the general expression:
$$\begin{equation} \begin{aligned}
\mathbb{C}(X,X^2)
&= \mathbb{E}(X^3) - \mathbb{E}(X) \mathbb{E}(X^2) \\[6pt]
&= ( \mu^3 + 3 \mu \sigma^2 + \gamma \sigma^3 ) - \mu ( \mu^2 + \sigma^2 ) \\[6pt]
&= 2 \mu \sigma^2 + \gamma \sigma^3.
\end{aligned} \end{equation}$$
The special case for an unskewed distribution with zero mean (e.g., the centred normal distribution) occurs when $\mu = 0$ and $\gamma = 0$, which gives zero covariance. Note that zero covariance occurs for any unskewed centred distribution, but this does not imply independence: $X^2$ is a deterministic function of $X$, so the two are dependent in general, including in the normal case.
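As a quick numerical check of the general expression, the sketch below compares $\mathbb{E}(X^3) - \mathbb{E}(X)\mathbb{E}(X^2)$ with $2\mu\sigma^2 + \gamma\sigma^3$ for a small discrete distribution. The support points and probabilities are hypothetical, chosen only so the moments can be computed exactly.

```python
import math

# Hypothetical discrete distribution, used only for illustration.
values = [0.0, 1.0, 2.0, 5.0]
probs = [0.4, 0.3, 0.2, 0.1]


def expect(f):
    """Expectation of f(X) under the discrete distribution above."""
    return sum(p * f(x) for x, p in zip(values, probs))


mu = expect(lambda x: x)                                  # mean
var = expect(lambda x: (x - mu) ** 2)                     # variance sigma^2
sigma = math.sqrt(var)
gamma = expect(lambda x: (x - mu) ** 3) / sigma**3        # skewness

# Covariance computed directly from raw moments.
cov_direct = expect(lambda x: x**3) - mu * expect(lambda x: x**2)

# Covariance from the closed-form expression.
cov_formula = 2 * mu * var + gamma * sigma**3

print(cov_direct, cov_formula)
```

The two printed values agree up to floating-point rounding. Replacing the distribution with a symmetric one centred at zero makes both come out as zero.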
A discussion on the limits of the sample skewness and kurtosis is available here. The author gives proper references to the original proofs, and the cited results are:
$$ |g_1| \le \frac{n-2}{\sqrt{n-1}} = \sqrt{n-1} - \frac{1}{\sqrt{n-1}} $$
$$ b_2 = g_2 + 3 \le \frac{n^2-3n+3}{n-1} = n -2 + \frac1{n-1} $$
So for $n=10$, you can't have skewness greater than $8/3 \approx 2.67$ in absolute value, or excess kurtosis greater than $73/9 - 3 \approx 5.11$.
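The two bounds are easy to evaluate for any $n$, and they can be checked against the most extreme sample possible: $n-1$ identical values plus one different value. The sketch below assumes $g_1$ and $g_2$ are the moment-based (divide-by-$n$) sample skewness and excess kurtosis; the function names are mine.

```python
import math


def max_abs_skewness(n):
    """Upper bound on |g1| for a sample of size n."""
    return (n - 2) / math.sqrt(n - 1)


def max_excess_kurtosis(n):
    """Upper bound on g2 = b2 - 3 for a sample of size n."""
    return (n * n - 3 * n + 3) / (n - 1) - 3


def sample_g1_g2(xs):
    """Moment-based sample skewness and excess kurtosis (divide-by-n moments)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2**1.5, m4 / m2**2 - 3


n = 10
# Extreme sample: n - 1 identical points and a single outlier.
extreme = [0.0] * (n - 1) + [1.0]
g1, g2 = sample_g1_g2(extreme)

print(max_abs_skewness(n), max_excess_kurtosis(n))
print(g1, g2)
```

The extreme sample attains both bounds, which is why no amount of tail-heaviness in the underlying population can push the sample statistics any higher at that sample size.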