Is the sum of binomial coefficients over square free integers normally distributed

binomial-coefficientsnormal distributionnumber theorystatisticssummation

I observed experimentally that the sum of binomial coefficients over square free integers approximately fits a normal distribution. Can this be proved or disproved theoretically?

Let $\mu(r)$ be the Mobius function. Define

$$
A_n =
\mu(1){n\choose 1} + \mu(2){n\choose 2} + \mu(3){n\choose 3} + \cdots + \mu(n){n\choose n}
$$

$$
B_n =
\mu(1)^2{n\choose 1} + \mu(2)^2{n\choose 2} + \mu(3)^2{n\choose 3} + \cdots + \mu(n)^2{n\choose n}
$$

Note that $B_n$ is nothing but the sum of the Binomial coefficients over square free integers.

Claim 1: The sequence of numbers $\dfrac{A_n}{2^n}$ is normally distributed with a mean $0$.

Claim 2: The sequence of numbers $\dfrac{\zeta(2)B_n}{2^n}$ is normally distributed with a mean $1$.

I do not have a closed form for the standard deviation in terms of well known constants and functions. As a illustration, given below is the histogram for $\frac{\zeta(2)s_n}{2^n}$. The blue dots are the actual distribution while the red line represents a perfect normal distribution with the parameters $a,b$ and $c$ given below.

enter image description here

Note that a similar sum over squares (instead of square free integers) appears to be arc-sine distributed instead of normal. So normality does not appear to be trivial.

Update: Normality tests done for $n \le 10^5$ and the observation is that as increases, the distribution fits a normal distribution better

Best Answer

This is an extended comment rather than a complete answer.

I understand that you want to find the limiting distribution. Below are the results for a maximum $n$ of 10,000 (along with the associated Mathematica code):

(* Generate data and moments *)
nMax = 10000;
\[Mu] = Table[MoebiusMu[i]^2, {i, nMax}];
s[n_] := Zeta[2] Sum[MoebiusMu[i]^2 Binomial[n, i]/2^n, {i, nMax}]
data = Table[{n, s[n]}, {n, 1, nMax}];
moments = Table[{n, Mean[data[[Range[n], 2]] // N],
    StandardDeviation[data[[Range[n], 2]] // N],
    Skewness[data[[Range[n], 2]] // N],
    Kurtosis[data[[Range[n], 2]] // N]}, {n, 2, nMax}];

I've generated the mean, standard deviation, skewness, and kurtosis values for $n=2$ through $n=10000$. If the limiting (or approximating distribution function) is normal, then the skewness should settle towards zero and the kurtosis settle towards 3. Here are the resulting figures:

ListPlot[{data, {{1, 1}, {nMax, 1}}}, Joined -> True, 
 AspectRatio -> 1/4,
 ImageSize -> 1000, Frame -> True, 
 FrameLabel -> (Style[#, Bold, 18] &) /@ {"n", 
    "\[Zeta](2)s(n)/\!\(\*SuperscriptBox[\(2\), \(n\)]\)"},
 PlotStyle -> Thickness[0.005], ImagePadding -> 50, PlotRange -> All]
plotIt[m_, label_, level_] := 
 ListPlot[{moments[[All, {1, m}]], {{2, level}, {nMax, level}}},
  Joined -> True, PlotRange -> All, Frame -> True, 
  FrameLabel -> (Style[#, Bold, 18] &) /@ {"n", label},
  AspectRatio -> 1/4, PlotStyle -> Thickness[0.005], 
  ImagePadding -> 50, PlotRange -> All, ImageSize -> 1000]
plotIt[2, "Mean", 1]
plotIt[3, "Standard deviation", 0.01078]
plotIt[4, "Skewness", 0]
plotIt[5, "Kurtosis", 3]

Data values Mean values Standard deviation values Skewness values Kurtosis values

While the above figures don't rule out a normal distribution (or that a normal distribution might provide a reasonable approximation for the proportion of numbers between any two specified values), that the skewness does not seem to be approaching zero and that the kurtosis is drifting farther away from 3 does not support a normal distribution as the limiting distribution. Maybe a slightly skewed and heavier-tailed distribution might be a better candidate for the limiting distribution.

From other posts I get the impression that you have values up to $n=44,000$. Similar figures as above might also be suggestive with that larger data set.