Finding the variance of a function

Tags: calculus, definite integrals, integration, python, statistics

I am trying to find the variance of a function that is continuous on the interval $S = [a,b]$.

I have come up with a solution, but I am afraid I have something wrong here. Before any further explanation, I should clarify that I am currently in high school (year 11), so please consider my range of knowledge! Without further ado, here is my attempt.


For any continuous function $f(x)$ we can define
$$
\tag{*}
M\{f\}(a, b) = \frac{1}{b-a} \int_{a}^{b}f(x)\, dx.
$$
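As a quick numerical check of $(*)$, here is a minimal sketch that approximates $M\{f\}(a, b)$ with a midpoint Riemann sum. The helper name `mean_value` and the default of 100000 subintervals are illustrative choices, not part of the definition.

```python
import math

def mean_value(f, a, b, n=100_000):
    """Approximate (1/(b-a)) * integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    total = sum(f(a + (k + 0.5) * h) for k in range(n))
    return total * h / (b - a)

print(mean_value(math.sin, 0, math.pi))     # close to 2/pi
print(mean_value(lambda x: x ** 2, -3, 3))  # close to 3
```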

From here we will use a theorem that states:
$$
\tag{**}
\sigma^2 = \left(\frac{1}{N}\sum_{i=1}^N x_i^2\right) - x_{avg}^2.
$$
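The identity $(**)$ can be checked on a small data set. The sketch below compares "mean of the squares minus square of the mean" with the population variance from Python's standard `statistics` module; the sample values are made up purely for illustration.

```python
import statistics

# Made-up sample, chosen only so the arithmetic is easy to follow.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

mean_of_squares = sum(x * x for x in data) / n
square_of_mean = (sum(data) / n) ** 2
shortcut = mean_of_squares - square_of_mean

# Population variance (divides by N, not N - 1), computed directly.
direct = statistics.pvariance(data)

print(shortcut, direct)  # the two values agree
```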

Note that $\frac{1}{N}\sum_{i=1}^N x_i^2$ is just the average of the squares of the elements of the set $S$, where $S = \{x_i \mid i \in \mathbb{N},\ 1 \le i \le N\}$. If we need the average of some function $f$, we can find it using definition $(*)$. So let's plug all of that in:
$$
\tag{***}
(*)(**) \implies \sigma^2\{f\}(a, b) = M\{x^2\}(a, b)-[M\{f\}(a, b)]^2 = \frac{1}{b-a} \int_{a}^{b}x^2\, dx - \left[\frac{1}{b-a} \int_{a}^{b}f(x)\, dx\right]^2.
$$

Now when I evaluate $\sigma^2\{\sin x\}(-3, 3)$, Wolfram Alpha outputs 3. On the other hand, I have written code that divides the interval into pieces, computes the value of the function at each slicing point, and then computes the variance of the collected data, which comes out to about $0.523254$. Here is the code:

```python
import math

def mean(data):
    return sum(data) / len(data)

def variance(data):
    # Mean of the squares minus the square of the mean.
    mean_flt = mean(data)
    return sum([(i ** 2) / len(data) for i in data]) - mean_flt ** 2

def split_interval(interval, section):
    # Points from interval[0] in steps of (width / section). Floating-point
    # accumulation in i may decide whether the right endpoint is included.
    step = (interval[1] - interval[0]) / section
    array = []
    i = interval[0]
    while i <= interval[-1]:
        array.append(i)
        i += step

    return array[:]

function = lambda x : math.sin(x)
dataset = [function(i) for i in split_interval([-3, 3], 100)]
var = variance(dataset)

print("var-rep-100 : ", var)
```

Please help me find my mistake, whether it is in the code or in the math!
Thank you in advance!

Best Answer

As stated in the comments, it should be $\int_a^b [f(x)]^2\, dx$ instead of $\int_a^b x^2\, dx$. So your formula for the variance should be $\sigma^2 = \frac{1}{b - a}\int_a^b \sin^2(x)\, dx - \left[\frac{1}{b - a}\int_a^b \sin(x)\, dx\right]^2$. Notice that it's not $\sin(x^2)$ either, since your random variable is $Y = \sin(X)$, where $X$ is uniformly distributed on $[a, b]$. See this for example.
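To see the difference between the two formulas numerically, here is a rough sketch in plain Python. It approximates each mean with a midpoint Riemann sum; the helper names `mean_value` and `variance_of` and the number of subintervals are assumptions made for this illustration only.

```python
import math

def mean_value(f, a, b, n=100_000):
    # Midpoint-rule approximation of (1/(b-a)) * integral of f over [a, b].
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h / (b - a)

def variance_of(f, a, b):
    # Mean of f(x)^2 minus the square of the mean of f(x).
    return mean_value(lambda x: f(x) ** 2, a, b) - mean_value(f, a, b) ** 2

a, b = -3, 3
corrected = variance_of(math.sin, a, b)

# The formula as posted in the question, with x^2 in place of f(x)^2.
as_posted = mean_value(lambda x: x ** 2, a, b) - mean_value(math.sin, a, b) ** 2

print(corrected)  # about 0.52328
print(as_posted)  # about 3, the value Wolfram Alpha reported
```

The posted formula gives 3 because the mean of $x^2$ over $[-3, 3]$ is $\frac{1}{6}\cdot 18 = 3$ and the mean of $\sin x$ over a symmetric interval is zero.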

Indeed, implementing this in Sage matches your Python result:

```
sage: x = var('x')
sage: f = sin(x)
sage: a = -3
sage: b = 3
sage: (integrate(f(x)^2, x, a, b) / (b - a) - (integrate(f(x), x, a, b) / (b - a))^2).n()
# 0.523284624849910
```

The exact result is $\frac{1}{6}\int_{-3}^3 \sin^2(x)\, dx = \frac{1}{2} - \frac{1}{12}\sin(6) \approx 0.5232846$, since the mean of $\sin(x)$ over the symmetric interval $[-3, 3]$ is zero.
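The integral can be worked out with the identity $\sin^2(x) = \frac{1 - \cos(2x)}{2}$:
$$
\frac{1}{6}\int_{-3}^{3}\sin^2(x)\, dx
= \frac{1}{6}\int_{-3}^{3}\frac{1-\cos(2x)}{2}\, dx
= \frac{1}{6}\left[\frac{x}{2}-\frac{\sin(2x)}{4}\right]_{-3}^{3}
= \frac{1}{6}\left(3 - \frac{\sin(6)}{2}\right)
= \frac{1}{2} - \frac{\sin(6)}{12}.
$$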

Another Python implementation:

```python
import numpy as np  # for np.var (population variance)

x = np.linspace(-3, 3, 1000000)
y = np.sin(x)
print(np.var(y))
# 0.5232841214798628
```
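The small gap between a 100-point estimate and the exact value is a discretization effect. The following standard-library sketch shows the sampled variance approaching $\frac{1}{2} - \frac{1}{12}\sin(6)$ as the number of sample points grows. Sampling at the midpoints of equal subintervals is an assumption of this sketch (it differs from the endpoint-based grids used above), and `sampled_variance` is a hypothetical helper name.

```python
import math
import statistics

def sampled_variance(f, a, b, n):
    # Population variance of f evaluated at the n midpoints of equal
    # subintervals of [a, b].
    h = (b - a) / n
    return statistics.pvariance([f(a + (k + 0.5) * h) for k in range(n)])

exact = 0.5 - math.sin(6) / 12

for n in (10, 100, 1000, 10000):
    approx = sampled_variance(math.sin, -3, 3, n)
    # Print the sample count, the estimate, and its distance from the exact value.
    print(n, approx, abs(approx - exact))
```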