Solved – Statistical test for comparing two frequency distributions expressed as arrays (buckets) of values

Tags: chi-squared-test, distributions, r, statistical-significance

I am looking for an appropriate statistical test that will compare two frequency distributions, where the data is in the form of two arrays (or buckets) of values.

For example, suppose I have two distributions, where A, B, and C are observed outcomes from a software logging system (such as whether customers clicked on button A, B, or C).

HISTORICAL: 
A        B        C
122319   295701   101195

ONE MONTH:
A        B        C
1734     3925     1823

My goal is to create an automated A/B testing system. For example, we've collected this data for the last 6 months (in the HISTORICAL data set). After we roll out a new algorithm, we can collect new results (in the ONE MONTH data set). If the two distributions are "significantly" different, we'd then know to take some action.

My specific questions:

  1. What's the proper statistical test for this problem, and how could I know when these distributions differ significantly? An answer using R or Python would be appreciated.

  2. What's the minimum number of samples I'd need for both HISTORICAL and ONE MONTH for the test to be valid?

I've read several other questions related to the chi-squared and Kolmogorov-Smirnov tests but don't know where to begin.

Thank you for any help.

Best Answer

Run a chi-squared goodness-of-fit test to determine whether an observed frequency distribution (observed) differs from a desired, perhaps theoretical, distribution (expected).

Note carefully the definition of the statistic $X^2$ (the eponymous chi-squared):

$$X^2 = \sum_i \frac{(\mathrm{observed}_i - \mathrm{expected}_i)^2}{\mathrm{expected}_i}$$

Both series must sum to the same total, so one of them needs to be scaled to the other; here, expected is scaled to match the total of observed. (Regarding your second question, a common rule of thumb is that the chi-squared approximation is valid when every expected cell count is at least about 5.)
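To make the scaling concrete, here is the arithmetic for the data above. The HISTORICAL counts total 519215 and the ONE MONTH counts total 7482, so the scaled expected counts are roughly 1762.6, 4261.1, and 1458.2. The statistic then works out to

$$X^2 \approx \frac{(1734 - 1762.6)^2}{1762.6} + \frac{(3925 - 4261.1)^2}{4261.1} + \frac{(1823 - 1458.2)^2}{1458.2} \approx 0.5 + 26.5 + 91.2 \approx 118.2$$

which, compared against a chi-squared distribution with $k - 1 = 2$ degrees of freedom, gives a p-value far below 0.05, so the two distributions are judged different.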

Below is some Python code that encapsulates this test. The final decision is made by comparing the test's resulting p-value against a significance threshold (here 0.05).

#!/usr/bin/env python3
import numpy as np
import scipy.stats as stats

def ComputeChiSquareGOF(expected, observed):
    """
    Runs a chi-square goodness-of-fit test and returns the p-value.
    Inputs:
    - expected: numpy array of expected counts.
    - observed: numpy array of observed counts.
    Returns: p-value
    """
    # Scale the expected counts so they sum to the same total as the
    # observed counts; chisquare requires both arrays to have equal sums.
    expected_scaled = expected / expected.sum() * observed.sum()
    result = stats.chisquare(f_obs=observed, f_exp=expected_scaled)
    return result.pvalue

def MakeDecision(p_value):
    """
    Makes a goodness-of-fit decision on an input p-value.
    Input: p_value: the p-value from a goodness-of-fit test.
    Returns: "different" if the p-value is below 0.05, "same" otherwise
    """
    return "different" if p_value < 0.05 else "same"

if __name__ == "__main__":
    expected = np.array([122319, 295701, 101195])
    observed1 = np.array([1734, 3925, 1823])
    observed2 = np.array([122, 295, 101])

    p_value = ComputeChiSquareGOF(expected, observed1)
    print("Comparing distributions %s vs %s = %s" %
          (expected, observed1, MakeDecision(p_value)))

    p_value = ComputeChiSquareGOF(expected, observed2)
    print("Comparing distributions %s vs %s = %s" %
          (expected, observed2, MakeDecision(p_value)))

The output from running this test is:

Comparing distributions [122319 295701 101195] vs [1734 3925 1823] = different
Comparing distributions [122319 295701 101195] vs [122 295 101] = same
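
Since both HISTORICAL and ONE MONTH are observed samples (neither is a known theoretical distribution), a minimal alternative sketch is to stack them as the rows of a contingency table and run a chi-squared test of homogeneity with scipy.stats.chi2_contingency, which computes the expected counts internally from the table's margins; the variable names below are illustrative:

import numpy as np
import scipy.stats as stats

historical = np.array([122319, 295701, 101195])
one_month = np.array([1734, 3925, 1823])

# Each sample is one row of a 2x3 contingency table.
table = np.vstack([historical, one_month])

# chi2_contingency returns the statistic, the p-value, the degrees
# of freedom, and the table of expected counts.
chi2, p_value, dof, expected = stats.chi2_contingency(table)
print("p-value = %g -> %s" % (p_value, "different" if p_value < 0.05 else "same"))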