Statistical Significance – How to Determine Statistical Significance Between Two Data Points

excelstatistical significance

To preface, I have not done stats of any sort in about a decade, so I am a bit rusty.

I have a question to help determine statistical significance between two populations.

I am working with a dataset of qualifying insurance members for multiple measures. For example, for any members that qualify for a certain measure, they will be identified by either a 0, meaning they are considered a part of the population or the denominator, and a 1, meaning they qualify for the measure or the numerator. With each measure, you can determine a rate by dividing Count(1)/(Count(1)+Count(0)).

In the dataset there is a column for Race/Ethnicity, and I was tasked to do analyze if there is any large discrepancies in our measures. After the analysis, we need to determine if the discrepancies we are seeing are actually statistically significant. So, for one of the measures, non-White members show a rate variance of about an average of -6%, and we want to determine if that is caused by something we lack on our side or is the non-White population too small.

For a data example:

Measure Ethnicity Numerator Denominator Rate Mean Rate
A White 2800 6750 41.5% 37.3%
A African American 880 2280 38.6% 37.3%
A Hispanic 230 720 31.9% 37.3%

Given these rates, we hope to determine if the population of non-White members is statistically smaller than that of the White population and thus a comparison between the two rates would not be beneficial.

I used Excel's built-in Data Analysis tool to perform a t-test, using the denominator of each measure for each ethnicity as an array. So, all White denominator members with the corresponding African American denominator members – which is a total of 18 different measures. I know t-tests require 2+ data points, but I do not know if this approach is the correct one.

Best Answer

I think what you are looking for is Pearson's $\chi^2$ test for association. (chi-squared test for association). I have provided the steps to get this information in Excel. (I ran the numbers; you are looking at significance at the 99% confidence interval.)

1. First, you need to total the rows and columns.

enter image description here

enter image description here

enter image description here

2. Then you can create the Expected table

This is the $\frac{observed * row.total}{total.of.all.rows}$

enter image description here

These are all calculated the same way. Here is a second example for clarity.

enter image description here

3. Is it significant?

You can walk through all of it manually in Excel, but at this point, you can get the $p$-value with a function.

enter image description here

4. Great. What does that mean?

I feel like more often than not, people stop before they get to the part that matters the most - what does it all mean?

Well, the significance indicates that there is an association between the rates and the ethnicities, as you've outlined them. (Really significance and confidence infer the reliability of the results if you were going to repeat the study...I digress)

You cannot tell from this result what associations are the strongest, though. That is where the observed (original values) and expected (calculated table) come in. The larger the deviation between the expected and the observed, the more influential the group is in the association.

The difference between the observed and expected for the White group is the largest at about 66.5, followed by the Hispanic group at about 42, and the least associative influence is the African American group at about 24.5.

This method of analysis, in and of itself, is not all that intuitive in the findings.

Related Question