Statistical Significance – How to Determine Statistical Significance Between Two Data Points

To preface, I have not done stats of any sort in about a decade, so I am a bit rusty.

I have a question to help determine statistical significance between two populations.

I am working with a dataset of qualifying insurance members for multiple measures. For example, for any members that qualify for a certain measure, they will be identified by either a 0, meaning they are considered a part of the population or the denominator, and a 1, meaning they qualify for the measure or the numerator. With each measure, you can determine a rate by dividing Count(1)/(Count(1)+Count(0)).

In the dataset there is a column for Race/Ethnicity, and I was tasked to do analyze if there is any large discrepancies in our measures. After the analysis, we need to determine if the discrepancies we are seeing are actually statistically significant. So, for one of the measures, non-White members show a rate variance of about an average of -6%, and we want to determine if that is caused by something we lack on our side or is the non-White population too small.

For a data example:

Measure	Ethnicity	Numerator	Denominator	Rate	Mean Rate
A	White	2800	6750	41.5%	37.3%
A	African American	880	2280	38.6%	37.3%
A	Hispanic	230	720	31.9%	37.3%

Given these rates, we hope to determine if the population of non-White members is statistically smaller than that of the White population and thus a comparison between the two rates would not be beneficial.

I used Excel's built-in Data Analysis tool to perform a t-test, using the denominator of each measure for each ethnicity as an array. So, all White denominator members with the corresponding African American denominator members – which is a total of 18 different measures. I know t-tests require 2+ data points, but I do not know if this approach is the correct one.

4. Great. What does that mean?

I feel like more often than not, people stop before they get to the part that matters the most - what does it all mean?

Well, the significance indicates that there is an association between the rates and the ethnicities, as you've outlined them. (Really significance and confidence infer the reliability of the results if you were going to repeat the study...I digress)

You cannot tell from this result what associations are the strongest, though. That is where the observed (original values) and expected (calculated table) come in. The larger the deviation between the expected and the observed, the more influential the group is in the association.

The difference between the observed and expected for the White group is the largest at about 66.5, followed by the Hispanic group at about 42, and the least associative influence is the African American group at about 24.5.

This method of analysis, in and of itself, is not all that intuitive in the findings.

Best Answer

I think what you are looking for is Pearson's $\chi^2$ test for association. (chi-squared test for association). I have provided the steps to get this information in Excel. (I ran the numbers; you are looking at significance at the 99% confidence interval.)

1. First, you need to total the rows and columns.

2. Then you can create the Expected table

This is the $\frac{observed * row.total}{total.of.all.rows}$

These are all calculated the same way. Here is a second example for clarity.

3. Is it significant?

You can walk through all of it manually in Excel, but at this point, you can get the $p$-value with a function.

Best Answer

1. First, you need to total the rows and columns.

2. Then you can create the Expected table

3. Is it significant?

4. Great. What does that mean?

Related Solutions

Solved – Statistical significance between two percentage change

Solved – Determining how much data is needed for statistical significance of a percentage difference between two points

Related Question