Solved – Questions about determining statistical significance in survey responses/methods tried

statistical-significance, t-distribution

Background: I am studying customer satisfaction (CSAT) scores from a random sampling of customers at a technology firm. I have 13 data points, each of which is the average customer satisfaction score for one year, covering 13 years. Below is a snapshot of the years, the yearly satisfaction scores, and the sample sizes. Scores range from 1 to 5, where 1 = terrible service and 5 = excellent service.

(Image not reproduced: table of year, average satisfaction score, and sample size for 2001 to 2013.)

Problem: I would like to determine whether the increase in the yearly satisfaction scores is statistically significant.

Methods Tried: I've done some googling and found the article "Detecting Significant Changes in Your Data", which recommends using the t-distribution.

When I follow the author's step-by-step approach, I get the output below.

Here I averaged the CSAT scores from 2001 to 2003 to get $\mu$ and averaged the scores from 2004 to 2013 to get $\bar{x}$. I then took the standard deviation of the 2004 to 2013 yearly scores to get $s$. $n$ is the number of data points from 2004 onward (10), and Student's t-value is calculated as $t = |\bar{x} - \mu| / (s/\sqrt{n})$. Finally, I calculated p in Excel as =TDIST(t-value, degrees of freedom, tails), with the values 6.58, 9, and 1, respectively.

Output: the resulting p-value is less than 5%, so it appears that the change in CSAT scores from 2004 to 2013 is significant.
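The steps described above can be reproduced outside Excel with a short sketch. This is a minimal Python illustration, assuming the sample standard deviation (Excel's STDEV) was used for $s$. `one_sample_t` and `tdist` are hypothetical helper names; `tdist` approximates Excel's TDIST(x, deg_freedom, tails) by integrating the t density numerically with Simpson's rule rather than calling a statistics library. Like the original procedure, it treats each yearly average as one observation and ignores the per-year sample sizes.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def tdist(x: float, df: int, tails: int, steps: int = 20_000) -> float:
    """Approximate Excel's TDIST(x, deg_freedom, tails) for x >= 0.

    The upper tail P(T > x) equals 0.5 minus the integral of the density
    from 0 to x; the integral uses composite Simpson's rule.
    """
    if x < 0 or tails not in (1, 2) or steps % 2:
        raise ValueError("need x >= 0, tails in {1, 2}, and an even step count")
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        # Simpson weights: 4 for odd interior points, 2 for even ones
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper_tail = 0.5 - total * h / 3
    return tails * upper_tail  # the two-tailed p doubles the one-tailed p


def one_sample_t(baseline: list[float], later: list[float]) -> tuple[float, int]:
    """Return (t, degrees of freedom) following the steps in the question."""
    mu = mean(baseline)      # average of the 2001-2003 scores
    x_bar = mean(later)      # average of the 2004-2013 scores
    s = stdev(later)         # sample standard deviation, like Excel's STDEV
    n = len(later)
    t = abs(x_bar - mu) / (s / math.sqrt(n))
    return t, n - 1


# One-tailed p for the t-value and degrees of freedom reported in the question
print(tdist(6.58, 9, 1))
```

With the question's values, `tdist(6.58, 9, 1)` gives the one-tailed p-value that =TDIST(6.58, 9, 1) should return, which is well below 0.05, and `tdist(6.58, 9, 2)` is exactly twice that.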

I have several questions about this:

Questions:

  1. Is this the correct method for showing that the change in my CSAT scores is statistically significant? If not, what is a correct way? I should note that I have only an elementary understanding of statistics, so I am limited to calculations I can comfortably do in Excel.
  2. What is the difference between a one-tailed and a two-tailed test in my problem? Which one should I use, and how do I decide? With either one, how do I determine whether my p-value is statistically significant: is the criterion still "less than 5%" (assuming alpha = 0.05)?
  3. What is the best way to formalize my null hypothesis for this problem?
  4. Why does the author (in the link) calculate p (the t-distribution function in Excel) using the value of "p" instead of "n"? Also, how are the degrees of freedom and the number of tails accounted for in the TDIST function?
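For reference, here is one textbook way to write the test described in the question as a one-sample t-test. This is a standard formulation, not something taken from the linked article: $\theta$ is a symbol introduced here for the underlying mean of the 2004 to 2013 yearly scores, and $\mu$ is the 2001 to 2003 average treated as a fixed benchmark. Excel's TDIST only accepts a nonnegative t, which is why the absolute value appears.

$$H_0:\ \theta = \mu \qquad \text{vs.} \qquad H_1:\ \theta > \mu \ \text{(one-tailed)} \quad \text{or} \quad H_1:\ \theta \neq \mu \ \text{(two-tailed)}$$

$$t = \frac{|\bar{x} - \mu|}{s/\sqrt{n}}, \qquad \nu = n - 1$$

$$p_{\text{1-tail}} = P(T_\nu \ge t) = \text{TDIST}(t, \nu, 1), \qquad p_{\text{2-tail}} = 2\,P(T_\nu \ge t) = \text{TDIST}(t, \nu, 2)$$

In both cases the p-value is compared with the same $\alpha$. The one-tailed form only matches $H_1:\ \theta > \mu$ when the observed change is in that direction, i.e. $\bar{x} > \mu$.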

Best Answer

Since this is from a survey, you'd need to know the survey design and the population size in order to properly calculate the variance estimates. You may need to apply a finite population correction, depending on the population size, and you should determine what type of "random sampling" was done. Was it stratified? A cluster random sample? All of these affect your variance estimates and could ultimately change your conclusions, since the variance estimates are used to calculate your test statistics.
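As a rough illustration of the finite population correction, here is a minimal sketch assuming simple random sampling without replacement, where the usual estimated standard error of a sample mean, $s/\sqrt{n}$, is multiplied by $\sqrt{1 - n/N}$ for a population of size $N$. `standard_error_of_mean` is a hypothetical helper, and the numbers in the example calls are made up. Stratified or cluster designs need their own variance estimators, so this formula does not apply to them as-is.

```python
import math
from typing import Optional


def standard_error_of_mean(s: float, n: int, population_size: Optional[int] = None) -> float:
    """Estimated standard error of a sample mean under simple random sampling.

    If population_size (N) is given, apply the finite population correction
    sqrt(1 - n / N) for sampling without replacement.
    """
    se = s / math.sqrt(n)
    if population_size is None:
        return se  # no correction: population treated as effectively infinite
    if n < 1 or n > population_size:
        raise ValueError("need 1 <= n <= N")
    fpc = math.sqrt(1 - n / population_size)
    return se * fpc


# Hypothetical numbers: the correction matters only when n is a sizable share of N
print(standard_error_of_mean(0.8, 400))             # no correction
print(standard_error_of_mean(0.8, 400, 1_000))      # n is 40% of N
print(standard_error_of_mean(0.8, 400, 1_000_000))  # correction factor is close to 1
```

When only a small fraction of the population is surveyed, the factor is close to 1 and the correction barely changes the result; it becomes important when a large share of customers is sampled.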
