Solved – When to use t-test for dependent vs. independent sample

Tags: sample, t-test

I am writing you with a statistics question for which I cannot find an answer on the internet.

I have conducted a survey to find out how much consumers like different products. "Likability" is a 3-item construct in my survey and I test it for 15 different products.

Now I want to know if there is a significant difference in likability between the products. E.g., do consumers like product 1 significantly more than product 2?

To this end, I have calculated a "Likability-Score" for each product which is the mean of all 3 items and all respondents. As far as I understand I now have to run a t-test. But do I have to run a t-test for independent or dependent samples?

From what I have read so far: A dependent-sample t-test could make sense because respondents may compare the likability of different products implicitly. An independent t-test could make sense because there is no before-after assessment with a treatment in between.

Best Answer

My intent with this answer is to show the concepts that indicate that the data are suitable for a paired t test. There are many issues that could be discussed about a particular data set, as the comments above show to some extent. I address the title question, "When to use t test for dependent vs independent sample?" My hope is that my statements and the example will address this issue, which the OP could not find on the internet.

First of all, to run a paired t test you need to have incorporated by design a clear correlation between the two variables whose means you want to compare. I will give an example later. If the data are discrete, then you want them to be close to continuous and approximately normal. Since the data are paired, you need equal sample sizes. Now one could reason: "If I run an unpaired t test I don't need equal sample sizes, and I have more degrees of freedom than if I pair" (in the equal-sample-size case the unpaired t test has 2n-2 degrees of freedom, compared to n-1 for the paired test).

At first blush this seems to automatically favor the independent test over the paired test, since more degrees of freedom means a lower variance for the t statistic. But this is not always so. The paired test wins when the variability of the paired differences is much less than the variability of the individual observations within each sample. This is a little vague and you may be puzzled by my terminology. Here is an example that I hope will make things clear.
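Before the example, a short derivation may help show why pairing can outweigh the lost degrees of freedom. The symbols here are assumptions introduced for illustration and are not from the original answer: sigma_X and sigma_Y are the standard deviations of the two variables, rho is the correlation between paired observations, and n is the number of pairs.

```latex
% Variance of the difference in sample means when observations are paired
% with correlation \rho (n pairs):
\operatorname{Var}(\bar{X} - \bar{Y})
  = \frac{\sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X \sigma_Y}{n}

% With two independent samples of size n, the covariance term vanishes:
\operatorname{Var}(\bar{X} - \bar{Y})
  = \frac{\sigma_X^2 + \sigma_Y^2}{n}
```

When rho is close to 1, the numerator in the paired case becomes very small, so the denominator of the paired t statistic shrinks far more than the halving of the degrees of freedom costs.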

Suppose I have average daily temperature data at two locations say Washington DC and New York City. Temperature at a given location is very seasonal and also varies geographically. On a given day New York and Washington DC are geographically close enough to each other to share weather patterns so perhaps the variation in temperature between these two cities on the same day is not as great as the variability when comparing them in different seasons (particularly a winter month vs a summer month). It should be clear that there is a way to pair correlated data from New York and Washington DC.

In a textbook example I used the following sampling to develop a paired t test. I paired data taken on the same day (the 15th of the month) for Washington and New York, and collected these data over a given year, giving a sample size of 12 pairs.

Here is the data:

Date            Washington          New York
                Avg. Temp. (F)      Avg. Temp. (F)
January 15      31                  28
February 15     35                  33
March 15        40                  37
April 15        52                  45
May 15          70                  68
June 15         76                  74
July 15         93                  89
August 15       90                  85
September 15    74                  69
October 15      55                  51
November 15     32                  27
December 15     26                  24

From these data we get the following set of paired differences (Washington minus New York): 3, 2, 3, 7, 2, 2, 4, 5, 5, 4, 5 and 2.
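As a rough check on the idea that the differences vary far less than the temperatures themselves, the table can be summarized with a few lines of Python. This is only a sketch using the standard library; the variable names are my own and do not come from the book.

```python
import statistics

# Average temperatures (F) on the 15th of each month, January to December,
# copied from the table above.
washington = [31, 35, 40, 52, 70, 76, 93, 90, 74, 55, 32, 26]
new_york = [28, 33, 37, 45, 68, 74, 89, 85, 69, 51, 27, 24]

# Paired differences: same day, Washington minus New York.
diffs = [w - n for w, n in zip(washington, new_york)]

# Sample standard deviations (n - 1 in the denominator).
sd_washington = statistics.stdev(washington)
sd_new_york = statistics.stdev(new_york)
sd_diffs = statistics.stdev(diffs)

print("differences:", diffs)
print("mean difference:", round(statistics.mean(diffs), 2))
print("SD Washington:", round(sd_washington, 2))
print("SD New York:", round(sd_new_york, 2))
print("SD of differences:", round(sd_diffs, 2))
```

The seasonal swing dominates the spread within each city, while the day-by-day differences stay within a few degrees of each other.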

In the book we applied the unpaired test using the pooled estimate of variance and got a test statistic of t = 0.378. Checking the t table with 22 degrees of freedom, we find that this is far from significant: for alpha = 0.10 the critical value is 1.7171, so the p-value is higher than 0.10.

For the paired test we get t = 7.86. Referring to a t distribution with 11 degrees of freedom, we find that the critical value for alpha = 0.01 is 3.1058, so the p-value is less than 0.01. In the unpaired case we cannot reject the null hypothesis that the average temperatures in New York and Washington are the same. In the paired case we clearly can.
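Both test statistics can be recomputed from the table with the textbook formulas. The following is a minimal sketch using only the Python standard library; the function names `pooled_t` and `paired_t` are hypothetical helpers written for this illustration, not taken from the book. Up to rounding, it should give roughly 0.379 for the unpaired test and 7.87 for the paired test, in line with the values quoted above.

```python
import math
import statistics

# Average temperatures (F) on the 15th of each month, January to December.
washington = [31, 35, 40, 52, 70, 76, 93, 90, 74, 55, 32, 26]
new_york = [28, 33, 37, 45, 68, 74, 89, 85, 69, 51, 27, 24]


def pooled_t(x, y):
    """Unpaired two-sample t statistic with pooled variance; df = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / se, df


def paired_t(x, y):
    """Paired t statistic computed on the differences x - y; df = n - 1."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    d = [a - b for a, b in zip(x, y)]
    se = statistics.stdev(d) / math.sqrt(len(d))
    return statistics.mean(d) / se, len(d) - 1


t_unpaired, df_unpaired = pooled_t(washington, new_york)
t_paired, df_paired = paired_t(washington, new_york)

print(f"unpaired: t = {t_unpaired:.3f}, df = {df_unpaired}")
print(f"paired:   t = {t_paired:.3f}, df = {df_paired}")
```

The numerator (the mean difference) is identical in both tests. Only the standard error changes, and that is where the pairing pays off.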

You can get more details by referring to my text Introductory Biostatistics for the Health Sciences: Modern Applications Including Bootstrap pp. 195-199. Michael R. Chernick and Robert H. Friis Wiley (2003).
