Solved – What statistical test is appropriate for paired data where the same subjects are tested at multiple times

Tags: experiment-design, paired-comparisons, paired-data, t-test

In our study, the same subjects tried two different virtual reality experiences, and we measured their discomfort several times over the course of each experience. We want to test whether the difference in discomfort between the two experiences is statistically significant.

In effect, we have a 3D table of data where the dimensions are the different experiences, the subjects that volunteered, and the measurements over time.

Measurements from the same subject at the same time point but in different experiences are paired, which would suit a paired t-test. However, measurements from the same subject at different times are highly correlated, so [if I understand the paired t-test properly] pooling all time points into a single paired t-test would be inappropriate.

Is there some other version of a t-test that returns a significance value but can deal with data like this? Or is the best option to run a paired t-test per time slice and then "combine" the significance values by some statistical method?

Best Answer

If I understand your setup, this experiment is as follows:

Two independent variables:
- VR scenario, with two levels (A and B)
- Time (start, 5 minutes in, 10 minutes in, 15 minutes in, end, etc.)

One dependent variable (a measure of discomfort), with multiple measurements taken per subject over time (repeated measures).

The first important question is the nature of your measure of discomfort. I assume you'll use something like a Likert-type scale (for example, "How much discomfort do you feel, on a scale from 1 to 5?", with 1 being mild/no discomfort and 5 being extreme discomfort). Strictly speaking such responses are ordinal, but they are commonly treated as interval-scale data and analyzed with parametric tests. This is the most common (and usually the most useful) method, so we'll pretend that's what you had in mind.

You'll also of course have two possible ways to set up the human participants: 1) have every participant experience both conditions (you'll probably want to counterbalance the order of the experimental conditions), or 2) have each participant experience only one VR condition (either Scene A or Scene B, not both). You can do either one, but generally you'll need fewer participants if you go with option 1, as there will be less variance due to between-person factors (your experience of "carrots vs celery" will be more similar than "your experience of carrots" vs "my experience of celery", after all). You can use option 2 if necessary, in which case scenario becomes a between-subjects factor in the analysis, but it is generally avoided unless you have a good reason (for example, learning effects that are just too great).

If I've described your experimental scenario accurately, the most common test used for this is a two-way repeated measures ANOVA. This will allow you to determine first whether there is any statistically significant difference among the conditions (taking care of the issues you'd have by running repeated t-tests), and then the post-hoc tests will allow you to identify just which conditions differ from each other. If you decided to test only one VR scenario, then you'd use a one-way repeated measures ANOVA instead.
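
A minimal sketch of what such an analysis might look like in R. The answer itself only mentions SPSS, so the R code, the simulated data, and the column names `subject`, `scenario`, `time`, and `discomfort` are all assumptions made for illustration. It covers the case where every participant experiences both scenarios, and it does not cover post-hoc tests or assumption checks such as sphericity.

```r
# Simulated stand-in data: 12 subjects x 2 scenarios x 4 time points.
# All names and effect sizes here are hypothetical.
set.seed(1)
d <- expand.grid(subject  = factor(1:12),
                 scenario = factor(c("A", "B")),
                 time     = factor(c(0, 5, 10, 15)))   # minutes into the session

subj_effect  <- rnorm(12)                              # per-subject baseline
d$discomfort <- 3 + subj_effect[as.integer(d$subject)] +
  0.5 * (d$scenario == "B") +                          # scenario effect
  0.2 * (as.integer(d$time) - 1) +                     # drift over time
  rnorm(nrow(d), sd = 0.5)                             # measurement noise

# Both factors are within-subject, so both appear inside Error().
fit <- aov(discomfort ~ scenario * time + Error(subject / (scenario * time)),
           data = d)
summary(fit)
```

The `summary()` output lists one table per error stratum; the `scenario`, `time`, and `scenario:time` rows are the main effects and the interaction.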

You might also reasonably ask a question like, "we also wonder how excited the participants feel", in which case you'd have participants respond to both a measure of discomfort and a measure of excitement. In this case you'd be adding a dependent variable, excitement, and this would change the test you'd need. In such a case what you'd likely want is a two-way repeated measures MANOVA if you kept both VR conditions, or if you dropped to one you'd just want a (one-way) repeated measures MANOVA. The more questions you ask (dependent variables) the more power you lose, so make sure you actually care about all the measures and don't just add them in willy-nilly.

Someone might be tempted to include one more independent variable, but generally I strongly warn you to avoid that temptation unless you have a lot of experience with such a beast, as heaven forbid you end up with a 3-way interaction of variables and need to interpret what is going on in a sensible fashion. It can get really messy and end up muddying the waters rather than clarifying them.

You'll naturally want to make note of all the assumptions of your chosen test, and SPSS will help you test the assumptions as well. These sorts of tests are very common in areas like HCI and cognitive psychology, and are not at all exotic. There are surely other approaches that could be used, but these are the classic approaches which are popularly published in these fields.

Related Solutions

Solved – When does an unpaired test result in higher p-value than a paired test

As @whuber says in the comment above, when the measures are negatively correlated, the p-value can be lower in the unpaired test than in the paired test. Here's an example where there is a difference:
```
library(MASS)
s  <- matrix(c(1, -0.8, -0.8, 1), 2)
df <- mvrnorm(n=100, mu=c(0, 0.3), Sigma=s, empirical=TRUE)
t.test(df[,1], df[, 2], paired=FALSE)
t.test(df[,1], df[, 2], paired=TRUE)
```
The first test (unpaired) gives p=0.035, the second gives p=0.117.
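
These two p-values can be checked by hand. With `empirical=TRUE` the sample means are exactly 0 and 0.3, both sample variances are exactly 1, and the sample covariance is exactly −0.8, so with $n = 100$ (signs dropped, two-sided tests):

$$\text{unpaired: } SE = \sqrt{\tfrac{1}{100} + \tfrac{1}{100}} \approx 0.141, \qquad |t| = \frac{0.3}{0.141} \approx 2.12, \qquad p \approx 0.035$$

$$\text{paired: } \operatorname{Var}(X_1 - X_2) = 1 + 1 - 2(-0.8) = 3.6, \qquad SE = \sqrt{\tfrac{3.6}{100}} \approx 0.190, \qquad |t| = \frac{0.3}{0.190} \approx 1.58, \qquad p \approx 0.117$$

The negative covariance makes the variance of the differences larger than the sum of the two variances, which is why the paired standard error is the larger one here.
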
Yes, this is a design issue. This book chapter discusses it: Keren, G. (2014). Between-or within-subjects design: A methodological dilemma. A Handbook for Data Analysis in the Behavioral Sciences: Volume 1: Methodological Issues Volume 2: Statistical Issues, 257, which you can read some of on Google Books.

Hmmm... I'm not sure. I'd do a simulation to find out the effect on the type I error rate. How this affects your power is a separate issue that I haven't looked into here. Slight adaptation of my previous code:

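```
library(MASS)                              # for mvrnorm()
s <- matrix(c(1, -0.8, -0.8, 1), 2)        # same covariance as above (correlation -0.8)

paired   <- rep(NA, 1000)
unpaired <- rep(NA, 1000)
for (i in 1:1000) {
  # Both means are 0, so the null hypothesis is true in every replicate
  df          <- mvrnorm(n=100, mu=c(0, 0), Sigma=s, empirical=FALSE)
  unpaired[i] <- t.test(df[,1], df[,2], paired=FALSE)$p.value
  paired[i]   <- t.test(df[,1], df[,2], paired=TRUE)$p.value
}

# Number of false positives out of 1000 at the 5% level
sum(paired < 0.05)
sum(unpaired < 0.05)
```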

Result:

```
> sum(paired < 0.05)
[1] 46
> sum(unpaired < 0.05)
[1] 137
```

Well, look at that. If you treat them as unpaired, your type I error rate rockets (137 out of 1000, against a nominal 5%). You need to treat them as paired to get the right answer. I believe (it's a long time since I've read it) that this is one of the issues Keren talks about in that chapter. If you're going to have data that might be negatively correlated (e.g. the amount of soup and the amount of burgers someone eats), you'll have more power with an unpaired design.

Solved – One way anova or Paired-t test for same samples, different measurement technique

Because the measurements are taken on the same objects, you should consider a paired t-test (dependent samples) here. First and foremost, check whether the normality assumption holds. One way to do that is to draw Q-Q plots and see how the data are distributed.

If the normality assumption does not hold, consider transforming the data (with either $\log$ or $\exp$) and check again for normality using Q-Q plots. If normality still fails, look at non-parametric approaches such as the Wilcoxon signed-rank test, which does not assume normally distributed data.
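
The workflow above could be sketched in R roughly as follows. The data are simulated and the names `method_a` and `method_b` are hypothetical. Note that for a paired t-test the normality assumption applies to the within-pair differences, so those are what the Q-Q plot shows.

```r
set.seed(2)
# Hypothetical data: the same 30 samples measured with two techniques.
truth    <- rnorm(30, mean = 10, sd = 2)
method_a <- truth + rnorm(30, sd = 0.5)
method_b <- truth + 0.3 + rnorm(30, sd = 0.5)

diffs <- method_a - method_b      # within-pair differences

# Visual normality check on the differences
qqnorm(diffs)
qqline(diffs)

# Parametric test, and the non-parametric fallback if normality looks doubtful
t_res <- t.test(method_a, method_b, paired = TRUE)
w_res <- wilcox.test(method_a, method_b, paired = TRUE)

t_res$p.value
w_res$p.value
```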

A simple way to explore these tests is the G*Power 3.1 tool. It has a very simple user interface that lets you select the type of test and run power and sample-size calculations for it; the $p$-value itself comes from running the chosen test on your data in a statistics package.
