Paired t-test when each data point was repeatedly measured a different number of times

hypothesis testing, paired-comparisons, t-test

I have an existing data set that comes from the same group of people before and after they received a treatment.

Participants tested their blood sugar over a 30-day period before receiving an insulin pump and over a 30-day period after receiving it. The data were obtained from user logs (archival data), so testing was not controlled to occur at fixed intervals: participants tested themselves whenever they needed to throughout the day.

My goal is to determine whether the average blood sugar across 30 days was different for the before vs after group.

Normally this would be a paired samples t-test, but unfortunately the before and after groups have unequal numbers of data points, with the after group having considerably more. People test more often after receiving an insulin pump.

What is the correct way to handle this?

I could collapse the data to one mean per participant before treatment and one mean per participant after treatment, so that the data match up, and then run a paired samples t-test on these means, but I suspect this is not the ideal solution.

Would a one-way within-subjects ANOVA be the appropriate test to run in this case?

Best Answer

For the purposes of hypothesis testing, I often find that the simpler approach is the best.

In this case, I would do exactly what you considered yourself: average all the pre-treatment values and all the post-treatment values for each participant, obtaining two values per participant. Then you can run a paired t-test on the resulting averages.

There is nothing wrong with this approach. If you do that and you get a $p$-value sufficiently low for your purposes, you can call it a day. I guess there are much more complicated mixed models that one can set up here, but I would be skeptical that they would produce much lower $p$-values (and if they do not, there is no gain). Meanwhile, the simple t-test on averages has two big advantages: (1) it takes five minutes to perform; (2) it takes two lines to explain in a paper.
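For concreteness, here is a minimal sketch of the averaging approach in Python. It assumes the readings sit in a long-format table with hypothetical columns `participant`, `period` (`"before"` or `"after"`) and `glucose`; these names and the synthetic data are made up for illustration and are not from the question. The test itself is `scipy.stats.ttest_rel`.

```python
import numpy as np
import pandas as pd
from scipy import stats


def paired_test_on_means(df):
    """Collapse readings to one mean per participant and period, then run a paired t-test."""
    means = (
        df.groupby(["participant", "period"])["glucose"]
        .mean()
        .unstack("period")                      # one row per participant, one column per period
        .dropna(subset=["before", "after"])     # keep participants observed in both periods
    )
    result = stats.ttest_rel(means["after"], means["before"])
    return means, result.statistic, result.pvalue


# Synthetic data (purely illustrative): unequal numbers of readings per person and period.
rng = np.random.default_rng(0)
rows = []
for pid in range(12):
    baseline = rng.normal(160, 15)
    n_before = rng.integers(40, 80)     # fewer readings before the pump
    n_after = rng.integers(90, 200)     # more readings after the pump
    rows += [(pid, "before", g) for g in rng.normal(baseline, 30, n_before)]
    rows += [(pid, "after", g) for g in rng.normal(baseline - 10, 30, n_after)]
df = pd.DataFrame(rows, columns=["participant", "period", "glucose"])

means, t_stat, p_value = paired_test_on_means(df)
print(means.round(1))
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

Because each participant contributes exactly two numbers, the unequal number of raw readings no longer matters for the pairing; it only affects how precisely each individual mean is estimated.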


PS. If I am not mistaken, then a simple repeated measures ANOVA (that you asked about) cannot be applied in your case.

PPS. Note that without a control group you will not be able to say if the difference between post and pre (in case you observe any) is due to treatment or due to some time passing.


Update. What I wrote above was under the assumption that either you have no information about the times of the individual measurements, or you are happy to assume that time is irrelevant. @psarka argued (+1) that time of day is very relevant and, worse, that the pre- and post-treatment measurements are unlikely to have been distributed across the day in the same way. So if you do have the measurement times, you had better take them into account, and the exercise then becomes more complicated. If not, then well, not.

In addition, @robin argued that the day number is important as well; see the discussion in the comments.
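To illustrate the time-of-day concern, here is one simple possibility, sketched under assumptions and not the method proposed by anyone in this thread: average within coarse time-of-day bins first, then average the bin means, so that each part of the day carries equal weight in both periods. The column names (`participant`, `period`, `hour`, `glucose`) and the bin edges are hypothetical; a mixed model with time of day as a covariate would be the more thorough alternative.

```python
import pandas as pd


def time_balanced_means(df, bin_edges=(0, 6, 12, 18, 24)):
    """Per-participant before/after means that weight every time-of-day bin equally."""
    df = df.copy()
    # Assign each reading to a time-of-day bin, e.g. [0, 6), [6, 12), ...
    df["tod_bin"] = pd.cut(df["hour"], bins=list(bin_edges), right=False)
    cell = (
        df.groupby(["participant", "period", "tod_bin"], observed=True)["glucose"]
        .mean()
        .unstack("period")
    )
    # Keep only bins where the participant has readings both before and after.
    cell = cell.dropna(subset=["before", "after"])
    # Average the bin means, so each retained bin counts once per participant.
    return cell.groupby(level="participant").mean()


# Toy data: same glucose pattern in both periods, but a different mix of testing times.
toy = pd.DataFrame(
    {
        "participant": [1] * 8,
        "period": ["before"] * 4 + ["after"] * 4,
        "hour": [2, 14, 14, 14, 2, 2, 2, 14],
        "glucose": [100, 200, 200, 200, 100, 100, 100, 200],
    }
)

naive = toy.groupby(["participant", "period"])["glucose"].mean().unstack("period")
balanced = time_balanced_means(toy)
print(naive)
print(balanced)
```

In the toy data the naive means differ (175 before, 125 after) only because the testing times shifted, while the balanced means are both 150. The balanced table has the same two-values-per-participant shape as the plain averages, so it can be passed to the same paired t-test.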
