Solved – Statistical test to compare precision of two devices

repeated measures, statistical significance, variance

I am comparing two temperature control devices, both designed to maintain body temperature at exactly 37°C in anaesthetised patients. The devices were fitted to 500 patients, forming two groups: Group A (400 patients) used Device 1 and Group B (100 patients) used Device 2. Each patient had their temperature measured once every hour for 36 hours, giving me 18,000 data points across the two groups. I need to determine which device controls the patients' body temperature more precisely over the 36-hour period.
I have constructed line graphs joining the median values at each time point, with quartile bars, and visually there appears to be a difference.
How should I analyse my data to demonstrate a statistically significant difference?
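
For context, a minimal sketch of how a plot like the one described above can be built, assuming a long-format pandas DataFrame `df` with hypothetical columns `patient_id`, `device`, `hour` and `temp`:

```python
# Minimal sketch (assumed column names) of the median + quartile plot described above.
import pandas as pd
import matplotlib.pyplot as plt

def plot_median_with_quartiles(df: pd.DataFrame):
    fig, ax = plt.subplots()
    for device, grp in df.groupby("device"):
        # Median and quartiles of temperature at each hourly time point
        q = grp.groupby("hour")["temp"].quantile([0.25, 0.5, 0.75]).unstack()
        ax.errorbar(q.index, q[0.5],
                    yerr=[q[0.5] - q[0.25], q[0.75] - q[0.5]],
                    label=str(device), capsize=2)
    ax.axhline(37.0, linestyle="--", color="grey", label="target (37°C)")
    ax.set_xlabel("Hour")
    ax.set_ylabel("Temperature (°C)")
    ax.legend()
    return fig
```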

Best Answer

The first thing you will need to think about is what it means (quantitatively) to have "good precision" in such a device. I would suggest that, in a medical context, the goal is to avoid temperature deviations that go into a dangerous range for the patient, so "good precision" is probably going to translate into avoiding dangerously low or high temperatures. This means you are going to be looking for a metric that heavily penalises large deviations from your optimal temperature of 37°C. In view of this, a measure based on fluctuations in the median temperature is going to be a poor measure of precision, whereas measures that highlight large deviations will be better.

When you formulate this kind of metric, you are implicitly adopting a "penalty function" that penalises temperatures deviating from your desired temperature. One option would be to measure "precision" by lower variance around the desired temperature (treating 37°C as the fixed mean for the variance calculation); the variance penalises by squared error, which gives reasonable weight to large deviations. Another option would be to penalise large excursions even more heavily (e.g., cubed absolute error). A third option would be simply to measure the amount of time each device leaves the patient outside the medically safe temperature range. In any case, whatever you choose should reflect the perceived dangers of deviation from the desired temperature.
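
As an illustration, here is a minimal sketch of the three kinds of metric above, computed per patient. The 36–38°C "safe" band is purely an assumption for the example, not a clinical recommendation:

```python
# Three candidate per-patient "precision" metrics around the fixed 37°C target.
import numpy as np

TARGET = 37.0
SAFE_LOW, SAFE_HIGH = 36.0, 38.0   # illustrative "safe" band (assumption, not clinical advice)

def precision_metrics(temps: np.ndarray) -> dict:
    dev = temps - TARGET
    return {
        # "Variance" about the fixed target, i.e. mean squared deviation from 37°C
        "mse_about_target": float(np.mean(dev ** 2)),
        # Heavier penalty on large excursions: mean cubed absolute deviation
        "mean_abs_cubed_dev": float(np.mean(np.abs(dev) ** 3)),
        # Fraction of the hourly readings spent outside the safe band
        "frac_outside_safe": float(np.mean((temps < SAFE_LOW) | (temps > SAFE_HIGH))),
    }
```

Computing the chosen metric once per patient, and then comparing the per-patient values between the device groups, keeps the unit of analysis at the patient rather than at the individual hourly reading.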

Once you have determined what constitutes a metric of "good precision", you will be formulating some kind of "heteroscedasticity test", taken in the wider sense of a test on whatever measure of precision you are using. I'm not sure I agree with whuber's comment about adjusting for autocorrelation. It really depends on your formulation of the loss: after all, staying in a high temperature range for an extended period of time could be exactly the thing that is most dangerous, so if you adjust to account for autocorrelation, you might end up failing to penalise highly dangerous outcomes sufficiently.
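
One concrete possibility along these lines (a sketch only, with hypothetical variable names) is a permutation test that shuffles whole patients between the two device groups. Because each patient's 36-hour series is first collapsed to a single per-patient precision value, within-patient autocorrelation is carried along intact rather than adjusted away:

```python
# Patient-level permutation test on a chosen per-patient precision metric.
import numpy as np

def permutation_test(metric_a: np.ndarray, metric_b: np.ndarray,
                     n_perm: int = 10_000, seed=None) -> float:
    """Two-sided p-value for a difference in mean per-patient metric."""
    rng = np.random.default_rng(seed)
    observed = metric_a.mean() - metric_b.mean()
    pooled = np.concatenate([metric_a, metric_b])
    n_a = len(metric_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # random reassignment of patients to groups
        diff = pooled[:n_a].mean() - pooled[n_a:].mean()
        if abs(diff) >= abs(observed):
            exceed += 1
    return (exceed + 1) / (n_perm + 1)           # add-one correction for a valid p-value

# e.g. permutation_test(group_a_metrics, group_b_metrics) where each array holds
# one precision value per patient (hypothetical names).
```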
