Dataset – How to Organize Data for Repeated-Measure Within-Subject Setup?

Tags: dataset, generalized linear model, repeated measures

This question is a follow-up to the question I asked yesterday.

Here is the current situation again. I have data from a within-subject measurement, where each variable is measured repeatedly. The repetitions are used to cancel out random fluctuation of the measured variable. Each subject was exposed to each of the conditions, so I have multiple data points per condition and subject.

Currently the data is organized like this:

Subject; Cond1_rep1; Cond1_rep2; ... Cond1_rep20; Cond1_mean; Cond2_rep1; Cond2_rep2; ... Cond2_rep20; Cond2_mean ... Cond8_rep1; Cond8_rep2 ... Cond8_rep20; Cond8_mean
1      ; ....
2      ; ....

So I have eight conditions and around 20 repetitions per condition per subject. Currently about 20 subjects are to be included in the analysis (no outlier removal or other data cleanup has been done yet).

Now, for the analysis, my first attempt would be to run a multivariate ANOVA on the means of all the conditions to see whether the distributions differ. However, if I use the means like this, I lose the fact that each value probably has a very low error, so the likelihood of a false result is actually lower than the ANOVA estimates. So I should somehow include the repetitions in the analysis.

The suggestion I was given in the previous question was to use a generalized linear model and include all the repetitions in the analysis. However, in this case I would need an additional variable identifying the conditions, to be used as an independent variable in the generalized linear model. So I would have to add this to my data somehow.

What I would try would be first to split up the eight conditions. The eight conditions come from variations of three independent factors, so I can easily split them up. Then I would separate my data into multiple rows, so I can introduce a condition variable. This way I would organize like this:

Subject; Factor1; Factor2; Factor3; Rep1; Rep2; ... Rep20
1      ; 0      ; 0      ; 0      ; ...
1      ; 1      ; 0      ; 0      ; ...
1      ; 0      ; 1      ; 0      ; ...
1      ; 1      ; 1      ; 0      ; ...
...
1      ; 1      ; 1      ; 1      ; ...
2      ; 0      ; 0      ; 0      ; ...
2      ; 1      ; 0      ; 0      ; ...

This way I would get eight rows per subject with a different combination of factors. In this case I can do a repeated measures GLM with the factors as independent variables and the 20 repeated measurements as repeated dependent variables.

Would this setup be correct, or are there any drawbacks? If there are, is there a better way of analysing this data without losing the fact that each measurement was repeated multiple times (as happens when you take the mean first)?

EDIT:

I am not so much interested in how to restructure the data like this. My main question is whether there could be a problem when I run a GLM like this without taking into account that some data points were gathered from the same person. The final analysis in SPSS should not be a problem.

Best Answer

Most analyses in SPSS expect data in the “wide” format (i.e. one row for each participant). The one exception I know of is the MIXED procedure. If I understand you correctly, using MIXED could be a solution but reorganizing your data and running GLM is going to be inappropriate.
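To make the difference concrete, here is a rough sketch of the kind of model a multi-level analysis would fit. It is only an illustration under stated assumptions: the symbols are mine, the model has a random intercept per subject only, and it includes just the main effects of the three factors (interactions are left out for brevity).

```latex
% i = subject, j = condition, k = repetition (notation assumed for illustration)
y_{ijk} = \beta_0 + \beta_1 F_{1j} + \beta_2 F_{2j} + \beta_3 F_{3j} + u_i + \varepsilon_{ijk},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
```

The term u_i is what ties together all measurements from the same person. Running GLM on long-format rows without a subject term amounts to dropping u_i and treating all of the roughly 3,200 rows (20 subjects × 8 conditions × 20 repetitions) as independent observations.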

For more details on MIXED and data preparation see this document from SPSS or this step-by-step guide from Marc Brysbaert.

If you want to use the GLM procedure (through code or through the dialog box, as described in the answer you accepted to your earlier question), then the data must remain in wide format. If you want to use a multi-level model (John's answer to your last question), then MIXED is the way to go and you will need to reorganize your data. But you cannot mix both approaches and run GLM on data in “long” format.
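As a rough illustration of the reorganization that a long-format analysis needs, here is a minimal Python sketch (standard library only) that turns one-row-per-subject data into one row per single measurement. The column names follow the layout in the question. The mapping from condition number to the three factor levels is not given in the question, so `condition_to_factors` is a hypothetical placeholder, and the example values are made up. SPSS has its own restructuring tools; this only shows the target shape.

```python
import csv
import io
import re

# Hypothetical mapping from condition number (1-8) to three binary factors.
# The question does not say which condition is which factor combination,
# so this binary coding is an assumption; replace it with the real design.
def condition_to_factors(cond):
    index = cond - 1
    return (index & 1, (index >> 1) & 1, (index >> 2) & 1)

# Matches measurement columns such as "Cond3_rep17" (not "Cond3_mean").
COLUMN_PATTERN = re.compile(r"^Cond(\d+)_rep(\d+)$")

def wide_to_long(wide_rows):
    """Turn one-row-per-subject records into one row per measurement."""
    long_rows = []
    for row in wide_rows:
        for column, value in row.items():
            match = COLUMN_PATTERN.match(column)
            if match is None or value in (None, ""):
                # Skips Subject, the Cond*_mean columns, and missing cells.
                continue
            cond, rep = int(match.group(1)), int(match.group(2))
            f1, f2, f3 = condition_to_factors(cond)
            long_rows.append({
                "Subject": row["Subject"],
                "Factor1": f1, "Factor2": f2, "Factor3": f3,
                "Rep": rep,
                "Value": float(value),
            })
    return long_rows

# Tiny made-up example: 2 subjects, 2 conditions, 2 repetitions each.
example = """Subject;Cond1_rep1;Cond1_rep2;Cond1_mean;Cond2_rep1;Cond2_rep2;Cond2_mean
1;0.5;0.7;0.6;1.1;0.9;1.0
2;0.4;0.6;0.5;1.2;1.0;1.1
"""
wide = list(csv.DictReader(io.StringIO(example), delimiter=";"))
long_rows = wide_to_long(wide)

for r in long_rows:
    print(r)
```

Each output row carries the subject identifier alongside the factor levels, which is what lets a mixed model account for measurements that come from the same person. The precomputed means are dropped because they can be recomputed from the individual repetitions.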