MATLAB: How to define categorical within factors in fitrm


I've observed unexpected behaviour of categorical factors in fitrm. Two ways to produce the same table of within factors produce different fitrm output. Why is this?
I've got a completely within-subjects design with 8 observations on each of 20 subjects. I first specified the two within factors by using array2table on the design matrix, with categorical indices for the factor levels:
within_fact = categorical(fullfact([nr_cond, nr_sessions]));
within_tbl1 = array2table(within_fact,'VariableNames',{'Condition','Session'});
The second way was to convert the two factors to categorical only after putting them into a table:
within_fact = fullfact([nr_cond,nr_sessions]);
within_tbl2 = array2table(within_fact,'VariableNames',{'Condition','Session'});
within_tbl2.Condition = categorical(within_tbl2.Condition);
within_tbl2.Session = categorical(within_tbl2.Session);
According to the isequal function, the two factor tables are identical:
isequal(within_tbl1, within_tbl2)
ans = 1
However, they produce different outcomes when used as the within design in a repeated-measures ANOVA:
% M = 20x8 double matrix
data = array2table(M, 'VariableNames', {'S1C1', 'S1C2', 'S2C1', 'S2C2',...
'S3C1', 'S3C2', 'S4C1', 'S4C2'});
rm1 = fitrm(data,'S1C1-S4C2 ~ 1','WithinDesign',within_tbl1);
ranovatbl1 = ranova(rm1, 'WithinModel', 'Condition*Session');
rm2 = fitrm(data,'S1C1-S4C2 ~ 1','WithinDesign',within_tbl2);
ranovatbl2 = ranova(rm2, 'WithinModel', 'Condition*Session');
Unexpectedly, rm1 produces an ANOVA table with df = 3 for both Condition (which only has 2 levels) and Session (which has 4). rm2 has the correct degrees of freedom (1 for Condition, 3 for Session). I'm confused as to why they produce different outcomes.
(MATLAB R2014b on Windows 7)

Best Answer

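When you create within_fact by calling categorical on the whole matrix, the categories are taken from all the values in that matrix, so they run from 1 up to max(nr_cond, nr_sessions). Both columns of within_tbl1 are therefore defined to have that many categories. The first column may, say, have the notion of category 5 without actually containing any data in that category. In your case Condition has four defined categories but only two that occur, which would explain the df = 3 reported for it.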
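One way to see the extra categories is to list them for the Condition column. This is a minimal sketch that assumes nr_cond = 2 and nr_sessions = 4, as implied by the question; the variable names cats_condition and counts are only illustrative.

```matlab
% Assumed sizes, taken from the question: 2 conditions, 4 sessions
nr_cond = 2;
nr_sessions = 4;

% Converting the whole matrix at once: both columns share one category set
within_fact = categorical(fullfact([nr_cond, nr_sessions]));
within_tbl1 = array2table(within_fact, 'VariableNames', {'Condition','Session'});

% Categories defined for Condition, including ones that never occur in it
cats_condition = categories(within_tbl1.Condition);

% Number of rows in each defined category; unused categories show a count of 0
counts = countcats(within_tbl1.Condition);

disp(cats_condition)
disp(counts)
```

Any zero in counts marks a category that is defined for the column but has no data, which isequal does not flag for non-ordinal categoricals.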
When you create the variables in within_tbl2, each categorical variable is defined separately so it only has as many categories as actually appear in that column. This is what you almost certainly want.
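If you prefer to keep the first construction, an alternative (not part of the original code) is to drop the unused categories from each column afterwards with removecats. The sketch below again assumes nr_cond = 2 and nr_sessions = 4.

```matlab
% Assumed sizes, taken from the question: 2 conditions, 4 sessions
nr_cond = 2;
nr_sessions = 4;

% First construction: both columns inherit categories 1..4
within_fact = categorical(fullfact([nr_cond, nr_sessions]));
within_tbl1 = array2table(within_fact, 'VariableNames', {'Condition','Session'});

% Remove categories that do not occur in each column
within_tbl1.Condition = removecats(within_tbl1.Condition);
within_tbl1.Session   = removecats(within_tbl1.Session);

% Each factor now has only the levels that actually appear
n_cond_levels    = numel(categories(within_tbl1.Condition));
n_session_levels = numel(categories(within_tbl1.Session));
```

After this step the table should carry the same category sets as within_tbl2, so it can be passed to fitrm as the 'WithinDesign' in the same way.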
Perhaps the way fitrm deals with this condition could be improved. I'll look into it.