MATLAB: Reference dummy coding with MATLAB fitlme


Thanks in advance for the help.
I have a data set composed of multiple categorical predictors and a single numerical response, and I want to use regression to predict the response. MATLAB automatically recognizes categorical predictors and dummy-codes them, so their levels are not treated as having the order and magnitude that a numeric predictor would imply.
Here is my question. Suppose I have a predictor with categories 0, 1, and 2. I want to choose which of the three categories is the reference level (I want reference dummy coding, as opposed to effects or full coding). Is there a way to do this? In particular, I am using fitlme. The documentation says that with reference dummy coding the coefficient of the first category is set to zero. In my case, does that mean 0 is the reference level, or is the reference whichever category MATLAB encounters first in my data set? In other words, what does 'the first category' mean?
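For concreteness, here is what reference dummy coding does to a three-level predictor. This is a minimal sketch that assumes the Statistics and Machine Learning Toolbox (for dummyvar) and the default ascending category order 0, 1, 2. Full dummy coding produces one indicator column per category; reference coding drops the column of the first category, so that category becomes the baseline absorbed into the intercept.

```matlab
% Three-level predictor with levels 0, 1, 2 (10 observations each).
g = categorical([zeros(10,1); ones(10,1); 2*ones(10,1)]);

% categories() lists the levels in their stored order. For numeric input
% the order is ascending, so '0' is the first category.
categories(g)

% Full dummy coding: one indicator column per category, in category order.
D = dummyvar(g);

% Reference dummy coding: drop the column of the first category ('0').
% Rows from group 0 then have all remaining indicators equal to zero,
% which is why that group's coefficient is effectively fixed at zero.
Dref = D(:, 2:end);
```

Changing the category order (as in the answer below) changes which column is dropped, and therefore which level serves as the reference.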

Best Answer

Hi Ryan,
You can create the variable as a categorical or nominal array to control which category comes first; fitlme uses that first category as the reference level. Here's an example:
% 0. Dummy data.
rng(0,'twister');
y = rand(30,1);
g = [zeros(10,1);ones(10,1);2*ones(10,1)];
T = table(y);
% 1. First category is automatically set based on sort order. In this
% case it will be 0.
T.g = categorical(g);
lme = fitlme(T,'y ~ g')
% 2. Make 2 the first category by listing the categories in the desired order.
T.g = categorical(g,[2,0,1]);
lme = fitlme(T,'y ~ g')
% 3. Same as 1 but using nominal.
T.g = nominal(g);
lme = fitlme(T,'y ~ g')
% 4. Same as 2 but using nominal.
gn = nominal(g);
getlabels(gn)   % display the current level order
gn = reorderlevels(gn,{'2','0','1'});
T.g = gn;
lme = fitlme(T,'y ~ g')
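If the categorical column already exists in the table, another option is to reorder its categories in place with reordercats instead of rebuilding it with a value set. This is a sketch reusing the dummy data above; it assumes the reference level can be confirmed by checking that no coefficient is reported for it (the exact coefficient names, such as g_0 and g_1, are an assumption about fitlme's naming and may vary by release).

```matlab
% Same dummy data as in the answer.
rng(0,'twister');
y = rand(30,1);
g = [zeros(10,1); ones(10,1); 2*ones(10,1)];
T = table(y);

% Start from the default category order (0, 1, 2) ...
T.g = categorical(g);

% ... then move '2' to the front so it becomes the reference level.
% reordercats changes only the category order, not the data values.
T.g = reordercats(T.g, {'2','0','1'});
categories(T.g)

lme = fitlme(T, 'y ~ g');

% The reference level has no coefficient of its own (its mean is the
% intercept), so only the two non-reference levels should appear here.
lme.CoefficientNames
```

This produces the same model as option 2 above, since the data and the category order are identical.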
Hope this helps,
Gautam