MATLAB: Retain dummy variable labels when converting categorical to dummyvar

Tags: categorical to dummyvar, dummyvar, machine learning with categorical variables, ml with categorical variables, Statistics and Machine Learning Toolbox

Hi there,
I have 19 categorical columns, each of which I have converted to a numeric code per category. Now I want to expand them so that there is one dummy column for each category. The problem is that I can no longer tell which dummy column corresponds to which category, and I need that mapping for an interpretable solution, e.g. to be able to say that whether or not a user is from Thailand is a significant variable in a logistic regression.
Here is my code:
%categoricalnbs is the number converted version for all the categorical
%variables. Some columns in that table have categories 1-200, some just
%have categories 1 to 20.
categoricalnbsarray = table2array(categoricalnbs);
% categoricalnbsarray = table2array(finalnbs(:,[9:26,28]));
%finalnbs keeps the actual category names, which I thought could help with
%generating the column labels for the dummyvars, but using that line
%doesn't help.
[~, ~, ugroupA] = unique(categoricalnbsarray(:,2));
dummyvars=dummyvar(ugroupA);
array2table(dummyvars);
This increases the columns in categoricalnbs from 19 to 200, and retains the same number of rows. But how do I interpret the output…
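For a single numeric-coded column, the mapping asked about here can be recovered from the first output of unique, which the code above discards. The following is only a minimal sketch: codes is a made-up stand-in for one column of categoricalnbsarray, and the 'Country' prefix is an assumed name chosen for illustration.

```matlab
% Hypothetical stand-in for one column of categoricalnbsarray
codes = [3; 1; 3; 7; 1];

% Keep the first output of unique: vals(k) is the code behind dummy column k
[vals, ~, ugroup] = unique(codes);
dv = dummyvar(ugroup);

% Build one label per dummy column from the code it represents.
% 'Country' is an assumed prefix, not a name taken from the original data.
labels = strcat('Country_', arrayfun(@num2str, vals, 'UniformOutput', false))';

% Attach the labels so each dummy column can be traced back to its code
Tdv = array2table(dv, 'VariableNames', labels);
disp(Tdv)
```

Because unique returns its values sorted, column k of the dummy matrix always belongs to the k-th smallest code in that column.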

Best Answer

I wrote a function that does this, here you go:
function Tdummy = dummytable(T)
% Tdummy = dummytable(T) - convert categorical variables in table to dummy
% variables
%
% This function takes the categorical variables in a table and converts
% them to separate dummy variables with intelligent names. This way they
% can be used in the Classification Learner App and the variable names make
% sense for feature selection, etc.
%
% Usage:
%
% Tdummy = dummytable(T)
%
% Inputs:
%
% T: Table with categoricals or categorical variable
%
% Outputs:
%
% Tdummy: T with categorical variables turned into dummy variables with
% intelligent names
%
% Example:
%
% % Simple Table
% T = table(rand(10,1),categorical(cellstr('rbbgbgbbgr'.')),...
% 'VariableNames',{'Percent','Color'});
% disp(T)
%
% % Turn it into a dummy table
% Tdummy = dummytable(T);
% disp(Tdummy)
%
% See Also: dummyvar, table, categorical, classificationLearner
% Copyright 2015 The MathWorks, Inc.
% Sean de Wolski Apr 13, 2014
% Error checking
narginchk(1,1)
validateattributes(T,{'categorical', 'table'},{},mfilename,'T',1);
% If it's a categorical, do our best to convert it to a table with an
% intelligent variable name
if iscategorical(T)
    % Try to use existing variable name
    cname = inputname(1);
    if isempty(cname)
        % It's a MATLAB Expression, default to Var1
        cname = 'Var1';
    end
    T = table(T,'VariableNames',{cname});
end
% Identify categoricals and their names
cats = varfun(@iscategorical,T,'OutputFormat','uniform');
% Short circuit if there are no categoricals
if ~any(cats)
    Tdummy = T;
    return
end
% Store everything in a cell. w will be the total width of the table
% with each variable dummyvar'd
w = nnz(~cats)+sum(varfun(@(x)numel(categories(x)),T(:,cats),'OutputFormat','uniform'));
% Preallocate storage
datastorage = cell(1,w);
namestorage = cell(1,w);
% Engine
idx = 0; % Start nowhere in cell
for ii = 1:width(T)
    idx = idx+1;
    % Loop over table deciding what to do with each variable
    if cats(ii)
        % It's a categorical:
        % extract it, keep its categories, and build the dummyvar
        Tii = T{:,ii};
        categoriesii = categories(Tii)';
        ncatii = numel(categoriesii); % How many?
        % Build dummy var as a row cell with one column per cell
        dvii = num2cell(dummyvar(Tii), 1); % Dummy var then cell
        % Build names of the form VariableName_Category
        namesii = strcat(T.Properties.VariableNames{ii}, '_', categoriesii);
        % Insert
        datastorage(idx:(idx+ncatii-1)) = dvii;
        namestorage(idx:(idx+ncatii-1)) = namesii;
        % Increment
        idx = idx+ncatii-1;
    else
        % Extract non categorical into current storage location
        datastorage{idx} = T{:,ii};
        namestorage(idx) = T.Properties.VariableNames(ii);
    end
end
% Build Tdummy with comma separated list expansion
Tdummy = table(datastorage{:},'VariableNames',matlab.lang.makeValidName(namestorage));
end
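The naming step at the heart of the function can be tried on its own for one categorical variable. This is a rough sketch that reuses the Color data from the help example; it does not call dummytable, so it runs without the file on the path, and Tdv is just an illustrative variable name.

```matlab
% Categorical column taken from the help example above
Color = categorical(cellstr('rbbgbgbbgr'.'));

% dummyvar orders its columns the same way categories() lists the levels
cats = categories(Color)';
dv = dummyvar(Color);

% Name each dummy column VariableName_Category, as the function does
names = strcat('Color', '_', cats);
Tdv = array2table(dv, 'VariableNames', matlab.lang.makeValidName(names));
disp(Tdv.Properties.VariableNames)
```

Each row has exactly one 1 across the Color_* columns, so when these columns go into a regression with an intercept it is common to leave one of them out as the reference level.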