MATLAB: Ismember for table rows gives error for NaN and string

ismembernan

I am trying to confirm whether a row from one table can be found among the rows of another table.
The row I want to find may have more or fewer columns than the table to search in. I build a string array of compatible indexes and subset both the row and the table to handle this. I was unable to find a better way to deal with this issue.
One condition I need is that if the row has more columns, the table should be padded with "NaN" or otherwise empty columns, so that ismember always returns 0. Since I do not know the type of the table column that is missing, I cannot fill it with something like zero. I thought NaN, as missing data (literally "not a number"), would make sense. But, as it turns out, it doesn't work either.
The issue is that the behavior of ismember depends on whether the table variable is numeric or a string. If it is numeric, it works with NaN. If it is a string, it fails with the error:
"Error using tabular/ismember (line 37)
Unable to merge the 'c' variables in A and B.
Caused by:
Error using union (line 110)
Second argument must be a string array, character vector, or cell array of character vectors."
If ismember is implemented on table, I think it should work whether the table has string or numbers. In the end, that's the use case for table, is it not? Otherwise, I wonder what sort of default element I could use to set up missing data in a table, given that NaN is not implemented for strings.
Is there a more generic NaN?
Here is a minimal example.
Note how this works correctly:
% This builds a row table, and a 2-row table with fewer columns

clear row row2 secondTable
row.a=1;
row.b="test";
row.c=3;
row=struct2table(row);
row2.a=1;
row2.b="test";
%row2.c=3 % - condition for ismember=1
secondTable=struct2table(row2);
secondTable(2,:)=cell2table({2,"hello"});
% Since c does not exist in secondTable, add it as a NaN column

secondTable.c=NaN(height(secondTable),1);
% This will not find a match
% (tf is used instead of exist, which would shadow the built-in exist function)
[tf,idx]=ismember(row,secondTable,'rows')
However, this throws an error because c is now a string variable:
% This builds a row table, and a 2-row table with fewer columns
clear row row2 secondTable
row.a=1;
row.b="test";
row.c="A string";
row=struct2table(row);
row2.a=1;
row2.b="test";
%row2.c="A string"; % - condition for ismember=1
secondTable=struct2table(row2);
secondTable(2,:)=cell2table({2,"hello"});
% Since c does not exist in secondTable, add it as a NaN column
secondTable.c=NaN(height(secondTable),1);
% Error: c is a string in row but numeric in secondTable
[tf,idx]=ismember(row,secondTable,'rows')

Best Answer

"One condition I need is that if the row has more columns, I want the table to be amended by "NaN" or otherwise empty columns, such that ismember always shows 0"
In that case, it is simpler to compare the number of columns and not bother calling ismember at all. In fact, ismember requires both tables to have exactly the same variable names, so you could just compare both sets of variable names:
if ~isempty(setxor(table1.Properties.VariableNames, table2.Properties.VariableNames))
    % variable names don't match, so no row can be a member
    result = false(height(table1), 1);
else
    result = ismember(table1, table2);
end
Or you could just call ismember and trap the error that is raised when the number of variables or their names don't match:
try
    result = ismember(table1, table2);
catch
    % ismember failed, so treat every row as not found
    result = false(height(table1), 1);
end
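Applied to the tables from the question, the variable-name check can be sketched as follows. This is only an illustration: the tables are rebuilt here with the table function rather than struct2table, and tf and namesDiffer are names chosen for the example.

```matlab
% Row with three variables, table with only two (as in the question)
row = table;
row.a = 1;
row.b = "test";
row.c = "A string";
secondTable = table([1; 2], ["test"; "hello"], 'VariableNames', {'a', 'b'});

% setxor returns the names that appear in only one of the two tables
namesDiffer = ~isempty(setxor(row.Properties.VariableNames, ...
                              secondTable.Properties.VariableNames));
if namesDiffer
    % Variable names do not match, so the row cannot be a member
    tf = false(height(row), 1);
else
    tf = ismember(row, secondTable);
end
disp(tf)
```

Because row has the extra variable c, the name check fails and ismember is never called, so the type of c does not matter.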
" I wonder what sort of default element I could use to set up missing data in a table, given that NaN is not implemented for strings"
NaN is a numeric value. It can be used to indicate missing numbers but it does not make any sense for strings. If a variable is a string type, filling it with NaN is a very bad idea as the variable is then a mix of strings and numbers.
MATLAB has a specific missing indicator for strings (<missing>), as long as you do mean string and not char array. For a char array, the only thing you can use is an empty char array '' (which unfortunately is indistinguishable from a genuinely empty char array).
For the types that support it, you can use the missing function to generate an array of missing values. These get converted to the proper missing indicator (NaN, NaT, <undefined>, <missing>) depending on the type.
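A short sketch of the difference between the two text types; the variable names s and c are made up for the example:

```matlab
% String array: a dedicated missing value exists
s = ["alpha"; "beta"];
s(2) = missing;               % stored as <missing>
stringMask = ismissing(s);    % [false; true]

% Cell array of char vectors: only '' can stand in for a missing value
c = {'alpha'; ''};
cellMask = ismissing(c);      % [false; true], '' counts as missing
```

Note that ismissing cannot tell whether the '' in c was meant as "no data" or as a real empty text value.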
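As a sketch of how this could be applied to the example from the question: the loop below adds every variable that secondTable lacks, using the type of the corresponding variable in row and filling it with missing. It assumes those variables have a type that supports missing (for example double, string, datetime or categorical, but not char or cell), and the names missingNames and filler are invented for the illustration.

```matlab
row = table;
row.a = 1;
row.b = "test";
row.c = "A string";
secondTable = table([1; 2], ["test"; "hello"], 'VariableNames', {'a', 'b'});

% Names present in row but absent from secondTable
missingNames = setdiff(row.Properties.VariableNames, ...
                       secondTable.Properties.VariableNames, 'stable');
for k = 1:numel(missingNames)
    name = missingNames{k};
    % Start from row's value so the new column has the same type
    filler = repmat(row.(name)(1, :), height(secondTable), 1);
    filler(:) = missing;   % converted to the type's own missing indicator
    secondTable.(name) = filler;
end

[tf, idx] = ismember(row, secondTable);
```

Since both tables now have the same variable names and compatible types, ismember runs without the merge error, and the padded column cannot produce a match for a row that holds real data in c.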