MATLAB: Is it possible to join categorical variables in a table according to grouping variables?

efficiency, MATLAB, table

I have a table (`A`) containing a string variable (`x`) of IDs and a categorical variable (`y`).
For example:
>> A.x
11×1 string array
"A-00555"
"A-01139"
"B-08811"
"B-00014"
"C-00007"
"C-00007"
"D-00015"
"D-00015"
"E-00048"
"E-00048"
"E-00048"
>> A.y
11×1 categorical array
APPLE
GRAPEFRUIT
COCONUT
APPLE
APPLE
BANANA
APPLE
COCONUT
APPLE
BANANA
KIWI
I want to generate a new categorical variable, the same size as A.x, that "joins" all the A.y values sharing the same A.x(i). I may not be explaining this very well.
In the above example the resulting array would be something like this:
>> A.z
11×1 categorical array
APPLE
GRAPEFRUIT
COCONUT
APPLE
APPLE+BANANA
APPLE+BANANA
APPLE+COCONUT
APPLE+COCONUT
APPLE+BANANA+KIWI
APPLE+BANANA+KIWI
APPLE+BANANA+KIWI
Is there an efficient way to accomplish this? Is there a version of groupsummary, or something similar, with a method option that means "concatenate categorical variable" according to groupvars?
Other info: The table contains a few million unique IDs. All rows of A are unique. There are 30 categorical variables.

Best Answer

I'd use findgroups. First let's define the data.
x = ["A-00555"; "A-01139"; "B-08811"; "B-00014"; "C-00007"; ...
"C-00007"; "D-00015"; "D-00015"; "E-00048"; "E-00048"; "E-00048"];
y = categorical(["APPLE"; "GRAPEFRUIT"; "COCONUT"; "APPLE"; "APPLE"; ...
"BANANA"; "APPLE"; "COCONUT"; "APPLE"; "BANANA"; "KIWI"]);
Now use findgroups to get the group numbers for each element in x.
g = findgroups(x);
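To make the group numbering concrete, here is a small sketch on the first six IDs only. The names `xs`, `gs`, and `uniqueIds` are illustrative and not part of the answer. As far as I know, findgroups numbers the groups in sorted order of the unique values, not in order of first appearance, and its second output returns those unique values.

```matlab
% First six IDs from the question (illustrative subset).
xs = ["A-00555"; "A-01139"; "B-08811"; "B-00014"; "C-00007"; "C-00007"];

% gs holds one group number per row; uniqueIds holds one ID per group.
[gs, uniqueIds] = findgroups(xs);

% Groups follow the sorted IDs, so "B-00014" is group 3 and "B-08811" is
% group 4 even though "B-08811" appears first in xs.
disp(table(xs, gs))

% Indexing the per-group values with gs expands them back to one per row.
roundTrip = uniqueIds(gs);
```

The same indexing trick, a per-group result indexed by the group numbers, is what maps the joined labels back onto the original rows later on.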
Join the elements of y (converted to string) in each group, putting a + between the elements. The result s has one joined string per group.
s = splitapply(@(v) join(string(v), "+"), y, g);
Let's see the results as a table.
T = table(x, y, g, s(g), 'VariableNames', {'x', 'y', 'g', 'z'})
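The question asks for a categorical variable stored in the table itself, so the following is a minimal sketch of that last step, rebuilding the example table `A` from scratch. The `sort` call is my own addition, not part of the answer: it makes the label independent of row order within a group, so that "APPLE+BANANA" and "BANANA+APPLE" cannot end up as two different categories. The name `labels` is illustrative.

```matlab
% Rebuild the example table from the question.
x = ["A-00555"; "A-01139"; "B-08811"; "B-00014"; "C-00007"; ...
     "C-00007"; "D-00015"; "D-00015"; "E-00048"; "E-00048"; "E-00048"];
y = categorical(["APPLE"; "GRAPEFRUIT"; "COCONUT"; "APPLE"; "APPLE"; ...
     "BANANA"; "APPLE"; "COCONUT"; "APPLE"; "BANANA"; "KIWI"]);
A = table(x, y);

% One group number per row.
g = findgroups(A.x);

% One joined label per group. sort() is an added assumption so that the
% label does not depend on the order of rows inside a group.
labels = splitapply(@(v) join(sort(string(v)), "+"), A.y, g);

% Expand the per-group labels back to one per row and store as categorical.
A.z = categorical(labels(g));
disp(A)
```

With 30 categorical variables, the same `g` can be computed once and reused in a loop over the variable names, calling splitapply once per variable.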