MATLAB: Comparing lists of strings


OK, so I have two lists of, say, names: List_one and List_two. Both are cell arrays, and List_one is smaller than List_two.
I want to find the indices where List_two contains elements from List_one. So far I've been using strcmpi, but because the lists differ in size (and in order) I have to do it element by element, which I can't believe is the most efficient solution. Any tips?
Just now I'm doing something like:
ind = false(size(List_two));   % logical mask over List_two
for i = 1:length(List_one)
    tf = strcmpi(List_one(i), List_two);
    ind = ind | tf;   % so that in the end List_two(ind) holds the elements of List_one
end
I just can't imagine that's the best way. I've spent a long time reading about ways to compare lists, but I still haven't found anything satisfactory. One more, related question:
If I have a list of names, and someone permutes it (they don't give me the indices for the new ordering, only the new list) – what is the best way to uncover the indices?
As in, List_two(indices) = List_two_permuted; – how do I find the permutation?

Best Answer

Both tasks can be solved with the fast C-MEX function CStrAinBP from the File Exchange (FEX):
List1 = {'A', 'b', 'cd', 'eFG', 'Miss'};
List2 = {'b', 'A', 'eFG', 'cd', 'Q', 'A'};
[Ex, Seq] = CStrAinBP(List1, List2);
% >> Ex = 1 2 3 4
% >> Seq = 2 1 4 3
% Now: isequal(List1(Ex), List2(Seq))
Repeated strings are considered. An optional third input, 'i', triggers a case-insensitive comparison.
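If installing the MEX file is not an option, the same pair of index vectors can probably be obtained with the built-in ismember, which is also the baseline in the timing below. This is only a sketch: the use of lower() for case-insensitivity is my assumption rather than part of CStrAinBP, and when a string is repeated in List2 the index ismember picks depends on the MATLAB release.

```matlab
List1 = {'A', 'b', 'cd', 'eFG', 'Miss'};
List2 = {'b', 'A', 'eFG', 'cd', 'Q', 'A'};

% Tf(k) is true if List1{k} occurs in List2; Loc(k) is one matching
% position in List2 (0 if there is none).
[Tf, Loc] = ismember(List1, List2);
Ex  = find(Tf);   % indices into List1 of the strings that were found
Seq = Loc(Tf);    % corresponding indices into List2

% As with CStrAinBP: isequal(List1(Ex), List2(Seq)) is true

% Case-insensitive variant (assumption: lower-casing both lists is acceptable)
[TfI, LocI] = ismember(lower(List1), lower(List2));
```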
Speed (measured with R2009a, 184 folder names, Core2Duo 2.3 GHz):
List1 = regexp(path, pathsep, 'split');
List2 = List1(randperm(length(List1)));
tic; for k=1:1000, [Tf, Loc] = ismember(List1, List2); end; toc
% >> 1.747856 sec
tic; for k=1:1000, [Ex, Seq] = CStrAinBP(List1, List2); end; toc
% >> 0.071419 sec
For large string lists, e.g. 100,000 strings, ismember has the advantage: it sorts the lists, and the resulting binary search is cheaper than the linear search of CStrAinBP.
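For completeness, here is a rough sketch of how the two original questions might look with ismember alone. The variable names follow the question; the sample names and the permutation [3 1 4 2] are made up for illustration, and the names are assumed to be unique so that the recovered permutation is unambiguous.

```matlab
% Question 1: where does List_two contain elements of List_one (ignoring case)?
List_one = {'ann', 'Cy'};
List_two = {'Bob', 'Ann', 'Dee', 'cy', 'Eve'};
mask = ismember(lower(List_two), lower(List_one));   % logical mask over List_two
ind  = find(mask);                                   % positions in List_two

% Question 2: recover the permutation from the permuted list alone.
Names          = {'Ann', 'Bob', 'Cy', 'Dee'};
Names_permuted = Names([3 1 4 2]);                   % the "unknown" reordering
[found, indices] = ismember(Names_permuted, Names);
% If all(found) is true, then Names(indices) reproduces Names_permuted.
```

The mask form avoids the explicit loop over List_one; whether it beats CStrAinBP depends on the list sizes, as the timing above suggests.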