ansaurus

Question

Strcmp for cell arrays of unequal length in MATLAB

Answer 1

A:

I got the following solution working, but I'm still wondering if there's a better way to do this:

function [output]=cellstrcmpi(largecell,smallcell)
% Mark every entry of largecell that belongs to a case-insensitive,
% in-order occurrence of smallcell.
n=length(smallcell);
output=zeros(size(largecell));
idx=1;
while idx<=length(largecell)-n+1
    if all(strcmpi(largecell(idx:idx+n-1),smallcell))
       output(idx:idx+n-1)=1;
       idx=idx+n;       % skip past the matched block
    else
        idx=idx+1;
    end
end

(I know, I know, no error checking - I'm a horrible person.)

Doresoom 2010-06-30 19:31:13
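
One loop-free way to run the same sliding-window comparison is sketched below. It is an illustration, not a drop-in replacement: the sample data and the names L, S, windows, and hits are assumptions, both cell arrays are assumed to be row vectors, and unlike the loop above it marks every matching window instead of skipping ahead after a match.

```matlab
% Hypothetical sample data (not from the original question)
L = {'string1','string2','string3','string4','string5','string1','string2','string3'};
S = {'STRING1','String2','string3'};   % differs from L only in case

nL = numel(L);
nS = numel(S);

% Row k of windows holds the indices k, k+1, ..., k+nS-1 of one candidate window
windows = bsxfun(@plus, (1:nL-nS+1)', 0:nS-1);

% Compare every window with S, ignoring case; a window matches if all entries agree
hits = all(strcmpi(L(windows), repmat(S, size(windows,1), 1)), 2);

% Mark every entry of L that lies inside a matching window
output = zeros(size(L));
output(windows(hits,:)) = 1;
```

For this sample data the windows starting at 1 and 6 match, so output is 1 1 1 0 0 1 1 1.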

Answer 2

+5 A:

You could actually use the function ISMEMBER to get an index vector for where the cells in largecellarray occur in the smaller array smallcellarray, then use the function STRFIND (which works for both strings and numeric arrays) to find the starting indices of the smaller array within the larger:

>> nSmall = numel(smallcellarray);
>> [junk,matchIndex] = ismember(largecellarray,...  %# Find the index of the 
                                smallcellarray);    %#   smallcellarray entry
                                                    %#   that each entry of
                                                    %#   largecellarray matches
>> startIndices = strfind(matchIndex,1:nSmall)  %# Starting indices where the
                                                %#   vector [1 2 3] occurs in
startIndices =                                  %#   matchIndex

     1     6
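
To make the intermediate result concrete, here is a small self-contained run. The sample arrays are an assumption (the question's data is not shown in this thread), chosen so that the result is consistent with the output above.

```matlab
% Assumed sample data, chosen to be consistent with the output shown above
largecellarray = {'string1','string2','string3','string4','string5','string1','string2','string3'};
smallcellarray = {'string1','string2','string3'};

nSmall = numel(smallcellarray);

% matchIndex(k) is the position of largecellarray{k} in smallcellarray, or 0 if absent
[~, matchIndex] = ismember(largecellarray, smallcellarray);
disp(matchIndex)      % 1 2 3 0 0 1 2 3

% The small array occurs wherever the run 1:nSmall appears in matchIndex
startIndices = strfind(matchIndex, 1:nSmall);
disp(startIndices)    % 1 6
```

Note that this approach relies on smallcellarray having no repeated entries, since ISMEMBER reports only one position per string.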

Then it's a matter of building the vector index from these starting indices. Here's one way you could create this vector:

>> nLarge = numel(largecellarray);
>> endIndices = startIndices+nSmall;  %# Get the indices immediately after
                                      %#   where the vector [1 2 3] ends
>> index = zeros(1,nLarge+1);         %# Initialize index to zero, with one
                                      %#   extra slot for an end marker
>> index(startIndices) = 1;           %# Mark the start index with a 1
>> index(endIndices) = index(endIndices)-1;  %# Subtract 1 one index after each
                                             %#   end, so a match that starts right
                                             %#   after another is not erased
>> index = cumsum(index(1:nLarge))    %# Take the cumulative sum, removing the
                                      %#   extra entry in index
index =

     1     1     1     0     0     1     1     1

Another way to create it using the function BSXFUN is given by Amro. Yet another way to create it is:

index = cumsum([startIndices; ones(nSmall-1,numel(startIndices))]);
index = ismember(1:numel(largecellarray),index);

gnovice 2010-06-30 19:56:43
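
As a compact sketch of the same step, the start indices can be expanded into all covered positions and tested with ISMEMBER. The helper name buildMask is hypothetical; the first call uses the start indices, nSmall = 3, and nLarge = 8 from the example above, and the second call uses made-up back-to-back matches to show that adjacent occurrences are both marked.

```matlab
% buildMask is a hypothetical helper name used only for this illustration.
% bsxfun adds 0:nSmall-1 to each start index, giving one row of covered positions per match.
buildMask = @(startIndices, nSmall, nLarge) ...
    ismember(1:nLarge, bsxfun(@plus, startIndices(:), 0:nSmall-1));

% Values from the example above: matches start at 1 and 6, nSmall = 3, nLarge = 8
index = buildMask([1 6], 3, 8);
disp(index)        % 1 1 1 0 0 1 1 1

% Made-up case with two matches directly after one another
adjacent = buildMask([1 4], 3, 8);
disp(adjacent)     % 1 1 1 1 1 1 0 0
```

The result is a logical vector, which can be used directly as a mask on the large cell array.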

Won't this fail to produce the correct result if `largecellarray` is `{'string3'}`?

Jonas 2010-06-30 20:02:58

@Jonas: I get `index = 0` for that case, using the newest version of my solution above.

gnovice 2010-06-30 20:07:02

@gnovice: Oh, now I understand your solution. Clever! +1

Jonas 2010-06-30 20:23:17

Answer 3

+1 A:

In @gnovice's answer, the first part can be replaced with:

l = grp2idx(largecellarray)';
s = grp2idx(smallcellarray)';
startIndices = strfind(l,s);

yuk 2010-06-30 20:02:29

@yuk: I didn't know grp2idx. Nice! But wouldn't this fail if there was a `string0` in largecellarray?

Jonas 2010-06-30 20:25:35

Unfortunately, this only works if the N entries in `smallcellarray` are *exactly* the same as the first N entries in `largecellarray`.

gnovice 2010-06-30 20:28:00

Yes, it will actually fail in many cases, since the order matters for grp2idx. The ismember function is probably needed here.

yuk 2010-06-30 20:31:38

If you pass them to grp2idx combined as one cell array, that solves the problem.

Amro 2010-07-01 04:16:22

Answer 4

+5 A:

Here's my version (based on the answers of both @yuk and @gnovice):

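%# Here S is the small cell array and L the large one (both row vectors)
g = grp2idx([S L])';                             %# Shared group numbers for all strings
idx = strfind(g(numel(S)+1:end),g(1:numel(S)));  %# Start of each occurrence of S in L
idx = bsxfun(@plus,idx',0:numel(S)-1);           %# Expand each start to its full run

index = zeros(size(L));
index(idx(:)) = 1;                               %# Mark every matched position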

Amro 2010-07-01 04:14:24

Elegant solution! +1

Jonas 2010-07-01 12:15:27

+1: Very nice, although 2 things bear mentioning: 1) You need the [Statistics Toolbox](http://www.mathworks.de/access/helpdesk/help/toolbox/stats/) to use [GRP2IDX](http://www.mathworks.de/access/helpdesk/help/toolbox/stats/grp2idx.html). 2) The function [FINDSTR](http://www.mathworks.com/access/helpdesk/help/techdoc/ref/findstr.html) appears to be slated for obsolescence in favor of [STRFIND](http://www.mathworks.com/access/helpdesk/help/techdoc/ref/strfind.html).

gnovice 2010-07-01 14:38:32
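
If the Statistics Toolbox is not available, the third output of the built-in UNIQUE can play a similar role to GRP2IDX here, because all that matters is that identical strings get identical numbers. The following is a sketch under that assumption, with made-up sample data that includes a 'string0' entry; S and L are assumed to be row cell arrays of strings.

```matlab
% Made-up sample data; note the leading 'string0' in the large array
L = {'string0','string1','string2','string3','string4','string1','string2','string3'};
S = {'string1','string2','string3'};

% Number the strings of both arrays together so equal strings share a number
[~, ~, g] = unique([S L]);
g = g(:)';                      % force a row vector for strfind

nS = numel(S);
gS = g(1:nS);                   % numbers for the small array
gL = g(nS+1:end);               % numbers for the large array

% Find where the small array's number sequence starts in the large one
startIndices = strfind(gL, gS);

% Expand each start into the full run of indices and mark them
idx = bsxfun(@plus, startIndices', 0:nS-1);
index = zeros(size(L));
index(idx(:)) = 1;
disp(index)                     % 0 1 1 1 0 1 1 1
```

Because both arrays are numbered in a single call, an extra string such as 'string0' in the large array shifts the numbers for both arrays consistently, and the matches are found at positions 2 and 6.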

@gnovice: Fixed findstr/strfind (note that the order of arguments is important now). I didn't realize it was a deprecated function. Thanks.

Amro 2010-07-01 17:40:54
