views:

1146

answers:

6

How can I get the number of times an NSString (for example, @"cake") appears in a larger NSString (for example, @"Cheesecake, apple cake, and cherry pie")?

I need to do this on a lot of strings, so whatever method I use would need to be relatively fast.

Thanks!

A: 

No built-in method. I'd suggest returning a c-string and using a common c-string style algorithm for substring counting... if you really need this to be fast.

If you want to stay in Objective C, this link might help. It describes the basic substring search for NSString. If you work with the ranges, adjust and count, then you'll have a "pure" Objective C solution... albeit, slow.

Pestilence
Doesn't calling e.g. NSString.UTF8String cause a new string to be allocated? It seems like it would be faster to use NSString's methods, such as rangeOfString.
Matthew Flaschen
Yes it does. Twice, if you decide to copy it for later use. Creating a c-string *once* and looking for k substrings is minimal in impact, compared to delving into NSString methods and allocating a substring after each hit.
Pestilence
Not necessarily. If you start with an immutable string, substrings won't require allocations. As well and as Chris demonstrates, there is also no need to extract the substrings at all. Note also that converting a string to UTF8 can be extremely expensive if the string is, say, UTF-16.
bbum
+3  A: 

There are a couple ways you could do it. You could iteratively call rangeOfString:options:range:, or you could do something like:

NSArray * portions = [aString componentsSeparatedByString:@"cake"];
NSUInteger cakeCount = [portions count] - 1;

EDIT I was thinking about this question again and I wrote a linear-time algorithm to do the searching (linear to the length of the haystack string):

NSUInteger numberOfOccurrencesOfStringInString(NSString * needle, NSString * haystack) {
    const char * rawNeedle = [needle UTF8String];
    NSUInteger needleLength = strlen(rawNeedle);

    const char * rawHaystack = [haystack UTF8String];
    NSUInteger haystackLength = strlen(rawHaystack);

    NSUInteger needleCount = 0;
    NSUInteger needleIndex = 0;
    for (NSUInteger index = 0; index < haystackLength; ++index) {
        const char thisCharacter = rawHaystack[index];
        if (thisCharacter != rawNeedle[needleIndex]) {
            needleIndex = 0; //they don't match; reset the needle index
        }

        //resetting the needle might be the beginning of another match
        if (thisCharacter == rawNeedle[needleIndex]) {
            needleIndex++; //char match
            if (needleIndex >= needleLength) {
                needleCount++; //we completed finding the needle
                needleIndex = 0;
            }
        }
    }

    return needleCount;
}
Dave DeLong
The componentsSeparatedByString solution causes quite a lot of unnecessary memory allocation.
Matthew Flaschen
@Matthew true, but it's a two-line solution.
Dave DeLong
i like it, excellent solution.
Biranchi
+3  A: 

This isn't tested, but should be a good start.

NSUInteger count = 0, length = [str length];
NSRange range = NSMakeRange(0, len); 
while(range.location != NSNotFound)
{
  range = [str rangeOfString: @"cake" options:0 range:searchRange);
  if(range.location != NSNotFound)
  {
    range = NSMakeRange(range.location + range.length, len - (range.location + range.length));
    count++; 
  }
}
Matthew Flaschen
+1  A: 

Here is a version done as an extension to NSString (same idea as Matthew Flaschen's answer):

@interface NSString (my_substr_search)
- (unsigned) countOccurencesOf: (NSString *)subString;
@end
@implementation NSString (my_substring_search)
- (unsigned) countOccurencesOf: (NSString *)subString {
    unsigned count = 0;
    unsigned myLength = [self length];
    NSRange uncheckedRange = NSMakeRange(0, myLength);
    for(;;) {
        NSRange foundAtRange = [self rangeOfString:subString
                                           options:0
                                             range:uncheckedRange];
        if (foundAtRange.location == NSNotFound) return count;
        unsigned newLocation = NSMaxRange(foundAtRange); 
        uncheckedRange = NSMakeRange(newLocation, myLength-newLocation);
        count++;
    }
}
@end
<somewhere> {
    NSString *haystack = @"Cheesecake, apple cake, and cherry pie";
    NSString *needle = @"cake";
    unsigned count = [haystack countOccurencesOf: needle];
    NSLog(@"found %u time%@", count, count == 1 ? @"" : @"s");
}
Chris Johnsen
Remember to remove the NSLog, and you have a fine solution.
Jon Reid
Heh, I deleted it once in the answer, but it came back when I re-pasted after testing another change. I have deleted it again, for the last time.
Chris Johnsen
+1  A: 

If you want to count words, not just substrings, then use CFStringTokenizer.

Peter Hosey
A: 
for(int i =0;i<htmlsource1.length-search.length;i++){
  range = NSMakeRange(i,search.length);
  checker = [htmlsource1 substringWithRange:range];

  if ([search isEqualToString:checker]) {
   count++;

  }

 }
muhomania