views:

464

answers:

4

I have an NSString with a number of sentences, and I'd like to split it into an NSArray of sentences. Has anybody solved this problem before? I found enumerateSubstringsInRange:options:usingBlock: which is able to do it, but it looks like it isn't available on the iPhone (Snow Leopard only). I thought about splitting the string based on periods, but that doesn't seem very robust.

So far my best option seems to be to use RegexKitLite to regex it into an array of sentences. Solutions?
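For reference, here is a minimal sketch of how the method mentioned above splits text where it is available. It is an instance method on NSString that takes the `NSStringEnumerationBySentences` option, and it arrived in Mac OS X 10.6 and later in iOS 4.0. The function name `SentencesByEnumerating` is hypothetical, and trimming each sentence is an assumption about what you want back.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits text into sentences using
// -enumerateSubstringsInRange:options:usingBlock: with the
// NSStringEnumerationBySentences option (Mac OS X 10.6+ / iOS 4.0+).
static NSArray *SentencesByEnumerating(NSString *text) {
    NSMutableArray *sentences = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, [text length])
                             options:NSStringEnumerationBySentences
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // Assumption: callers want sentences without surrounding whitespace.
        [sentences addObject:[substring stringByTrimmingCharactersInSet:
                                  [NSCharacterSet whitespaceAndNewlineCharacterSet]]];
    }];
    return sentences;
}
```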

+4  A: 

Use CFStringTokenizer. You'll want to create the tokenizer with the kCFStringTokenizerUnitSentence option.
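A minimal sketch of that approach, assuming Foundation/CoreFoundation (CFStringTokenizer is a CoreFoundation C API available on iPhone OS as well as Mac OS X). The function name `SentencesByTokenizing` is hypothetical, and passing `NULL` for the locale is an assumption; pass a `CFLocaleRef` if you want locale-specific rules.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits text into sentences with CFStringTokenizer
// created with the kCFStringTokenizerUnitSentence option.
static NSArray *SentencesByTokenizing(NSString *text) {
    CFStringRef string = (__bridge CFStringRef)text;
    CFRange range = CFRangeMake(0, CFStringGetLength(string));
    // Assumption: NULL locale; supply a CFLocaleRef for language-specific rules.
    CFStringTokenizerRef tokenizer =
        CFStringTokenizerCreate(kCFAllocatorDefault, string, range,
                                kCFStringTokenizerUnitSentence, NULL);
    NSMutableArray *sentences = [NSMutableArray array];
    // Each token produced in sentence mode covers one sentence.
    while (CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) {
        CFRange tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer);
        NSString *sentence =
            [text substringWithRange:NSMakeRange(tokenRange.location, tokenRange.length)];
        // Sentence tokens can include trailing whitespace; trim it off.
        [sentences addObject:[sentence stringByTrimmingCharactersInSet:
                                  [NSCharacterSet whitespaceAndNewlineCharacterSet]]];
    }
    CFRelease(tokenizer);  // created with a Create function, so we own it
    return sentences;
}
```

The tokenizer follows language rules rather than a single delimiter, which is why it tends to be more robust than splitting on periods; abbreviations like "Dr." may still trip it up, so test it on your own text.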

Peter Hosey
+1  A: 

I haven't used them for a while, but I think you can do this with NSString, NSCharacterSet and NSScanner. Create a character set that holds sentence-ending punctuation, then call -[NSScanner scanUpToCharactersFromSet:intoString:]. Each scan pulls one sentence out into a string. After each one, scan past the punctuation itself (for example with -scanCharactersFromSet:intoString:) so the scanner actually advances, and keep going until the scanner runs out of string.

Of course, the text has to be well punctuated.
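The scanning approach described above can be sketched roughly like this, assuming ".", "!" and "?" are the only sentence terminators. The function name `SentencesByScanning` is hypothetical, and the returned sentences do not include their closing punctuation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits text at runs of ".", "!" or "?" with NSScanner.
// Assumes well-punctuated text; abbreviations such as "Dr." still cause false splits.
static NSArray *SentencesByScanning(NSString *text) {
    NSCharacterSet *terminators = [NSCharacterSet characterSetWithCharactersInString:@".!?"];
    NSScanner *scanner = [NSScanner scannerWithString:text];
    NSMutableArray *sentences = [NSMutableArray array];
    while (![scanner isAtEnd]) {
        NSString *sentence = nil;
        // Collect everything up to (not including) the next terminator.
        // Leading whitespace is skipped by the scanner's default settings.
        if ([scanner scanUpToCharactersFromSet:terminators intoString:&sentence]) {
            [sentences addObject:sentence];
        }
        // Consume the terminator(s); without this the scanner would stay
        // on the punctuation and the loop would never finish.
        [scanner scanCharactersFromSet:terminators intoString:NULL];
    }
    return sentences;
}
```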

TechZen
+2  A: 

I would use a scanner for it:

NSScanner *sherLock = [NSScanner scannerWithString:yourString]; // autoreleased
NSMutableArray *theArray = [NSMutableArray array]; // autoreleased
while( ![sherLock isAtEnd] ){
   NSString *sentence = nil;
   // ". " (a period plus a space) probably ends most of your sentences;
   // you could also try scanning for a newline \n, but I'm not sure
   // your sentences are separated by it
   if ([sherLock scanUpToString:@". " intoString:&sentence]) {
      [theArray addObject:sentence];
   }
   // Step past the ". " separator; otherwise the scanner never moves
   // and the loop never terminates
   [sherLock scanString:@". " intoString:NULL];
}

This should do it. There may be small mistakes in it (for example, the last sentence keeps its final period), but this is how I would approach it. You should look up NSScanner in the docs, though; you might come across a method that is better suited to this situation.

Antwan van Houdt
A: 

How about:

NSArray *sentences = [string componentsSeparatedByString:@". "];

For the string "One. Two. Three." this returns the array ("One", "Two", "Three."); note that the last element keeps its period.

Florian
How about "My friend Dr. Mark received a Ph.D. from St. Jude's."
Dewayne Christensen
Yeah, I tried that method after I posted the question. It worked better than just scanning for "." but not much.
Kenny Winker