For one NSString, I have N pattern strings. I'd like to extract substrings "around" the pattern matches.

So, if I have "the quick brown fox jumped over the lazy dog" and my patterns are "brown" and "lazy", I would like to get "quick brown fox" and "the lazy dog". However, the substrings don't necessarily need to be delimited by whitespace.

Another example: suppose you have multiple paragraphs of text and want to find every instance of "red" and "blue", shown in context. By "context" I don't mean that the result has to start and end on word boundaries. If one sentence in the text is "there are a whole lot of red ducks in the trees", the result could be "whole lot of red ducks in" or "ole lot of red ducks in th" and it wouldn't matter. I'm not looking for a whitespace-based solution; it could simply be to find "red" and take the substring consisting of "red" plus the 10 characters before it and the 10 characters after it.

In other words, there are some "range" based string matching functions. I was hoping there was an easy way to match multiple strings at once and return each string's matching point plus surrounding characters.

+3  A: 

I think what you want is NSScanner. To find an arbitrary string within a larger string, you do something like:

 NSString *scannedString = nil;
 NSScanner *scanner = [NSScanner scannerWithString:@"The quick brown fox jumped over the lazy dog"];
 [scanner scanUpToString:@"brown" intoString:&scannedString];
 // scannedString is now @"The quick " and the scanner's location is right before "brown"

To get the context, you'll need to decide how much of the text around the location where "brown" was found to include in your result.
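One way to make that decision concrete is a fixed number of characters on each side of the match, clamped to the bounds of the string. The following is only a minimal sketch under that assumption. It uses NSString's rangeOfString:options:range: rather than the scanner, and the helper name ContextsAroundPattern and its padding parameter are made up for illustration:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every occurrence of `pattern` in `text`,
// each widened by up to `padding` characters on both sides.
static NSArray *ContextsAroundPattern(NSString *text, NSString *pattern, NSUInteger padding) {
    NSMutableArray *results = [NSMutableArray array];
    if (pattern.length == 0) {
        return results; // nothing sensible to search for
    }
    NSRange searchRange = NSMakeRange(0, text.length);
    while (searchRange.length > 0) {
        NSRange found = [text rangeOfString:pattern options:0 range:searchRange];
        if (found.location == NSNotFound) {
            break;
        }
        // Clamp the widened range to the start and end of the string.
        NSUInteger start = (found.location > padding) ? found.location - padding : 0;
        NSUInteger end = MIN(NSMaxRange(found) + padding, text.length);
        [results addObject:[text substringWithRange:NSMakeRange(start, end - start)]];
        // Resume the search just past this match.
        searchRange.location = NSMaxRange(found);
        searchRange.length = text.length - searchRange.location;
    }
    return results;
}
```

For N patterns you could call this once per pattern and collect the results. Note that the padding is counted in UTF-16 units, so a cut could land in the middle of a composed character.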

As an alternative, if you're always looking for whole words, you could use NSString's componentsSeparatedByCharactersInSet: to get an array of words and then return the matching element plus a few elements on either side. For example:

 NSArray *words = [@"The quick brown fox jumped over the lazy dog" componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
 NSUInteger wordLocation = [words indexOfObject:@"brown"];
 // NSMakeRange takes (location, length): two words before, the match, two words after
 NSString *wordInContext = [[words subarrayWithRange:NSMakeRange(wordLocation - 2, 5)] componentsJoinedByString:@" "];

(All the examples here are lacking necessary error checking, but it's just to give you an idea of ways you can do things like this.)

Chuck
+3  A: 

You could use regular expressions provided by a third-party framework (e.g. RegexKit or RegexKitLite). To build the regular expression, join the patterns with "|", wrap the result in parentheses, and add a prefix and a suffix pattern that capture the surrounding context. Then match the string against the regular expression.

Some example prefix & suffix patterns:

  • ".{0,15}(", ").{0,15}" to match up to 15 characters
  • "(\w+\W+){0,4}(", ")(\W+\w+){0,4}" to match up to 4 words
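
The character-based prefix and suffix can be sketched as follows. This is only an illustration, and it swaps in Foundation's NSRegularExpression for RegexKit or RegexKitLite. The helper name RegexContexts and its padding parameter are made up for the example:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: matches any of `patterns` together with up to
// `padding` characters of context on each side.
static NSArray *RegexContexts(NSString *text, NSArray *patterns, NSUInteger padding) {
    // Escape each pattern so it is matched literally, then join with "|".
    NSMutableArray *escaped = [NSMutableArray array];
    for (NSString *pattern in patterns) {
        [escaped addObject:[NSRegularExpression escapedPatternForString:pattern]];
    }
    NSString *source = [NSString stringWithFormat:@".{0,%lu}(%@).{0,%lu}",
                        (unsigned long)padding,
                        [escaped componentsJoinedByString:@"|"],
                        (unsigned long)padding];
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:source
                                                  options:NSRegularExpressionDotMatchesLineSeparators
                                                    error:&error];
    if (regex == nil) {
        return @[]; // the combined pattern failed to compile
    }
    NSMutableArray *results = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, text.length);
    for (NSTextCheckingResult *match in [regex matchesInString:text options:0 range:whole]) {
        // match.range covers the pattern plus its context;
        // [match rangeAtIndex:1] would be the pattern alone.
        [results addObject:[text substringWithRange:match.range]];
    }
    return results;
}
```

Matches found this way do not overlap, so a second pattern that falls inside the trailing context of an earlier match is swallowed by that match and not reported separately.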
outis
Adding my vote for RegexKitLite. You can get results as an array of matching capture components, or an array of ranges which tell you where in the string the matched components are. It should be able to do everything you're asking for.
Victorb