My current approach is to break the string into an array of words, put them in an NSSet, and subtract the set of stopwords. Is there a more efficient way?
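The question's approach can be sketched in Python for comparison. Here Python's `set` subtraction stands in for subtracting one `NSSet` from another, and the `\w+` tokenizer, the lowercasing and the function names are illustrative assumptions. A set throws away word order and repeated words, so the second function filters the word list against the stopword set instead. That keeps both order and duplicates while still doing an average constant-time hash lookup per word.

```python
import re

def remove_stopwords_set(text: str, stopwords: set[str]) -> set[str]:
    """Mirror of the question's idea: unique words as a set, minus the stopwords."""
    # Tokenizing on \w+ and lowercasing are assumptions for this sketch.
    words = set(re.findall(r"\w+", text.lower()))
    return words - stopwords

def remove_stopwords_ordered(text: str, stopwords: set[str]) -> list[str]:
    """Same set lookup, but filtering the word list keeps order and duplicates."""
    return [w for w in re.findall(r"\w+", text) if w.lower() not in stopwords]

stop = {"the", "on", "a"}
text = "The cat sat on the mat with a cat"
print(sorted(remove_stopwords_set(text, stop)))  # ['cat', 'mat', 'sat', 'with']
print(remove_stopwords_ordered(text, stop))      # ['cat', 'sat', 'mat', 'with', 'cat']
```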

+1  A: 

A good regex library might do it faster.

Marcelo Cantos
You mean use a for loop and run a regex for each term?
Unikorn
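The loop asked about here, one regular expression per stopword, could look like the Python sketch below. Python's `re` stands in for a Cocoa regex API, and the `\b` word boundaries, the case-insensitive flag and the whitespace cleanup are assumptions. Each iteration rescans the whole string, so the work grows with the number of stopwords times the length of the text.

```python
import re

def remove_stopwords_loop(text: str, stopwords: list[str]) -> str:
    """One regex substitution per stopword: N separate passes over the text."""
    for word in stopwords:
        # A separate pattern per word (\b boundaries are an assumption),
        # and each re.sub call scans the entire string again.
        text = re.sub(r"\b" + re.escape(word) + r"\b", "", text, flags=re.IGNORECASE)
    # Collapse the gaps left behind by the deleted words.
    return " ".join(text.split())

print(remove_stopwords_loop("The cat sat on the mat", ["the", "on"]))  # cat sat mat
```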
+1  A: 

NSRegularExpression is your friend.

Graham Lee
Thanks for replying! I also suspected NSRegularExpression, but I wasn't sure that matching against a long alternation like word1|word2|word3... would still be efficient. I will give it a try, thanks Lee!
Unikorn
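A single `word1|word2|word3...` alternation like the one described above could be built as in this Python sketch. Python's `re` stands in for `NSRegularExpression`, and the helper names, the `\b` word boundaries and the case-insensitive flag are assumptions. Escaping each stopword keeps any punctuation in it from being read as regex syntax, and the `\b` anchors stop `a` from matching inside `cat`.

```python
import re

def build_stopword_regex(stopwords):
    """Combine all stopwords into one word-bounded alternation, compiled once."""
    # Escape each word so characters like '.' or '+' are matched literally,
    # and list longer words first so, e.g., 'an' is attempted before 'a'.
    alternatives = "|".join(re.escape(w) for w in sorted(stopwords, key=len, reverse=True))
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def remove_stopwords(text, regex):
    # One pass deletes every match; then collapse the leftover whitespace.
    return " ".join(regex.sub("", text).split())

regex = build_stopword_regex(["the", "on", "a", "an"])
print(regex.pattern)                                   # \b(?:the|on|an|a)\b
print(remove_stopwords("The cat sat on a mat", regex))  # cat sat mat
```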
@unikorn: Use a single regex rather than a loop. A regex is compiled into an efficient representation before matching, so one combined pattern will be faster than anything you'll manage with a for loop (which also has to do the compilation N times).
Graham Lee
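To test the single-regex-versus-loop claim on your own data, a small Python timing sketch can run both strategies side by side. The stopword list, the synthetic input lines and the function names are made up for illustration, and Python's `re` is a different engine from `NSRegularExpression`, so whatever timings it prints are not a definitive statement about Cocoa.

```python
import re
import timeit

# Illustrative assumptions: a tiny stopword list and 5,000 identical synthetic lines.
STOPWORDS = ["the", "on", "a", "an", "in", "of", "and", "to"]
LINES = ["The cat sat on the mat in a house of cards and went to sleep"] * 5_000

def per_word_loop(lines, stopwords):
    """N compiled patterns, N substitution passes per line."""
    patterns = [re.compile(r"\b" + re.escape(w) + r"\b", re.IGNORECASE) for w in stopwords]
    out = []
    for line in lines:
        for p in patterns:
            line = p.sub("", line)
        out.append(" ".join(line.split()))
    return out

def single_regex(lines, stopwords):
    """One combined alternation, compiled once, one substitution pass per line."""
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, stopwords)) + r")\b", re.IGNORECASE)
    return [" ".join(pattern.sub("", line).split()) for line in lines]

# Make sure both strategies agree before comparing their speed.
assert per_word_loop(LINES, STOPWORDS) == single_regex(LINES, STOPWORDS)

for fn in (per_word_loop, single_regex):
    seconds = timeit.timeit(lambda: fn(LINES, STOPWORDS), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```

The `assert` guards against timing two strategies that silently disagree, for example over letter case or word boundaries.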
Yes. After a few trials last night, I ended up with a regex of 500+ words and was surprised to find it was really fast: it parsed roughly 100,000 lines of text in about 10 seconds. Thanks!
Unikorn