The method I am using right now is to break the string into an array of words, put them in an NSSet, and subtract the set of stopwords. Is there a more efficient way?
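For reference, the set-based approach described above might look roughly like the sketch below. It is a variant rather than the asker's actual code: it filters the word array against a `Set` instead of subtracting two sets, so word order and duplicates are kept. The function name `filterStopwords` and the sample word list are made up for illustration, and splitting on single spaces is a simplifying assumption.

```swift
import Foundation

// Hypothetical helper (not from the original post): drops every word that
// appears in `stopwords`, comparing case-insensitively.
func filterStopwords(_ text: String, stopwords: Set<String>) -> [String] {
    // Assumption: words are separated by plain spaces; empty pieces are skipped.
    let words = text.split(separator: " ").map { String($0) }
    // Set lookup is O(1) on average, so the cost is linear in the word count.
    return words.filter { !stopwords.contains($0.lowercased()) }
}

// Placeholder stopword list, for illustration only.
let sampleStopwords: Set<String> = ["the", "a", "of"]
let kept = filterStopwords("The name of a rose", stopwords: sampleStopwords)
print(kept)
```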
You mean do a for loop and regex each term?
Unikorn
2010-10-17 11:09:51
Thanks for replying! My suspicion was also NSRegularExpression, but I wasn't sure that matching a long string of word1|word2|word3... would still be efficient. I will give it a try, thanks Lee!
Unikorn
2010-10-17 11:38:33
@unikorn: use a single regex rather than a loop. A regex is compiled down to an efficient representation before matching, so it will be faster than anything you'll manage with a for loop (which also has to do the compilation N times).
Graham Lee
2010-10-17 13:59:16
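A minimal sketch of the single-regex idea, assuming the stopwords are joined into one alternation wrapped in word boundaries and compiled once. The helper names `makeStopwordRegex` and `removeStopwords` and the sample word list are hypothetical; the `NSRegularExpression` calls are the standard Foundation ones.

```swift
import Foundation

// Hypothetical helper: builds one case-insensitive pattern of the form
// \b(?:word1|word2|word3)\b and compiles it a single time.
func makeStopwordRegex(_ stopwords: [String]) throws -> NSRegularExpression {
    // Escape each word so characters such as "." are matched literally.
    let alternation = stopwords
        .map { NSRegularExpression.escapedPattern(for: $0) }
        .joined(separator: "|")
    return try NSRegularExpression(pattern: "\\b(?:\(alternation))\\b",
                                   options: [.caseInsensitive])
}

// Hypothetical helper: deletes every stopword match in one pass over the text.
func removeStopwords(from text: String, using regex: NSRegularExpression) -> String {
    let range = NSRange(text.startIndex..., in: text)
    let stripped = regex.stringByReplacingMatches(in: text, options: [],
                                                  range: range, withTemplate: "")
    // Collapse the runs of spaces left behind by the deleted words.
    return stripped.split(separator: " ").joined(separator: " ")
}

// Placeholder stopword list, for illustration only.
let stopwordRegex = try makeStopwordRegex(["the", "a", "of"])
print(removeStopwords(from: "The name of a rose", using: stopwordRegex))
```

The compiled `NSRegularExpression` can be reused for every line of input, which is where the saving over compiling one pattern per stopword inside a loop comes from.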
Yes. After a few trials last night, I ended up with a regex of 500+ words and was surprised to find it was really fast. It parsed through roughly 100,000 lines of text in about 10 seconds. Thanks!
Unikorn
2010-10-18 02:41:45