Hi there
Suppose I have a database containing 500,000 records, each representing, say, an animal. What would be the best approach for parsing 140-character tweets to identify matching records by animal name? For instance, in this string...
"I went down to the woods today and couldn't believe my eyes: I saw a giant polar bear having a picnic with a red squirrel."
... I would like to flag up the phrases "giant polar bear" and "red squirrel", as they appear in my database.
This strikes me as a problem that has probably been solved many times, but from where I'm sitting it looks prohibitively expensive: iterating over every database record and checking for a match in the string is surely a crazy way to do it.
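To make the worry concrete, this is roughly the brute-force approach I mean. It is only a sketch, written in Python for brevity rather than C#; the function name `find_matches_naive` and the three-item `animal_names` list are made up for illustration, with the list standing in for the real 500,000-record table.

```python
def find_matches_naive(tweet, animal_names):
    """Brute force: test every database name against the tweet text."""
    text = tweet.lower()
    # One substring search per record, so 500,000 records
    # means 500,000 scans of the tweet, for every tweet.
    return [name for name in animal_names if name.lower() in text]


# Hypothetical stand-in for the database table of animal names.
animal_names = ["giant polar bear", "red squirrel", "grey wolf"]
tweet = ("I went down to the woods today and couldn't believe my eyes: "
         "I saw a giant polar bear having a picnic with a red squirrel.")

print(find_matches_naive(tweet, animal_names))
# prints ['giant polar bear', 'red squirrel']
```

It gives the right answer on the example, but the work grows with the number of records rather than with the length of the tweet, which is what seems backwards to me.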
Can anyone with a comp sci degree put me out of my misery? I'm working in C# if that makes any difference. Cheers!