I have a constantly growing database of keywords. I need to parse incoming text inputs (articles, feeds etc) and find which keywords from the database are present in the text. The database of keywords is much larger than the text.
Since the database is constantly growing (users add more and more keywords to watch for), I figure the best option will be to break the text input into words and compare those against the database. My main dilemma is implementing this comparison scheme (PHP and MySQL will be used for this project).
The most naive implementation would be to create a simple SELECT query against the keywords table, with a giant IN clause listing all the found keywords.
SELECT user_id,keyword FROM keywords WHERE keyword IN ('keyword1','keyword2',...,'keywordN');
Another approach would be to create a hash-table in memory (using something like memcache) and to check against it in the same manner.
Does anyone has any experience with this kind of searching and has any suggestions on how to better implement this? I haven't tried yet any of those approaches, I'm just gathering ideas at this point.