Let's say I have a set of keywords in an array: {"olympics", "sports tennis best", "tennis", "tennis rules"}.
I then have a list of strings (actually tweets, up to 50 at a time), so each one is at most 140 characters long.
I want to look at each string and see what keywords are present there. In the case where a keyword is composed of multiple words like "sports tennis best", the words don't have to be together in the string, but all of them have to show up.
I'm having trouble figuring out an algorithm that does this efficiently.
Do you guys have suggestions on a way to do this? Thanks!
Edit: To explain a bit better, each keyword has an id associated with it: {1:"olympics", 2:"sports tennis best", 3:"tennis", 4:"tennis rules"}
I want to go through the list of strings/tweets and see which keywords each one matches. The output should be something like "this tweet belongs with keyword #4". Multiple matches are possible, so anything that matches keyword 2 would also match keyword 3, since they both contain "tennis".
When a keyword has multiple words, e.g. "sports tennis best", the words don't have to appear together, but they all have to appear. For example, this should match: "i just played tennis, i love sports, its the best". Since the string contains every word of "sports tennis best", it matches and is associated with that keywordID (which is 2 in this example).
Edit 2: Matching should be case insensitive.
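
To make the expected behavior concrete, here is a naive Python sketch of the matching rules described above. The names `tokenize` and `match_keywords` are made up for illustration, and it assumes whole-word matching (so "tennis" would not match "tennisplayer"), which may or may not be what is wanted. It just checks every keyword against every tweet, so it shows the desired output rather than an efficient algorithm.

```python
import re

# Keyword ids and phrases from the example in the question.
KEYWORDS = {1: "olympics", 2: "sports tennis best", 3: "tennis", 4: "tennis rules"}

def tokenize(text):
    """Lowercase the text and return the set of words it contains.

    Assumption: a "word" is a run of letters, digits, or apostrophes.
    """
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def match_keywords(tweet, keywords=KEYWORDS):
    """Return the sorted ids of keywords whose words ALL appear in the tweet."""
    words = tokenize(tweet)
    return sorted(
        kid for kid, phrase in keywords.items()
        if tokenize(phrase) <= words  # subset test: every keyword word is present
    )

if __name__ == "__main__":
    print(match_keywords("i just played tennis, i love sports, its the best"))
    # prints [2, 3]
```

With the example tweet this gives keywords 2 and 3, and a tweet like "Tennis RULES at the Olympics" would give 1, 3 and 4 because of the lowercasing.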