Hellow Stack Overflow people. I'd like some suggestions regarding the following problem. I am using Java.
I have an array #1 with a number of Strings. For example, two of the strings might be: "An apple fell on Newton's head" and "Apples grow on trees".
On the other side, I have another array #2 with terms like (Fruits => Apple, Orange, Peach; Items => Pen, Book; ...). I'd call this array my "dictionary".
By comparing items from one array to the other, I need to see in which "category" the items from #1 fall into from #2. E.g. Both from #1 would fall under "Fruits".
My most important consideration is speed. I need to do those operations fast. A structure allowing constant time retrieval would be good.
I considered a Hashset with the contains() method, but it doesn't allow substrings. I also tried running regex like (apple|orange|peach|...etc) with case insensitive flag on, but I read that it will not be fast when the terms increase in number (minimum 200 to be expected). Finally, I searched, and am considering using an ArrayList with indexOf() but I don't know about its performance. I also need to know which of the terms actually matched, so in this case, it would be "Apple".
Please provide your views, ideas and suggestions on this problem.
I saw Aho-Corasick algorithm, but the keywords/terms are very likely to change often. So I don't think I can use that. Oh, I'm no expert in text mining and maths, so please elaborate on complex concepts.
Thank you, Stack Overflow people, for your time! :)