Okay, responding to your edit (and basically lifting my comment into an answer):
Specify in advance the performance that you are expecting.
Code your application using a sorted array and a binary search to look up each keyword. This is very simple to implement and gives decent performance. Then profile to see whether it meets the performance you specified. If it does, move on. The worst-case lookup here is O(m log n), where n is the number of keywords and m is the maximum length of your keywords.
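A minimal sketch of that step in C# (the class name, the ordinal comparer, and the sample keywords are just illustrative assumptions, not anything from your question):

```csharp
using System;

class SortedKeywordLookup
{
    private readonly string[] _keywords;

    public SortedKeywordLookup(string[] keywords)
    {
        // Sort once up front; each lookup then costs O(m log n) character comparisons.
        _keywords = (string[])keywords.Clone();
        Array.Sort(_keywords, StringComparer.Ordinal);
    }

    public bool Contains(string word)
    {
        // Array.BinarySearch returns a non-negative index when the word is found.
        return Array.BinarySearch(_keywords, word, StringComparer.Ordinal) >= 0;
    }
}

// Usage:
// var lookup = new SortedKeywordLookup(new[] { "if", "else", "while" });
// lookup.Contains("while");  // true
```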
If the performance of step two (the sorted array) is not acceptable, use a trie (also known as a prefix tree). The expected lookup cost here is O(m), where m is the maximum length of your keywords. Profile to see if this meets your expected performance. If it does not, revisit your performance criteria; they might have been unreasonable.
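A trie sketch, again in C# (this is just one way to lay it out; the class and member names are my own):

```csharp
using System.Collections.Generic;

class TrieNode
{
    public Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
    public bool IsKeyword;
}

class Trie
{
    private readonly TrieNode _root = new TrieNode();

    public void Add(string word)
    {
        var node = _root;
        foreach (char c in word)
        {
            if (!node.Children.TryGetValue(c, out var child))
            {
                child = new TrieNode();
                node.Children[c] = child;
            }
            node = child;
        }
        node.IsKeyword = true;
    }

    public bool Contains(string word)
    {
        // Walks at most one node per character, so a lookup is O(m).
        var node = _root;
        foreach (char c in word)
        {
            if (!node.Children.TryGetValue(c, out node))
                return false;
        }
        return node.IsKeyword;
    }
}
```

Because lookups walk the tree character by character, the node you stop at also tells you how much of the input matched, which is what makes closest-match suggestions easy (see the last paragraph).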
If you are still not meeting your performance specifications, consider using a hashtable (in .NET you would use a HashSet&lt;string&gt;). While a hashtable has worse worst-case performance, it can have better average-case performance: if there are no collisions, a hashtable lookup is O(1) after computing the hash, and computing the hash is O(m), where m is the maximum length of your keywords. This might be faster on average, but probably not noticeably so.
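For completeness, this last step is nearly a one-liner in .NET (the sample keywords here are made up):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Build the set once; each lookup then costs one O(m) hash
        // computation plus an expected O(1) bucket probe.
        var keywords = new HashSet<string>(StringComparer.Ordinal)
        {
            "if", "else", "while", "return"
        };

        Console.WriteLine(keywords.Contains("while"));  // True
        Console.WriteLine(keywords.Contains("whale"));  // False
    }
}
```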
You might even consider skipping directly to the last step, since a HashSet&lt;string&gt; is less complex to implement than the first two options. It all depends on your needs. Tries have the advantage that you can easily spit out the closest matching keyword, for example.
The important thing here is to have a specification of your performance requirements and to profile! Use the simplest implementation that meets your performance requirements, for maintainability, readability, and implementability (if that's not a word, it is now!).