I need a fast method to determine if a given string is in a list of strings.
The list of strings is not known until runtime but thereafter it will not change.
I could simply have a List<String>
called strings
and then do:
if (strings.Contains(item))
However this will perform poorly if there are many strings in the list.
I could also use a HashSet<String>
, but this would necessitate calling GetHashCode
on every incoming string as well as Equals
, which would be a waste if there are e.g. only 3 strings in the list. Did I mention this needs to be fast?
I could when setting up, decide to use a List
or a HashSet
depending on the number of strings (e.g. use List for less than 10 strings, HashSet otherwise), rather like the logic in HybridDictionary
.
As the strings are unicode, a standard Trie structure will not work, although a Radix tree/Patricia trie might. Are there any good C# implementations out there with benchmarks?
Some have mentioned bypassing String
's GetHashCode
and using a faster performing hash function. Are there any benchmarks out there?
Using LINQ expressions to essentially create an optimised switch statement is a novel approach which looks very interesting.
What else would work? Setup cost is not important, just the search speed.
If it matters, the incoming string values will rarely appear in the list.