views:

146

answers:

3

What is the fastest way to implement something like this in C#:

  private List<string> _myMatches = new List<string>(){"one","two","three"};
  private bool Exists(string foo) {
      return _myMatches.Contains(foo);
  }

note, this is just a example. i just need to perform low level filtering on some values that originate as strings. I could intern them, but still need to support comparison of one or more strings. Meaning, either string to string comparison (1 filter), or if string exists in string list (multiple filters).

A: 

Hash tables are your friends for fast string lookups.

Have a look at a good tutorial at Working with HashTable in C# 2.0

Elf King
Better off using HashSet<T>, since it's type safe, and designed exactly for this type of operation.
Reed Copsey
:-) right you are
Elf King
+14  A: 

You could make this faster by using a HashSet<T>, especially if you're going to be adding a lot more elements:

private HashSet<string> _myMatches = new HashSet<string>() { "one", "two", "three" };

private bool Exists(string foo)
{
    return _myMatches.Contains(foo);
}

This will beat out a List<T> since HashSet<T>.Contains is an O(1) operation.

List<T>'s Contains method, on the other hand, is O(N). It will search the entire list (until a match is found) on each call. This will get slower as more elements are added.

Reed Copsey
+1 - I'd go with this one. Doesn't look like HashTables are as appropriate given that it's not storing pairs of info. My immediate thought was dictionary, but I discarded it for the same reason as I discarded the HashTable.
BenAlabaster
thanks, do you think interning the strings would help?
Sonic Soul
A couple of quibbles: this assumes that structure should only contain unique values, and that filling time and memory consumption are irrelevant.
Conrad Frix
@Sonic Soul - If they are literals like in the code samples here, then they will be interned already. Interning is generally a memory optimization rather than a speed optimization, though if you have enough duplication that it prevents excessive paging, it may help speed too. If you have a lot of strings in your set, remember that interning means that the strings probably won't be released until the CLR terminates.
BioBuckyBall
A: 

You are going to have to profile. And do you mean fastest lookup (ie, does initialization time count)?

@Elf King already mentioned Hash Tables, which I was going to point you at (specifically HashSet<T>)

BioBuckyBall
Yep! HashSet sounds like a good thing!
Elf King
initialization doesn't matter, just lookup. thanks
Sonic Soul