views:

70

answers:

5

I need to create a dictinary where key is string and value is Object. But I don't want exact match of the key with user provided string. Instead I want to key to contain a part of string. Let me explain by example

If there is an entry in dictionary under key "Johnson" I want to be able to find value given input strings "John", "Jo". Also I want to be able to extract several values that match input string by given condition. For instance if there entries "John A" and "John B" I want to to have functionality like FindFirst that would return iterator to first matched value.

Ideally I would prefer use existing System.Collections.Generic.Dictionary possibly deriving a new class and overriding some methods

+2  A: 

Although I doubt whether a dictionary is appropriate for something like this, you can use:

dictionary[dictionary.Keys.First(d=>d.StartsWith("Jo"))]

You lose most of the value of a dictionary here, as it's optimized for quick retrieving of the value, by using the key. In this case you'll have to iterate over every key in the list.

I'll have to +1 Jon for pointing out SortedList<TKey,TValue>

Jan Jongboom
+3  A: 

I suspect that SortedList<TKey, TValue> is going to be your best bet here, which is a dictionary based on a binary search tree. Its Keys property returns an IList<TKey> with O(1) access time.

You'd fetch the Keys property and perform a binary search to find a key which starts with your search key. Then look up and down from that sample key to find the range of keys which actually match. This will give O(log n) performance rather than the O(n) performance you'd get from looking through all keys.

I wouldn't derive from this though - I'd write a type which has a SortedList<,> internally.

Jon Skeet
You're absolutely right in that SortedList (as opposed to SortedDictionary) will help with finding ranges by allowing O(1) operations on the Keys collection. Even more efficient would probably be to just have a List<KeyValuePair<string, object>> that is sorted and perform a binary search on that, to avoid having to look up values by key.
Freed
Though of course the Values collection can be accessed by index (same index as the keys) so it's not really an issue. Never mind.
Freed
+1  A: 

You can use custom equality comparisons with Dictionary by providing an implementation of IEqualityComparer. However, Dictionary is a hash map and needs a mapping from every key to the same integer hash, which makes it less useful in your case. You could use SortedDictionary (which is also IDictionary), providing a custom IComparer and get O(log(n)) lookup time (instead of O(1) that Dictionary ideally can provide).

Freed
Note to self: This doesn't solve the issue with multiple matches, the SortedDictionary will only return the first "equal" key.
Freed
+1  A: 

You can use a normal dictionary and provide your own comparer, look at generic dictionary, in particular the section that talks about providing your own comparer.

The main issue is that you'll essentially have to compare against all keys till you find a match, since you're using custom rules, so make sure your custom comparer exits quickly if it can't match (e.g. starts with a different letter).

Timothy Walters
What did you intend GetHashCode to return for this scenario?
Freed
+1  A: 

I think you should consider separating the search for the appropriate key from the accessing of the underlying record.

That you, for instance, have a btree+ structure of mere keys wherein you locate the first matching record, then you follow the btree+ enumerator until you have no match.

Similar to a nonclustereded index in a database. First you find the key, then you find the record.

Your examples "Jo" and "John" within "Johnson" are examples of "StartsWith()", where the key sorting will benefit you. If you are also expecting to be looking for a mere substring as opposed to only the initial segment, you need to look at other algorithms of storing and finding the key.

If you are not positive that you both need, and will be able to leverage, optimized search, you should just do an in-memory scan over all keys and then focus on optimizing the matching predicate. For example by using the Regex option of precompiling the search.

Tormod
How is btree relevant to in-memory searches?
Freed