views: 505

answers: 7

I'm converting some C++ code to C# and it calls std::map::lower_bound(k) to find an entry in the map whose key is equal to or greater than k. However, I don't see any way to do the same thing with .NET's SortedDictionary. I suspect I could implement a workaround using SortedList, but unfortunately SortedList is too slow (O(n) for inserting and deleting keys). What can I do?

Note: I found a workaround that takes advantage of my particular scenario... Specifically, my keys are a dense population of integers starting at just over 0, so I used a List<TValue> as my dictionary with the list index serving as the key, and searching for a key equal to or greater than k can be done in only a few loop iterations. But it would still be nice to see the original question answered.
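
For reference, a minimal sketch of that workaround, assuming integer keys densely packed starting just above 0; the class and member names (DenseIntMap, Set, FindKeyAtLeast) are invented for illustration and are not from the original code:

using System;
using System.Collections.Generic;

// The List index serves as the key; gaps are padded with null.
class DenseIntMap<TValue> where TValue : class
{
    private readonly List<TValue> _items = new List<TValue>();

    public void Set(int key, TValue value)
    {
        while (_items.Count <= key)
            _items.Add(null);          // pad so that index == key
        _items[key] = value;
    }

    // Returns the smallest key >= k that holds a value, or -1 if there is none.
    // Only a few iterations are needed when the key population is dense.
    public int FindKeyAtLeast(int k)
    {
        for (int i = Math.Max(k, 0); i < _items.Count; i++)
            if (_items[i] != null)
                return i;
        return -1;
    }
}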

A: 

Let's say you have something like this:

Dictionary<int, string> dict = ...
// and you have
// k - the key to find, or, if it is not present, the next greater key
// you write

var entry = dict.Where(o => o.Key >= k).First();
Omu
That doesn't quite work - it finds the first key at least equal to k, which may not be nearest. Setting that aside, the performance is too poor for my needs (it's O(N)).
Qwertie
("at least equal to k" should be "at least as large as k")
Qwertie
:) you said "to find an entry in the map whose key is equal to or greater than k."
Omu
Maybe I am suffering from post-lunch coma, but I don't see how the two are different: "at least equal to k" should be "at least as large as k"
Mathias
you can use the same technique to find the lower/equal bound, compute the two distances between k and the greater/lower bounds, and compare them; the one that is smallest (or zero) wins :) (sketched below)
Omu
that's in case you need the nearest and not just the greater or equal
Omu
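
A minimal sketch of that nearest-key idea, assuming integer keys and that the greater-or-equal and lower-or-equal bounds have already been found by some means; NearestKey is a hypothetical helper, not part of any answer here:

// Given the smallest key >= k (atLeast) and the largest key <= k (atMost),
// pick whichever is closer to k; a distance of zero wins immediately.
static int? NearestKey(int k, int? atLeast, int? atMost)
{
    if (atLeast == k || atMost == k)
        return k;
    if (atLeast == null) return atMost;
    if (atMost == null) return atLeast;
    return (atLeast.Value - k) <= (k - atMost.Value) ? atLeast : atMost;
}
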
A: 

find nearest to K:

dict.Keys.Where(i => i >= K).OrderBy(i => i).First();

or much faster:

public int? GetNearestKey<TValue>(IDictionary<int, TValue> dict, int K)
{
    int? lowerK = null;
    foreach (int key in dict.Keys)
    {
        if (key == K) 
        {
            lowerK = K;
            break; 
        }
        else if (key >= K && (!lowerK.HasValue || key < lowerK))
        {
            lowerK = key;
        }
    }
    return lowerK;
}
najmeddine
er... now it's up from O(n) to O(n log n).
Qwertie
I need to do it in O(log n). Theoretically the SortedDictionary is capable of doing this, but I don't see an API for it.
Qwertie
+1  A: 

The problem is that a dictionary/hash table is designed to arrive at a unique memory location based on an input value, so you'll need a data structure that is designed to keep track of a range of keys around each value you store, and to keep those ranges correct as entries are added and removed.

I think skip lists (or balanced binary trees) can help you. Although they cannot perform lookups in O(1) the way a hash table can, they can do so in O(log n), and in practice skip lists are often as fast as (or faster than) balanced trees.

I know this is not a proper answer, since I cannot say that the .NET BCL already contains such a class; you'll unfortunately have to implement one yourself or find a 3rd-party assembly that provides it for you. There seems to be a nice example over at The CodeProject here, though.
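
To illustrate what such a structure makes possible, here is a rough sketch of a lower-bound search over a plain (unbalanced) binary search tree; a balanced tree or skip list would do the same walk in guaranteed O(log n). The Node type and field names are invented for the example:

// Minimal BST node; a real implementation would keep the tree balanced.
class Node
{
    public int Key;
    public Node Left, Right;
}

// Returns the node with the smallest key >= k, or null if every key is smaller.
// Walks a single root-to-leaf path, so the cost is O(height).
static Node LowerBound(Node root, int k)
{
    Node best = null;
    for (Node current = root; current != null; )
    {
        if (current.Key >= k)
        {
            best = current;          // candidate; look left for a smaller qualifying key
            current = current.Left;
        }
        else
        {
            current = current.Right; // too small; qualifying keys lie to the right
        }
    }
    return best;
}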

Cecil Has a Name
SortedDictionary seems to be implemented with a red-black tree; too bad not all its capabilities are made public.
Qwertie
A: 

There isn't a binary search tree collection implementation in the base framework, so you'll either have to build one or find an implementation. As you noted, SortedList is closest in terms of searching but is slower (due to its underlying array implementation) for insertion/deletion.

Joe
SortedDictionary IS a binary search tree. Its public API just leaves out the searching functionality.
Qwertie
A: 

I think there's a mistake in the question about SortedList complexity.

SortedList has O(log n) amortized complexity for inserting a new item. If you know the capacity in advance, it can be done in O(log n) in the worst case.

Elisha
Microsoft foolishly doesn't state the big-O complexity in the documentation ( http://msdn.microsoft.com/en-us/library/system.collections.sortedlist.aspx ) but it seems to imply that SortedList stores the keys and values in arrays. Sorted arrays have O(N) insert complexity if the keys being inserted are random.
Qwertie
It does, in http://msdn.microsoft.com/en-us/library/system.collections.sortedlist.add.aspx it says: "This method is an O(n) operation for unsorted data, where n is Count. It is an O(log n) operation if the new element is added at the end of the list. If insertion causes a resize, the operation is O(n)."
Elisha
A: 

You can try the code I wrote below. It uses binary search, and therefore assumes the list/array is pre-sorted.

using System;
using System.Collections.Generic;

public static class ListExtensions
{
    public static int GetAtMostIndex<TItem, TValue>(/*this*/ IList<TItem> list, TValue value, Func<TItem, TValue, int> comparer)
    {
        return GetAtMostIndex(list, value, comparer, 0, list.Count);
    }

    public static int GetAtLeastIndex<TItem, TValue>(/*this*/ IList<TItem> list, TValue value, Func<TItem, TValue, int> comparer)
    {
        return GetAtLeastIndex(list, value, comparer, 0, list.Count);
    }

    public static int GetAtMostIndex<TItem, TValue>(/*this*/ IList<TItem> list, TValue value, Func<TItem, TValue, int> comparer, int index, int count)
    {
        if (count == 0)
        {
            return -1;
        }

        int startIndex = index;
        int endIndex = index + count - 1;
        int middleIndex = 0;
        int compareResult = -1;

        while (startIndex < endIndex)
        {
            middleIndex = (startIndex + endIndex) >> 1; //  / 2
            compareResult = comparer.Invoke(list[middleIndex], value);

            if (compareResult > 0)
            {
                endIndex = middleIndex - 1;
            }
            else if (compareResult < 0)
            {
                startIndex = middleIndex + 1;
            }
            else
            {
                return middleIndex;
            }
        }

        if (startIndex == endIndex)
        {
            compareResult = comparer.Invoke(list[startIndex], value);

            if (compareResult <= 0)
            {
                return startIndex;
            }
            else
            {
                int returnIndex = startIndex - 1;

                if (returnIndex < index)
                {
                    return -1;
                }
                else
                {
                    return returnIndex;
                }
            }
        }
        else
        {
            //todo: test
            return startIndex - 1;
        }
    }

    public static int GetAtLeastIndex<TItem, TValue>(/*this*/ IList<TItem> list, TValue value, Func<TItem, TValue, int> comparer, int index, int count)
    {
        if (count == 0)
        {
            return -1;
        }

        int startIndex = index;
        int endIndex = index + count - 1;
        int middleIndex = 0;
        int compareResult = -1;

        while (startIndex < endIndex)
        {
            middleIndex = (startIndex + endIndex) >> 1; //  / 2
            compareResult = comparer.Invoke(list[middleIndex], value);

            if (compareResult > 0)
            {
                endIndex = middleIndex - 1;
            }
            else if (compareResult < 0)
            {
                startIndex = middleIndex + 1;
            }
            else
            {
                return middleIndex;
            }
        }

        if (startIndex == endIndex)
        {
            compareResult = comparer.Invoke(list[startIndex], value);

            if (compareResult >= 0)
            {
                return startIndex;
            }
            else
            {
                int returnIndex = startIndex + 1;

                if (returnIndex >= index + count)
                {
                    return -1;
                }
                else
                {
                    return returnIndex;
                }
            }
        }
        else
        {
            return endIndex + 1;
        }
    }
}
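
For example, usage might look something like this (a sketch, assuming the list is already sorted in the same order the comparer delegate implements):

List<int> sorted = new List<int> { 1, 3, 7, 12, 20 };

// Index of the first element >= 8: returns 3 (the element 12).
int atLeast = ListExtensions.GetAtLeastIndex(sorted, 8, (item, value) => item.CompareTo(value));

// Index of the last element <= 8: returns 2 (the element 7).
int atMost = ListExtensions.GetAtMostIndex(sorted, 8, (item, value) => item.CompareTo(value));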
DxCK
Thanks for contributing this binary search algorithm but it would not have solved my problem, since it requires a sorted array. In my scenario (sorry for not being clear in the question), key inserts are interleaved with key queries. Maintaining the sort order of an array on EVERY insert (so that binary searches are possible) requires O(N) time. Thus, an array sorted by key would not have good performance. Now, if the array could be built in advance and then be followed by a series of queries, sorting would only have to be done once, which would be efficient. But that wasn't an option for me.
Qwertie
+2  A: 

It took a couple months of work, but at last I can offer at least a partial solution to this problem... I call it the Compact Patricia Trie, a sorted dictionary that offers a "find next larger key" operation.

http://www.codeproject.com/KB/recipes/cptrie.aspx

It's only a partial solution since only certain kinds of keys are supported, namely byte[], string, and all primitive integer types (Int8..UInt64). Also, string sorting is case-sensitive.

Qwertie