views:

13417

answers:

7

I have a DataTable/collection that is cached in memory, I want to use this as a source to generate results for an auto complete textbox (using AJAX of course). I am evaluating various options to fetch the data quickly. The number of items in the collection/rows in the datatable could vary from 10000 to 2,000,000. (So that we dont get diverted, for the moment assume that the decision has been made, I have ample RAM and I will be using the cache and not database query for this)

I have some additional business logic for this processing; I have to prioritize the auto complete list as per a priority column (int) in the collection. So if I someone searches for Micro and I get say 20 results of words/sentences that start with Micro then I would pick the top 10 resultant items with highest priority. (hence the need to have a priority property associated with the string value).

The collection items are already sorted alphabetically.

What would be the best solution in this case.
1. Using DataTable.Select(.
2. Using DataTable.Rows.Find(.
3. use a custom collection with foreach or for to iterate through its values.
4. use a generic collection with anonymous delegates or lambda (since both give same performance or not?)

A: 

We could speculate about it all day, but since this is not a huge piece of code, why not write each one and benchmark them against each other?

public delegate void TestProcedure();

public TimeSpan Benchmark(TestProcedure tp)
{
    int testBatchSize = 5;
    List<TimeSpan> results = new List<TimeSpan>();
    for(int i = 0; i<testBatchSize; i++)
    {
        DateTime start = DateTime.Now;
        tp();
        results.Add(DateTime.Now - start);
    }
    return results.Min();
}
Barry Fandango
Well just checking if anyone has already `been there done that`!
Binoj Antony
If you do benchmark it, let us know! I'm curious what you will find.
Barry Fandango
A: 

I don't know how 1. or 2. are implemented under the covers, but use can use the fact that you KNOW the values are sorted by using a binary search

Binary Worrier
Binary search by Binary Worrier!
Binoj Antony
Thats me dude :)
Binary Worrier
A: 

What about a DataView? You could apply your filter condition AND sort by the the priority, and easily iterate through the results to add to your results.

Dillie-O
Yes option 2 in the question does exactly this.
Binoj Antony
+3  A: 

On my autocomplete, i tried first the linq/lambda approach, the performance is a little slow. DataTable.Select is faster than linq, so I use this. I haven't yet compared the performance between datatable.Select and datatable.Find

Michael Buen
A: 

Hi, as per the following blog

http://blog.dotnetspeech.net/archive/2008/08/26/performance----datatable.select-vs-dictionary.aspx

DataTable.Rows.Find is much, much faster than DataTable.Select.

+2  A: 

The charts aren't posted on my blog entry; more details can be found at http://msdn.microsoft.com/en-us/library/dd364983.aspx

One other thing that I've since discovered is that, for large data sets, using a chained generic dictionary performs incredibly well. It also helps alleviate many of the issues caused by the sort operations required for aggregation operations such as min and max (either with DataTable.Compute or LINQ).

By "chained generic dictionary," I mean a Dictionary(Of String, Dictionary(Of String, Dictionary(Of Integer, List(Of DataRow)))) or similar technique, where the key for each dictionary is a search term.

Granted, this won't be useful in all circumstances, but I have at least one scenario where implementing this approach lead to a 500x performance improvement.

In your case, I'd consider using a simple dictionary with the first 1-5 characters, then a List(Of String). You'd have to build up this dictionary once, adding the words to the lists with the first 1-5 characters, but after that you'll be able to get blazingly fast results.

I generally wrap things like this in a class that allows me to do things like add words easily. You may also want to use a SortedList(Of String), to get the results sorted automagically. This way, you can quickly look up the list of words that match the first N characters that have been typed.

Jeff Certain
And you are quite `Certain` about that!
Binoj Antony
A: 

In order of performance (best on top) 2 1 3 (with a for loop, more for a foreach) 4

Number 4 is the slowest because LINQ has much more overhead associated with it and is by far more complex than a datatable object.

Number 3 is slightly better because a customcollection has a much smaller footprint. A foreach loop is just logic laid on top of a for loop, so it will be slower.

Number 1/2 Another user has answered this already.

Just be careful with AJAX, it does not perform well with large sets of data. You should have some logic around the filtering that limits the amount of records it can look at to filter.

subt13