views:

1655

answers:

2

Considering the code below:

Dataview someView = new DataView(sometable)
someView.RowFilter = someFilter;

if(someView.count > 0) {  …. }

Quite a number of articles which say Datatable.Select() is better than using DataViews, but these are prior to VS2008.

Solved: The Mystery of DataView's Poor Performance with Large Recordsets
Array of DataRecord vs. DataView: A Dramatic Difference in Performance

Googling on this topic I found some articles/forum topics which mention Datatable.Select() itself is quite buggy(not sure on this) and underperforms in various scenarios.

On this(Best Practices ADO.NET) topic on msdn it is suggested that if there is primary key defined on a datatable the findrows() or find() methods should be used insted of Datatable.Select().

This article here (.NET 1.1) benchmarks all the three approaches plus a couple more. But this is for version 1.1 so not sure if these are valid still now. Accroding to this DataRowCollection.Find() outperforms all approaches and Datatable.Select() outperforms DataView.RowFilter.

So I am quite confused on what might be the best approach on finding rows in a datatable. Or there is no single good way to do this, multiple solutions exist depending upon the scenario?

+4  A: 

Aseem,

You are looking for the "best approach on finding rows in a datatable", so I first have to ask: "best" for what? I think, any technique has scenarios where it might fit better then the others.

First, let's look at DataView.RowFilter: A DataView has some advantages in Data Binding. Its very view-oriented so it has powerful sorting, filtering or searching features, but creates some overhead and is not optimized for performance. I would choose the DataView.RowFilter for smaller recordsets and/or where you take advantage of the other features (like, a direct data binding to the view).

Most facts about the DataView, which you can read in older posts, still apply.

Second, you should prefer DataTable.Rows.Find over DataTable.Select if you want just a single hit. Why? DataTable.Rows.Find returns only a single row. Essentially, when you specify the primary key, a binary tree is created. This has some overhead associated with it, but tremendously speeds up the retrieval.

DataTable.Select is slower, but can come very handy if you have multiple criteria and don't care about indexed or unindexed rows: It can find basically everything but is not optimized for performance. Essentially, DataTable.Select has to walk the entire table and compare every record to the criteria that you passed in.

I hope you find this little overview helpful.

I'd suggest to take a look at this article, it was helpful for me regarding performance questions. This post contains some quotes from it.

A little UPDATE: By the way, this might seem a little out of scope of your question, but its nearly always the fastest solution to do the filtering and searching on the backend. If you want the simplicity and have an SQL Server as backend and .NET3+ on client, go for LINQ-to-SQL. Searching Linq objects is very comfortable and creates queries which are performed on server side. While LINQ-to-Objects is also a very comfortable but also slower technique. In case you didn't know already....

best regards, thomas

moonground.de
+1  A: 

Moonground's post sums it up nicely:

  • DataView.RowFilter is for binding.
  • DataTable.Rows.Find is for searching by primary key only.
  • DataTable.Select is for searching by multiple columns and also for specifying an order.

Avoid creating many DataViews in a loop and using their RowFilters to search for records. This will drastically reduce performance.

I wanted to add that DataTable.Select can take advantage of indexes. You can create an index on a DataTable by creating a DataView and specifying a sort order:

DataView dv = new DataView(dt, null, "Col1, Col2", DataViewRowState.CurrentRows);

Then, when you call DataTable.Select(), it can use this index when running the query. We have used this technique to seriously improve performance in places where we use the same query many, many times. (Note that this was before Linq existed.)

The trick is to define the sort order correctly for the Select statement. So if your query is "Col1 = 1 and Col2 = 4", then you'll want "Col1, Col2" like in the example above.

Paul Williams