tags:

views:

84

answers:

5

In a DataTable object, is there added overhead to looking up a column value by name thisRow("ColumnA") rather than by the column index thisRow(0)? In which scenarios might this be an issue.

I work on a team that has lots of experience writing VB6 code and I noticed that didn't do column lookups by name for DataTable objects or data grids. Even in .NET code, we use a set of integer constants to reference column names in these types of objects. I asked our team lead why this was so, and he mentioned that in VB6, there was a lot of overhead in looking up data by column name rather than by index. Is this still true for .NET?


Example code (in VB.NET, but same applies to C#):

Public Sub TestADOData()
Dim dt As New DataTable

'Set up the columns in the DataTable    '
dt.Columns.Add(New DataColumn("ID", GetType(Integer)))
dt.Columns.Add(New DataColumn("Name", GetType(String)))
dt.Columns.Add(New DataColumn("Description", GetType(String)))

'Add some data to the data table    '
dt.Rows.Add(1, "Fred", "Pitcher")
dt.Rows.Add(3, "Hank", "Center Field")

'Method 1: By Column Name   '
For Each r As DataRow In dt.Rows
  Console.WriteLine( _
   "{0,-2} {1,-10} {2,-30}", r("ID"), r("Name"), r("Description"))
Next

Console.WriteLine()

'Method 2: By Column Name   '
For Each r As DataRow In dt.Rows
  Console.WriteLine("{0,-2} {1,-10} {2,-30}", r(0), r(1), r(2))
Next

End Sub

Is there an case where method 2 provides a performance advantage over method 1?

+5  A: 

Yes, there should be a slight overhead connected to looking up columns by name instead of by index. I wouldn't worry about it, unless you keep looking up that same column in a loop, like in your code example. Because then the slight overhead might accumulate to a measurable overhead, depending on the table's number of rows.

The fastest way to access a particular column's value of some row is to lookup using the DataColumn object itself. For example:

Dim dt As DataTable = ...

Dim idColumn As DataColumn = dt.Columns("ID")
Dim nameColumn As DataColumn = dt.Columns("Name")
Dim descriptionColumn As DataColumn = dt.Columns("Description")

For Each r As DataRow In dt.Rows

    ' NB: lookup through a DataColumn object, not through a name, nor an index: '
    Dim id = r(idColumn)
    Dim name = r(nameColumn)
    Dim description = r(descriptionColumn)

    ...
Next

One last piece of advice: I would strongly advise you against using numerical indices! It makes your code more fragile, and also more difficult to understand and maintain: As soon as the logical order of a column changes, you need to adapt your code accordingly, possibly in several places (and you might easily oversee one of them, leading to bugs). If you instead use column names or DataColumn objects themselves for the lookup, you can change your columns' order without having to change the remaining code.

stakx
+1 And Ben, you can tell your team lead this is **also good advice for VB6**. For instance this Microsoft *SQL Server Magazine* article from 2001 http://msdn.microsoft.com/en-us/library/aa496013(SQL.80).aspx
MarkJ
@MarkJ That article has a lot of good insight. In my own tests in .NET, the cost of looking up by name was relatively small with a difference of only about 300ms per 1,000,000 records. Using column names so much more readable and maintainable that I'm not sure I would want to look up by DataColumn unless I knew I was going to be working with very large iterations of data. But I guess that's what this question is seeking to find out: how much better is one method over another?
Ben McCormack
@Ben, Yes. I guess my point is that (IMHO) your team lead wasn't using the best techniques even in VB6. I don't think the situation has changed in VB.Net. Personally I too would use column names everywhere, except if testing proved there was unacceptable performance in a particular bit of code. (Regardless of whether it's VB6 or VB.Net)
MarkJ
+1  A: 

Yes, there is overhead with looking up a column by name rather than absolute index (as it simply locates the column, then accesses it that way).

That being said, that's fairly premature optimization. The DataTable will first try to locate the column in a case-sensitive manner, which is very fast. If it can't locate the column that way, it looks in a case-insensitive manner, which is only slightly slower.

The absolute fastest way to access the data is via the DataColumn object itself, as that's what both the index-based and name-based accessors use.

Adam Robinson
A: 

The best approach would be to find the index by column name and use the index for locating column:

Dim table As DataTable = ...

Dim foo As int = table.Columns("Foo")

For Each row As DataRow In table.Rows

    Dim data = row(foo)

Next

You are looking by index and you can also guess from name which column you are reading. Another advantage is that if you change order of fields in your select query you will still get the correct value. On the other hand if you hard code indexes your code will break.

Giorgi
+2  A: 

The facts are

  1. Getting the column by index is a direct index into an ArrayList
  2. If the name lookup is case sensitive a hashtable lookup is performed
  3. If the name lookup is case insensitive then the list is scanned for the name and a more costly locale sensitive string comparison is performed.

So definately the retrieval by index will be the most performant, but does it matter?

Doing some basic (read naive) tests on my machine I found that 1,000,000 accesses to a column using the mentioned indexing mechanism take the following times

  1. Direct index - 13.3ms
  2. Case insensitive lookup - 109.11ms
  3. Case sensitive lookup - 109.24ms

So depending on your scenario you can draw your conclusions.

Chris Taylor
A: 

Depends on the place you use it. On desktop application startup these kind of small inefficiencies may accumulate to long delays. On mouse and keyboard events most likely not. It will most likely be more efficient to spend time on function profiling (printing out execution times to dbgview), than optimizing this kind of low-level stuff.

AareP