When returning data from a DataReader I would typically use the ordinal reference on the DataReader to grab the relevant column:

if (dr.HasRows)         
   Console.WriteLine(dr[0].ToString());

(OR dr.GetString(0); OR (string)dr[0];)...

I have always done this because I was advised at an early stage that using dr["ColumnName"] or a more elegant way of indexing causes a performance hit.

However, as references to data entities become increasingly strongly typed, I feel more uncomfortable with this. I'm also aware that the above does not check for DBNull.

What is the most robust way to return data from a DataReader?

A: 

The problem with ordinals is that if the order of the columns changes, you are forced to modify the consumer code for the DataReader, unlike with column names.

I don't think there is a meaningful performance gain from using ordinals over column names; it is more about best practice, coding standards, and code maintainability, really.

J Angwenyi
+3  A: 

I always go with the string name approach just because reading the code is cleaner. Having to mentally map an index to a column name is horrible.

Tejs
Using the column name also eliminates "magic numbers" and makes the programmer's job easier. Using the ordinal is a premature micro-optimization.
Malfist
+1. Column names change more rarely than their order does.
abatishchev
There are two parts to this issue: the performance aspect and the maintainability/readability. String names don't scale very well. @Malfist, labeling something a "premature micro-optimization" is relative to your code. If your code is high performance, then this could absolutely be the bottleneck.
Josh
+2  A: 

I do think that indexed fields are the better approach, if only to avoid breaking when field names change in the underlying database, which would otherwise require a recompile of your application because you hardcoded the field name.

For each of the fields, you will need to manually check for nulls.

var dr = command.ExecuteReader();

if (dr.Read()) {
    var customer = new Customer();
    // I'm assuming that you know each field's position returned from your query.
    // That is why it is important to list your selected fields explicitly and not just use "*" or "ALL".
    customer.Id = dr[0] == null || dr[0] == DBNull.Value ? (int?)null : Convert.ToInt32(dr[0]);
    ...
}

In addition, it would allow you to use reflection to make this "GetData()" method more generic, providing a typeof(T) and getting the proper constructor for the proper type. Binding to the order of the columns is the one thing some wish to avoid, but in this case it becomes worthwhile. A minimal sketch of what such a reflection-based method might look like follows (the GetData&lt;T&gt; signature, the parameterless-constructor requirement, and the assumption that property declaration order matches the SELECT list are illustrative assumptions):
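
using System;
using System.Data;
using System.Reflection;

public static class ReaderMapper
{
    // Sketch only: assumes T has a parameterless constructor and that its writable
    // properties are declared in the same order as the columns in the SELECT list.
    public static T GetData<T>(IDataRecord dr) where T : new()
    {
        var item = new T();
        PropertyInfo[] properties = typeof(T).GetProperties();

        for (int i = 0; i < properties.Length && i < dr.FieldCount; i++)
        {
            // DBNull becomes null; value-type properties that cannot hold null
            // would need extra handling here.
            object value = dr.IsDBNull(i) ? null : dr.GetValue(i);
            properties[i].SetValue(item, value, null);
        }

        return item;
    }
}

It would then be called as, for example, GetData<Customer>(dr) once the reader is positioned on a row.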

Will Marcouiller
+2  A: 

Indexing the data reader by name is slightly more expensive. There are two main reasons for this.

  • Typical implementations store field information in a data structure that uses numerical indexing. A mapping operation has to take place to transpose a name into a number.
  • Some implementations will do a two-pass lookup on the name. The first pass tries to match field names with case-sensitivity turned on. If that pass fails then a second pass begins with case-sensitivity turned off.

However, in most cases the performance penalty incurred by looking up a field by name is dwarfed by the amount of time it takes for the database to execute the command. Do not let the performance penalty dictate your choice between name and numeric indexing.

Despite the slight performance penalty I normally choose name indexing for two reasons.

  • The code is easier to read.
  • The code is more tolerant to changes in the schema of the resultset.

If you feel like the performance penalty of name indexing is becoming problematic (maybe the command executes quickly, but returns a lot of rows), then look up the numeric index once by name, save it away, and use it for the remaining rows.

Brian Gideon
+4  A: 

The string name lookup is much more expensive than the ordinal call, but more maintainable and less "fragile" than hard coding the ordinals. So here's how I've been doing it. It's the best of both worlds. I don't have to remember the ordinal values or care if the column order changes, but I get the performance benefits of using ordinals.

var dr = command.ExecuteReader();
if (dr.HasRows)
{
    //Get your ordinals here, before you run through the reader
    int ordinalColumn1 = dr.GetOrdinal("Column1");
    int ordinalColumn2 = dr.GetOrdinal("Column2");
    int ordinalColumn3 = dr.GetOrdinal("Column3");

    while(dr.Read())
    {
        // now access your columns by ordinal inside the Read loop. 
        //This is faster than doing a string column name lookup every time.
        Console.WriteLine("Column1 = " + dr.GetString(ordinalColumn1);
        Console.WriteLine("Column2 = " + dr.GetString(ordinalColumn2);
        Console.WriteLine("Column3 = " + dr.GetString(ordinalColumn3);
    }
}

Note: this only really makes sense for readers that you expect to contain a decent number of rows. The GetOrdinal() calls are extra work, and they only pay for themselves if the combined savings from calling GetString(int ordinal) inside the loop are greater than the cost of calling GetOrdinal().

Edit: I missed the second part of this question. Regarding DBNull values, I've started writing extension methods that deal with that possibility, for example dr.GetDatetimeSafely(). Within those extension methods, you can do whatever you need to feel confident you get the expected value back.
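
A minimal sketch of such an extension method, using the GetDatetimeSafely name mentioned above (the nullable return type and the null-handling policy are assumptions):

using System;
using System.Data;

public static class DataReaderExtensions
{
    // Returns null instead of throwing when the column holds DBNull.
    public static DateTime? GetDatetimeSafely(this IDataReader dr, int ordinal)
    {
        return dr.IsDBNull(ordinal) ? (DateTime?)null : dr.GetDateTime(ordinal);
    }
}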

Josh
+3  A: 

It is possible to argue both sides in this situation. As already pointed out by others, using the name is more readable and will not break if someone changes the order of columns in the underlying database. But one might also argue that using an ordinal has the advantage of not breaking if someone changes the column name in the underlying database. I prefer the former argument, though, and think the readability argument for column names trumps the second argument in general. An additional argument for names is that they can "self-detect" errors: if someone does change a field name, the code has a better chance of breaking outright rather than having the subtle bug of appearing to work while reading the wrong field.

It seems obvious, but maybe it is worth mentioning a usage case that has both the self-detecting errors and the performance of ordinals. If you specify the SELECT list explicitly in the SQL, then using ordinals won't be a problem because the statement in the code guarantees the order:

SELECT name, address, phone FROM mytable

In this case, it would be fairly safe to use ordinals to access the data. It doesn't matter if someone moves fields around in the table. And if someone changes a name, then the SQL statement will produce an error when it runs.
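
For example, a short sketch of that combination (the SqlClient provider, connection string, and variable names are placeholders/assumptions):

using System.Data.SqlClient;

// Because the SELECT list is spelled out in the code, the ordinals below stay
// valid even if someone reorders the columns in the underlying table.
using (var connection = new SqlConnection(connectionString)) // connectionString is a placeholder
using (var command = new SqlCommand("SELECT name, address, phone FROM mytable", connection))
{
    connection.Open();
    using (var rdr = command.ExecuteReader())
    {
        while (rdr.Read())
        {
            string name    = rdr.GetString(0);  // name
            string address = rdr.GetString(1);  // address
            string phone   = rdr.GetString(2);  // phone
        }
    }
}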

And one final point. I just ran a test on a provider I helped write. The test read 1 million rows and accessed the "lastname" field on each record (compared against a value). Using rdr["lastname"] took 3301 milliseconds to process, while rdr.GetString(1) took 2640 milliseconds (approximately a 25% speedup). In this particular provider, the name lookup uses a sorted lookup to translate the name to an ordinal.

Mark Wilkins
You have some good points here.
Brian Gideon
+1 for giving metrics.
David Neale