When retrieving values from a DataRow is it better to use the column name or column index?

The column name is more readable and easier to maintain:

int price = (int)dr["Price"];

While column index is just faster (I think):

int price = (int)dr[3];

Would using column names break if you decide to obfuscate the database?

+3  A: 

I would think the column name is the best way to go. It is easier to see what you are pulling, and the column order is determined by the select statement, which could change sometime down the road. You could argue the column name could change too, but I would think that is much less likely.

EDIT:

Actually if you were really bent on using column indexes you could create constants of the column indexes and name the constant the name of the column. So:

const int PRIMARY_KEY_COLUMN_NAME_INDEX = 0;

That would at least make it readable.

Kevin
You should make a variable for the string as well.
Aaron Fischer
Actually, they should probably be constants instead of variables. I would also debate the merit of doing that in certain instances, especially if the dataset is only going to be accessed in one spot. If that changes later then it can be refactored.
Kevin
+8  A: 

I generally prefer readability and understanding over speed. Go with the name. You could (should) use string constants that can be updated in one place if you decide to change database column names.

tvanfosson
+1  A: 

If you did decide to obfuscate the database by changing column names in the future, you could alias those columns in your query to keep the indexer code functional. I suggest indexing by name.

mquander
A: 

I agree with Kevin. Using the name will make your code easier to understand and less prone to errors.

Sergio
+1  A: 

Go with the name, you get better error messages :)
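To illustrate the point about error messages, here is a minimal sketch that compares what a bad name and a bad index throw. The class and method names (`LookupErrors`, `ExceptionTypeOf`, `Demo`) and the misspelled column `"Prise"` are made up for this example. As far as I know, the name-based failure mentions the offending column name in its message, while the index-based failure only reports a number.

```csharp
using System;
using System.Data;

public static class LookupErrors
{
    // Runs the lookup and reports which exception type it threw (or "none").
    public static string ExceptionTypeOf(Func<object> lookup)
    {
        try { lookup(); return "none"; }
        catch (Exception ex) { return ex.GetType().Name; }
    }

    // Builds a one-column table, then tries a misspelled name and an out-of-range index.
    public static string[] Demo()
    {
        var table = new DataTable("Products");
        table.Columns.Add("Price", typeof(int));
        DataRow dr = table.Rows.Add(42);

        return new[]
        {
            ExceptionTypeOf(() => dr["Prise"]), // misspelled column name
            ExceptionTypeOf(() => dr[3]),       // index past the last column
        };
    }
}
```

Calling `LookupErrors.Demo()` should give an `ArgumentException` for the misspelled name and an `IndexOutOfRangeException` for the bad index.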

Jan Bannister
+1  A: 

It depends on what you need. In my case, I had a situation where speed was paramount as I was performing intense processing on thousands of rows in a DataSet, so I chose to write a piece of code that cached the column indexes by name. Then, in the loop code I used the cached indexes. This gave a reasonable performance increase over using the column name directly.

Your mileage may vary, of course. My situation was a rather contrived and unusual case, but in that instance it worked rather well.
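The caching approach described above might look roughly like the sketch below. It is not the original code from that project: the `ColumnIndexCache` class, its `Build` and `SumColumn` methods, and the assumption that the column holds `int` values are all invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ColumnIndexCache
{
    // Resolves each column name to its ordinal once, before any row loop runs.
    public static Dictionary<string, int> Build(DataTable table, params string[] names)
    {
        var cache = new Dictionary<string, int>();
        foreach (string name in names)
        {
            int index = table.Columns.IndexOf(name); // -1 when the column is missing
            if (index < 0)
                throw new ArgumentException("Column '" + name + "' not found.");
            cache[name] = index;
        }
        return cache;
    }

    // Sums an int column, using the cached ordinal inside the loop
    // instead of looking the name up again for every row.
    public static long SumColumn(DataTable table, string name)
    {
        int index = Build(table, name)[name];
        long total = 0;
        foreach (DataRow row in table.Rows)
            total += (int)row[index];
        return total;
    }
}
```

The calling code still refers to columns by name, so readability is mostly preserved; only the inner loop touches the integer ordinal.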

Andrew Rollings
A: 

For me, I'm using reflection (not sure that's the correct name for what I do) to get the <ColumnName>Column property from the table.

Avoiding "hardcoding" is better:

  int price = (int)dr[DatableVar.PriceColumn];
Fredou
But you're still assuming that the table will contain certain columns, right? How is that better?
Kevin Tighe
I'm using dataset schemas; if something changes, at least I will know where to look, since I would get an error at compile time, not at execution time.
Fredou
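For readers unfamiliar with the `PriceColumn` style of access discussed in the comments above, here is a hand-written stand-in for what a dataset schema generator would normally produce. `ProductsTable`, `TypedLookup`, and `ReadPrice` are hypothetical names; real generated code is considerably larger. This is only a sketch of the idea that the column is reached through a property rather than a string or integer literal.

```csharp
using System.Data;

// Hand-written stand-in for a generated typed table (hypothetical).
public class ProductsTable : DataTable
{
    // Exposes the column object itself, so callers never spell its name.
    public DataColumn PriceColumn { get; private set; }

    public ProductsTable() : base("Products")
    {
        PriceColumn = Columns.Add("Price", typeof(int));
    }
}

public static class TypedLookup
{
    public static int ReadPrice(ProductsTable table, DataRow row)
    {
        // A typo in "PriceColumn" fails to compile instead of failing while running.
        return (int)row[table.PriceColumn];
    }
}
```

The table still has to contain the column, as the first comment points out, but a renamed or removed column shows up as a build error once the class is regenerated.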
+6  A: 

Accessing column/row values via column names is better for human readers and for forward compatibility (in case someone later changes the order or number of columns).

Accessing column/row values via column indices is better for performance.

So, if you want to change a value in one or two rows, column names are fine. But if you want to change a value in thousands of rows, you should use a column index computed from the column name:

int ndxMyColumn = table.Columns.IndexOf( "MyColumn" );
foreach(DataRow record in table.Rows ) {
    record[ndxMyColumn] = 15;
}
TcKs
+1  A: 

My opinion is that you should only switch to indices if you have profiled your code and this lookup showed up as the bottleneck. I don't think this will happen.

Naming stuff is good, it makes our limited brain understand problems and build links easier. That's why we are given names such as Fred, Martin, Jamie, rather than Human[189333847], Human[138924342] and Human[239333546].

Coincoin
+1  A: 

I opt for strings for ease of reading and maintainability. I use string constants to define the values of the column names. Ex:

public class ExampleDataColumns
{
    public const string ID = "example_id";
    public const string Name = "example_name";
    // ...
}

Then I can reference it later like this:

row[ExampleDataColumns.ID]
Jim Petkus
+1  A: 

Use column names for DataRow, by the same token that an RDBMS wouldn't gain speed by requiring programmers to specify column indexes in SQL. But you can perhaps mimic the way an RDBMS operates when you issue a SELECT statement: inside the engine, it looks up the index/offset of each column named in the SELECT clause before it traverses the rows, so it can operate faster.

If you really want to gain speed, don't do it the const/enum way (the column order might change in your database or ORM layer). Do it as TcKs suggested (before the actual loop):

int ndxMyColumn = table.Columns.IndexOf( "MyColumn" );
foreach(DataRow record in table.Rows ) {
    record[ndxMyColumn] = 15;
}
Michael Buen
+1  A: 

Completely agree with the others: go for readability and maintainability over speed. However, I had a generic method that needed to take named columns as parameters, so it made sense to work out what their column indices were.

In the benchmarking below using column index showed a big improvement so if this is a bottleneck area or a performance critical part of your code it may be worthwhile.

The output from the code below is:

515ms with ColumnIndex

1031ms with ColumnName

    static void Main(string[] args)
    {            
        DataTable dt = GetDataTable(10000, 500);
        string[] columnNames = GetColumnNames(dt);

        DateTime start = DateTime.Now;
        TestPerformance(dt, columnNames, true);

        TimeSpan ts = DateTime.Now.Subtract(start);
        Console.Write("{0}ms with ColumnIndex\r\n", ts.TotalMilliseconds);

        start = DateTime.Now;
        TestPerformance(dt, columnNames, false);
        ts = DateTime.Now.Subtract(start);
        Console.Write("{0}ms with ColumnName\r\n", ts.TotalMilliseconds);
    }

    private static DataTable GetDataTable(int rows, int columns)
    {
        DataTable dt = new DataTable();

        for (int j = 0; j < columns; j++)
        {
            dt.Columns.Add("Column" + j.ToString(), typeof(Double));
        }

        Random random = new Random(DateTime.Now.Millisecond);
        for (int i = 0; i < rows; i++)
        {
            object[] rowValues = new object[columns];
            for (int j = 0; j < columns; j++)
            {
                rowValues[j] = random.NextDouble();
            }
            dt.Rows.Add(rowValues);
        }

        return dt;
    }

    private static void TestPerformance(DataTable dt, string[] columnNames, bool useIndex)
    {
        object obj;
        DataRow row;

        for (int i = 0; i < dt.Rows.Count; i++)
        {
            row = dt.Rows[i];

            for (int j = 0; j < dt.Columns.Count; j++)
            {
                if (useIndex)
                    obj = row[j];
                else
                    obj = row[columnNames[j]];
            }
        }
    }

    private static string[] GetColumnNames(DataTable dt)
    {
        string[] columnNames = new string[dt.Columns.Count];
        for (int j = 0; j < columnNames.Length; j++)
        {
            columnNames[j] = dt.Columns[j].ColumnName;
        }
        return columnNames;
    }
Charlie Openshaw