When using JDBC, I often come across constructs like

ResultSet rs = ps.executeQuery();
while (rs.next()) {
    int id = rs.getInt(1);
    // Some other actions
}

I asked myself (and the authors of the code too) why not use labels to retrieve column values:

int id = rs.getInt("CUSTOMER_ID");

The best explanation I've heard has to do with performance. But does it actually make processing significantly faster? I don't believe so, though I have never measured it. Even if retrieving by label is a bit slower, it provides better readability and flexibility, in my opinion.
So could someone give me a good explanation for retrieving column values by column index instead of by column label? What are the pros and cons of both approaches (perhaps with respect to a particular DBMS)?

A: 

If you have a large number of rows, then you may see a saving. (But benchmark first)

Otherwise, code that is more understandable is preferable.

Mitch Wheat
A: 

The JDBC driver takes care of the column-name-to-index lookup. So if you extract values by column name, the driver performs a lookup each time (usually in a hash map) to find the index that corresponds to the column name.
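
To make that lookup concrete, here is a minimal sketch of what such a table might look like. LabelIndexSketch is a hypothetical class, not any real driver's code, and real drivers may do this differently. It follows two rules from the ResultSet documentation: labels are matched case-insensitively, and the first matching column wins when a label occurs more than once.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a driver's internal label-to-index table.
// Real drivers differ; this only illustrates the extra lookup step.
class LabelIndexSketch {
    private final Map<String, Integer> indexByLabel = new HashMap<>();

    LabelIndexSketch(List<String> columnLabels) {
        for (int i = 0; i < columnLabels.size(); i++) {
            // JDBC column indexes start at 1. putIfAbsent keeps the first
            // column when the same label appears more than once.
            indexByLabel.putIfAbsent(columnLabels.get(i).toUpperCase(Locale.ROOT), i + 1);
        }
    }

    // Roughly what a by-label getter has to do before it can read the value.
    // A real driver would throw SQLException instead.
    int findColumn(String label) {
        Integer index = indexByLabel.get(label.toUpperCase(Locale.ROOT));
        if (index == null) {
            throw new IllegalArgumentException("No such column: " + label);
        }
        return index;
    }
}
```

With this sketch, every by-label read pays for one upper-casing and one hash lookup on top of the by-index read.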

zloster
+10  A: 

Warning: I'm going to get bombastic here, because this drives me crazy.

99%* of the time, it's a ridiculous micro-optimization that people have some vague idea makes things 'better'. This completely ignores the fact that, unless you're in an extremely tight and busy loop over millions of SQL results all the time, which is hopefully rare, you'll never notice it. For everyone who's not doing that, the developer time cost of maintaining, updating, and fixing bugs in the column indexing is far greater than the incremental cost of hardware for your infinitesimally-worse-performing application.

Don't code optimizations like this in. Code for the person maintaining it. Then observe, measure, analyse, and optimize. Observe again, measure again, analyse again, and optimize again.

Optimization is pretty much the last step in development, not the first.

* Figure is made up.

Cowan
+3  A: 

I don't think using the labels impacts performance by much. But there is another reason not to use Strings. Or ints, for that matter.

Consider using constants. Using an int constant makes the code more readable, but also less likely to have errors.

Besides being more readable, the constant also prevents you from making typos in the label names - the compiler will throw an error if you do. And any IDE worth anything will pick it up. This is not the case if you use Strings or ints.

Sietse
See your point, but I don't think this really helps that much. int COLUMN_FIRST_NAME = 13; int COLUMN_SURNAME = 14; could have a bug; maybe FIRST name is 14 and SURNAME is 13. And you still have to adjust when columns are added etc. If you must use constants, I'd use Strings anyway to avoid this.
Cowan
int constants solve the readability problem, but the previous comment stresses that the flexibility problem remains. I would also prefer String constants. But I didn't use constants in the examples, since I thought that would make my point clearer.
Rorick
Rorick, I see what you mean, and I agree. But using constants solves at least the readability problem. The problem of using the correct int for the column you want is the same in both cases.
Sietse
+ for String constants.
marcospereira
use an enum instead
dogbane
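
The String-constant and enum suggestions in these comments could be sketched roughly as follows. CustomerColumn and its labels are hypothetical names chosen for illustration. A misspelled enum constant fails at compile time, while the label strings themselves are still only checked at run time, but they live in one place.

```java
// Hypothetical column set for a customer query; the labels are assumptions.
enum CustomerColumn {
    CUSTOMER_ID("customer_id"),
    CUSTOMER_NAME("customer_name"),
    CUSTOMER_ADDRESS("customer_address");

    private final String label;

    CustomerColumn(String label) {
        this.label = label;
    }

    // The label to pass to a ResultSet getter,
    // e.g. rs.getInt(CustomerColumn.CUSTOMER_ID.label()).
    String label() {
        return label;
    }
}
```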
A: 

I agree with the previous answers that performance should not be what decides between the two approaches. It would be better to consider the following instead:

  • Code readability: for every developer reading your code, labels make much more sense than indexes.
  • Maintenance: think of the SQL query and the way it is maintained. Which is more likely to happen in your case after fixing, improving, or refactoring the SQL query: a change in the order of the extracted columns, or a change in the result column names? It seems to me that a change in column order (as a result of adding or deleting columns in the result set) is more likely.
  • Encapsulation: whichever way you choose, try to keep the code that runs the SQL query and the code that parses the result set in the same component, and make only this component aware of the column names and their mapping to indexes (if you decide to use them).
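
As a rough illustration of the encapsulation point, the sketch below keeps all knowledge of the column labels inside one mapper class. Customer, CustomerRowMapper, and the labels CUSTOMER_ID and CUSTOMER_NAME are assumptions made for this example, not names from any particular codebase.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical value object; fields are assumptions for the example.
final class Customer {
    final int id;
    final String name;

    Customer(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class CustomerRowMapper {
    // The only place that knows how the result columns are named.
    private static final String COL_ID = "CUSTOMER_ID";
    private static final String COL_NAME = "CUSTOMER_NAME";

    private CustomerRowMapper() {
    }

    // Maps the row the cursor currently points at; the caller drives rs.next().
    static Customer mapRow(ResultSet rs) throws SQLException {
        return new Customer(rs.getInt(COL_ID), rs.getString(COL_NAME));
    }
}
```

If the query later switches to indexes, or the labels change, only this class needs to be touched.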
Vladimir Orlov
A: 

Using the index is an attempt at optimization.

The time saved by this is wasted by the extra effort it takes the developer to look up the necessary data to check if their code will work properly after the changes.

I think it's our built-in instinct to use numbers instead of text.

databyss
+3  A: 

You should use string labels by default.

Pros:

  • Independence of column order
  • Better readability/maintainability

Cons:

  • You have no control over the column names (access via stored procedures)

Which would you prefer?

ints?

int i = 1; // JDBC column indexes start at 1
customerId = resultSet.getInt(i++);
customerName = resultSet.getString(i++);
customerAddress = resultSet.getString(i++);

or Strings?

customerId = resultSet.getInt("customer_id");
customerName = resultSet.getString("customer_name");
customerAddress = resultSet.getString("customer_address");

And what if there is a new column inserted at position 1? Which code would you prefer? Or if the order of the columns is changed, which code version would you need to change at all?

That's why you should use string labels by default.

Martin Klinke
A: 

Besides the lookup in a Map for labels, it also leads to an extra String creation. Though that happens on the stack, it still carries a cost.

It all comes down to individual choice, and to date I have used only indexes :-)

Vinod Singh
A: 

I did some performance profiling on this exact subject on an Oracle database. In our code we have a ResultSet with numerous columns and a huge number of rows. Of the 20 seconds (!) the request takes to execute, the method oracle.jdbc.driver.ScrollableResultSet.findColumn(String name) takes about 4 seconds.

Obviously there's something wrong with the overall design, but using indexes instead of the column names would probably eliminate those 4 seconds.
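
One possible middle ground, sketched here under assumptions, is to resolve the label to an index once with the standard ResultSet.findColumn(String) method and use the index inside the loop. CustomerIdReader and the label CUSTOMER_ID are hypothetical, and whether this recovers the time measured above would have to be benchmarked against the actual driver.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class CustomerIdReader {
    private CustomerIdReader() {
    }

    // Reads every CUSTOMER_ID (an assumed label) from an open ResultSet
    // whose cursor is still before the first row.
    static List<Integer> readIds(ResultSet rs) throws SQLException {
        // One label lookup for the whole result set instead of one per row.
        int idColumn = rs.findColumn("CUSTOMER_ID");
        List<Integer> ids = new ArrayList<>();
        while (rs.next()) {
            ids.add(rs.getInt(idColumn));
        }
        return ids;
    }
}
```

The query and the loop still refer to the column by name, so reordering the selected columns does not break the code.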

A: 

Sure, using column names increases readability and makes maintenance easy. But using column names has a flip side. As you know, SQL allows a result set to contain multiple columns with the same name, so there's no guarantee that the column name you typed in the getter method of the ResultSet actually points to the column you intend to access. In theory, using index numbers instead of column names is preferred, but it reduces readability...

Thanks

NycVicky
+1  A: 

An answer has already been accepted; nonetheless, here is some additional information and personal experience that I have not seen put forward yet.

Use column names (constants rather than literals are preferred) in general and where possible. This is clearer and easier to maintain, and future changes are less likely to break the code.

There is, however, a use for column indexes. In some cases these are faster, but not sufficiently that this should override the above reasons for names*. Indexes are also very valuable when developing tools and general methods that deal with ResultSets. Finally, an index may be required because the column does not have a name (such as an unnamed aggregate) or because there are duplicate names, so there is no easy way to reference both.

*Note that I have written some JDBC drivers and looked inside some open source ones, and internally these use column indexes to reference the result columns. In all cases I have worked with, the driver first maps a column name to an index internally. Thus, you can easily see that access by column name would, in all those cases, always take longer. This may not be true for all drivers, though.
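
As an example of the "tools and general methods" case, a generic helper cannot know column names in advance, so it walks the columns by index using the standard ResultSetMetaData.getColumnCount() method. RowDumper is a hypothetical name, and this is only a sketch of the idea.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class RowDumper {
    private RowDumper() {
    }

    // Copies the current row into a list, whatever the query selected.
    static List<Object> currentRow(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int columnCount = meta.getColumnCount();
        List<Object> values = new ArrayList<>(columnCount);
        // JDBC column indexes run from 1 to columnCount inclusive.
        for (int i = 1; i <= columnCount; i++) {
            values.add(rs.getObject(i));
        }
        return values;
    }
}
```

Because it never uses a label, this also works for unnamed aggregates and for results with duplicate column names.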

Kevin Brock