views:

319

answers:

2

I recently ran into an issue where a query was causing a full table scan, and it came down to a column had a different definition that I thought, it was a VARCHAR not an INT. When queried with "string_column = 17" the query ran, it just couldn't use the index. That really threw me for a loop.

So I went searching and found what happened, the behavior I was seeing is consistent with what MySQL's documentation says:

In all other cases, the arguments are compared as floating-point (real) numbers.

So my question is... why a float?

I could see trying to convert numbers to strings (although the points in the MySQL page linked above are good reasons not to). I could also understand throwing some sort of error, or generating a warning (my preference). Instead it happily runs.

So why convert everything to a float? Is that from the SQL standard, or based on some other reason? Can anyone shed some light on this choice for me?

A: 

Well, it's easily understandable: float is able to hold the greatest range of numbers.

If the underlying datatype is datetime, for instance, it can be simply converted to a float number that has the same intrinsic value.

If the datatype is an string it is easy to parse it to a float, degrading performance not withstanding.

So float datatype is better to fallback.

Paulo Santos
+1  A: 

I feel your pain. We have a column in our DB that holds what is well-known in the company as an "order number". But it's not always a number, in certain circumstances it can have other characters too, so we keep it in a varchar. With SQL Server 2000, this means that selecting on "order_number = 123456" is bad. SQL Server effectively rewrites the predicate as "CAST(order_number, INT) = 123456" which has two undesirable effects:

  1. the index is on order_number as a varchar, so it starts a full scan
  2. those non-numeric order numbers eventually cause a conversion error to be thrown to the user, with a rather unhelpful message.

In a way it's good that we do have those non-numeric "numbers", since at least badly-written queries that pass the parameter as a number get trapped rather than just sucking up resources.

I don't think there is a standard. I seem to remember PostgreSQL 8.3 dropped some of the default casts between number and text types so that this kind of situation would throw an error when the query was being planned.

Presumably "float" is considered to be the widest-ranging numeric type and therefore the one that all numbers can be silently promoted to?

Oh, and similar problems (but no conversion errors) for when you have varchar columns and a Java application that passes all string literals as nvarchar... suddenly your varchar indices are no longer used, good luck finding the occurrences of that happening. Of course you can tell the Java app to send strings as varchar, but now we're stuck with only using characters in windows-1252 because that's what the DB was created as 5-6 years ago when it was just a "stopgap solution", ah-ha.

araqnid