views:

302

answers:

2

SQL databases seem to be the cornerstone of most software. However, it seems optimized for textual data. In fact when doing any queries involving numerical data, integers specifically, it seems inefficient that the numbers are getting converted to text and then back to native formats both ways between the application and the database. This same inefficiency seems to apply to BLOB data as well. My understanding is that even with something like Linq to SQL, this two way conversion is occuring in the background.

Are there general ways to bypass this overhead with SQL? Are there certain database management systems that handle this more efficiently than others (ie, with non-standard extensions/API's)?

Clarification. In the following select statement, the list of numbers after IN could be more easily passed as a raw array of int, but there seems to be no way of achieving that optimization level.

SELECT foo FROM bar WHERE baz IN (23, 34, 45, 9854004, ...)
+1  A: 

Numerical data in a database is not stored as text. I guess it depends on the database, but it certainly doesn't have to be and isn't.

BLOBs are stored exactly how you set them -- by definition, the DB has no way to interpret the information -- I guess it could compress if it found that to be useful. BLOBs are not translated into text.

Here's how Oracle stores numbers:

http://download.oracle.com/docs/cd/B28359_01/server.111/b28318/datatype.htm#i16209

Internal Numeric Format

Oracle Database stores numeric data in variable-length format. Each value is stored in scientific notation, with 1 byte used to store the exponent and up to 20 bytes to store the mantissa. The resulting value is limited to 38 digits of precision. Oracle Database does not store leading and trailing zeros. For example, the number 412 is stored in a format similar to 4.12 x 102, with 1 byte used to store the exponent(2) and 2 bytes used to store the three significant digits of the mantissa(4,1,2). Negative numbers include the sign in their length.

MySQL info here:

http://dev.mysql.com/doc/refman/5.0/en/numeric-types.html

Look at the table -- a TINYINT is represented in 1 byte (range -128 - 127), not possible if stored as text.

EDIT: With the clarification -- I would say use the API in your language that looks something like this (pseudocode)

stmt = conn.Prepare("SELECT * FROM TABLE where x in (?, ?, ?)");
stmt.SetInt(0, x);
stmt.SetInt(1, y);
stmt.SetInt(2, z);

I don't believe that the underlying protocols use text for the transport of parameters.

Lou Franco
The question is more about the communication between the application and the DBMS, not how the data are stored.
postfuturist
+2  A: 

Don't suppose. Measure.

Format conversion is not likely to be a measurable cost for database work, unless you are misusing the database as an arithmetic engine.

The IO cost for LOBs, especially for CLOBS with character conversion, can become significant; the remedy here, once you know that the simplest thing that might work actually has a noticeable performance impact, is to minimize the number of times you copy the LOB data. Use whatever SQL parameter binding style allows you to transfer the data directly between its point of creation or use, and the database -- often this is binding the LOB to a stream or I/O channel.

But don't do this until you have a way to measure the impact, and have measurements showing that this is your bottleneck.

Roger Hayes