views:

49

answers:

3

Good morning,

I am at the moment writing my master thesis and hence I have to justify each an every general assessment I make.

I have a flat database in MySQL which was originally composed of the following fields

  1. Date (DATETIME)
  2. Name (VARCHAR(50))
  3. Value (DOUBLE)

The PK of the table was a composite of the columns 1 and 2.

The thing is that I soon had more than 40 millions rows and my queries for all records over a single name were taking ages.

Hence, I decided to create an "index table" (I think the terminology is correct) where I store a mapping between Names and IDs:

  1. ID (INT)
  2. Name (VARCHAR 50)

And I changed my initial table to

  1. Date (DATETIME)
  2. ID (INT)
  3. Value (DOUBLE)

This way I could first find the ID of the record I was looking for, and then run a query over the large table very quickly (performance was really much better).

I assume this is because comparison between integer is much easier than between character strings, but I was looking for some literature to back this assessment (ideally some database structure book).

Do you think my assessment is correct?

MOST IMPORTANTLY

Does anyone know a good book I could read:

  • about indexing performance for Databases
  • about performance comparison between integers and character strings on equality tests.
+1  A: 

Part of the issue is that compound keys (such as your Date,Name PK) are created by concatenating the indexed values (see http://dev.mysql.com/doc/refman/5.1/en/create-index.html), and the name (the primary thing you're looking up by here) is second. This makes it much more work to look stuff up by name, because the index won't be sorted by name -- it'll be sorted by date, THEN name, meaning mysqld will have to search the whole index instead of just grabbing the section where the PK is between "Jack, 0000-00-00" and "Jack, 9999-12-31".

If you added an index just for the name, or at least switched the PK to (Name, Date), you'd probably find your original table working much better.

Alternatively, if you did the same thing to your Date,ID table, it should be faster still, because you're all but eliminating string comparisons.

cHao
Actually, the new PK is indeed (ID,Date) and not (Date,ID). What I did at the moment is that I have (ID,Date) as PK, and an individual index for both ID and Date because it is as common to query over the name as it is to query over the date.But do you think that, essentially, PK (ID,Date) is generally better than PK(Name,Date)?
JSmaga
@JSmega: If you're into performance, yeah -- it'll be faster. It does slightly complicate data retrieval, though, so i'd check the performance of both and see if the slowdown is worth adding an extra table (and thus, an extra join and/or lookup).
cHao
@cHao: ok thanks. Any idea of a good reference I could give for string comparison complexity in MySQL?
JSmaga
@JSmaga: The only reference i could give is to the MySQL manual. It explains a bit about how it compares strings (namely, that VARCHAR and CHAR columns are compared using the collation specified by the column, table, session, or DB defaults, in that order of preference).
cHao
+1  A: 

Assuming that there is a lot of duplication of data of the "Name" field, your query performance improved because integer comparisons are faster than string comparisons and you significantly reduced the size of the date table. This means less memory paging and less disk seeking.

If the name table has N rows, then you are doing N string comparisons, plus 40 million integer comparisons, instead of 40 million string comparisons. To increase query performance even more, you should add an index for the ID field of the date table.

CREATE INDEX date_id_index ON date_table (ID)
Ok seems I was right then. Any idea about the complexity of string comparison in MySQL?
JSmaga
+1  A: 

As for books, "Applied Mathematics for Database Professionals" by Lex de Haan and Toon Koppelaars is really good book if you want advanced SQL-knowledge. I should point out that you don't just "mention" books, you read them and use them as reference - just referencing books because they sound cool but not reading them will come back to bite you in the ass.

Hagge
@Hagge: Well I think you read the section you're interested in an then you cite as reference. Thanks. I was not planning to do some book fishing...
JSmaga
Sorry, didn't mean to imply you were - I realize my language was a little harsh. I meant it in a more friendly manner :).
Hagge