tags:

views:

557

answers:

5

I just ran a "PROCEDURE ANALYSE ( )" on one of my tables. And I have this column that is of type INT and it only ever contains values from 0 to 12 (category IDs). And MySQL said that I would be better of with a ENUM('0','1','2',...,'12'). This category's are basically static and won't change in the future, but if they do I can just alter that column and add it to the ENUM list...

So why is ENUM better in this case?

edit: I'm mostly interested in the performance aspect of this...

+2  A: 

Because it introduces a constraint on the possible values.

Sklivvz
+2  A: 

I'm not a MySQL expert, but my guess is that integers always take up four bytes of space where enums take up varying amounts of space based upon the range of data needed. Since you only need 13 items, it could get away with using 1 byte for your column.

Jason Citron
+5  A: 

Put simply, it's because it's indexed in a different way.

In this case, enum says "It's one of these 13 values" whereas INT is saying "It could be any integer"

This means that indexing is easier, as it doesn't have to take into account indexing for those integers you don't use, "just in case" you ever use them.

It's all to do with the alogorithms.

I'd be interested myself though when it gets to a point where the INT would be quicker than the ENUM

Using numbers in an enum might be a little dangerous though... as if you send this number unquoted to SQL - you might end up getting the wrong value back!

Mez
+1  A: 

On Oracle I would have a BITMAP index which is much faster than a hash-based lookup for such a small number of values. (So I presume a similar benefit in query optomisation or indexing is available for MySQL.)

Interestingly The MySQL docs suggest that using 'things that look like numbers' are a bad choice for the ENUM type because of potential confusion between the enum value and the enum index (http://dev.mysql.com/doc/refman/5.0/en/enum.html).

mmaibaum
+5  A: 

Yikes! There's a bunch of ambiguities with using numbers in an ENUM field. Be careful. The one gotcha I remember is that you can access values in ENUMS by index: if your enum is ENUM('A', 'B', 'C', '1', '2, '3'), then these two queries are very different:

INSERT INTO TABLE (example_col) VALUES( '1' ); -- example_col == 1
INSERT INTO TABLE (example_col) VALUES(  1  ); -- example_col == A

I'm assuming the recommendation is because it limits the valid values that can get into the table. For instance, inserting 13 should get the default choice.

A better choice would by to use TINYINT instead of INT. an UNSIGNED TINYINT has a range of 0 to 255 and only takes 1 byte to store. An INT takes 4 bytes to store. If you want to constrain the values getting into the table, you can add ON INSERT and ON UPDATE triggers that check the values.

If you're worried about the performance difference between ENUM and TINYINT, you can always benchmark to see the different. This article seems somewhat relevant.

Gary Richardson