views:

68

answers:

4

I have a column in a table that will store an enum value. E.g. Large, Medium, Small or the days of the week. This will correspond to displayed text on a web page or user selection from a droplist. What is the best design?

Store the values as an int and then perhaps have a table that has the enums/int corresponding string in it.

Just store the values in the column as a string, to make queries a little more self-explanatory.

At what point/quantity of values is it best to use ints or strings.

Thanks.

A: 

Going with your first example. Lets say you create a Look up table: Sizes. It has the following columns: Id - primary key + identity Name - varchar / nvarchar

You'd have three rows in the table, Small, Medium and Large with values 1, 2, 3 if you inserted them in that order.

If you have another table that uses those values you can use the identity value as the foreign key...or you could create a third column which is a short hand value for the three values. It would have the values S, M & L. You could use that as the foreign key instead. You'd have to create a unique constraint on the column.

As far as the dropdown, you could use either one as the value behind the scenes.

You could also create S/M/L value as the primary key as well.

For your other question about when its best to use the ints vs strings. There is probably a lot of debate on the subject. A lot of people only like using identity values as their primary keys. Other people say that it's better to use a natural key. If you are not using an identity as the primary key then it's just important to make sure you have a good candidate for the primary key (making sure it will always be unique and that the value does not change).

Dismissile
+1  A: 

Assuming your RDBMS of choice doesn't have an ENUM type (which handles this for you), I think best to use ids instead of strings directly when the values can change (either in value or in quantity.)

You might think that days of the week won't change, but what if your application needs to add internationalization support? (or an evil multinational corporation decides to rename them after taking control of the world?)

Also, that Large, Medium and Small categorization is probably changing after a while. Most values you think cannot change, can change after a while.

So, mainly for anticipating change reasons, I think it's best to use ids, you just need to change the translation table and everything works painlessly. For i18n, you can just expand the translation table and pull the proper records automatically.

Most likely (it'll depend on various factors) ints are going to perform better, at the very least in the amount of required storage. But I wouldn't do ints for performance reasons, I'd do ints for flexibility reasons.

Vinko Vrsalovic
Not come across that before, which RDBMS have ENUM support? Something I've missed for definite
Paul Hadfield
MySQL has it: http://dev.mysql.com/doc/refman/5.0/en/enum.html
Vinko Vrsalovic
Fantastic, I'm just starting to take a look at mySQL so I'll keep an eye out for it.
Paul Hadfield
A: 

Hi, this is an interesting question. Definitely you have to take in consideration performance targets here. If you wan't to go for speed, int is a must. A Database can index integers a bit better than Strings although I must say its not at all a bad performance loss.

On example is Oracle database itself where they have the luxury of doing large caps enum as strings on their system tables. Things like USER_ALLOCATION_TYPE or things like that are the norm. Its like you say, Strings can be more "extensible" and more readable, but in any case in the code you will end up with:

Static final String USER_ALLOCATION_TYPE="USER_ALLOCATION_TYPE";

in place of

Static final int USER_ALLOCATION_TYPE=5;

Because you either do this you will end up with all this string literals that are just aching for someone to go there and misplace a char! :)

In my company we use tables with integers primary keys; all the tables have a serial primary key, because even if you don't think you need one, sooner or later you'll regret that.

In the case you are describing what we do is that we have a table with (PK Int, Description String) and then we do Views over the master tables with joins to get the descriptions, that way we get to see the joined fields descriptions if we must and we keep the performance up.

Also, with a separate description table you can have EXTRA information about those ids you would never think about. For example, lets say a user can have access to some fields in the combo box if and only if they have such property and so. You could use extra fields in the description table to store that in place of ad-hoc code.

My two cents.

Rui
A: 

I too would be interested in people's thinking regarding this, I've always gone the route of storing the enum in a look up table and then in any data tables that referenced the enum I would store the ID and using FK relationship. In a certain way, I still like this approach, but there is something plain and simple about putting the string value directly in the table.

Going purely by size, an int is 4 bytes, where as the string is n btyes (where n is number of characters). Shortest value in your look up is 5 characters, longest is 6, so storing the actual value would use up more space eventually (if that was a problem).

Going by performance, I'm not sure if an index on an int or on a varchar would return any difference in speed / optimisation / index size?

Paul Hadfield
Plain, simple and painful: Doing updates if values change and your data grows to a respectable amount will be a pain. Maintaining consistency across those changes could also become a pain. Extending the enum to add a new value (if you are enforcing it) could become a pain. And there are probably more painful stuff I'm not even thinking about.
Vinko Vrsalovic