Char(4) versus int as StatusID/StatusCode column in a table

views:

185

answers:

+1 Q:

Char(4) versus int as StatusID/StatusCode column in a table

I need a status column that will have about a dozen possible values. Is there any reason why I should choose int (StatusID) over char(4) (StatusCode)? Since sql server doesn't support named constants, char is far more descriptive than int when used in stored procedure and views as constants. To clarify, I would still use a lookup table either way. Since the I will need a more descriptive text for the UI. So this decision is only to help me as the developer when I'm maintaining the stored procedures and views.

Right now I'm leaning toward char(4). Especially since designing views in SQL Server Management Studio prevents me from adding comments (I know it's possible to add it in the script editor, but realistically I will use the View Designer far more often, especially if the view is trivial). StateCODE = 'NEW' is much more readable than StateID = 1000. I guess the question is will there be cases where char(4) is problematic, and since the database is pretty small, I'm not too concerned about slight performance hit (like using TinyInt versus int), but more afraid of code maintenance problems.

+1 A:

Database purists will say a key should have no meaning in the business domain, and that you should create a status table where you look up the description and other meanings of the status.

But for operators and end users, having a descriptive status code can be a blessing. And it doesn't even have to be char(4), you can make it varchar(20). This allows them to query without joins, and inspect the database in an easier way.

In the end, I think the char(20) organization will run more smoothly, and go home earlier on Friday. But the int organization has a better abstraction of the database, and they can enjoy meta programming on friday evening (or boosting on forums.)

(All of this assuming that you're writing business support software. One of the more succesful business support systems, SAP, makes successful use of meaningful keys.)

Andomar 2009-05-13 12:07:50

I'm not sure which database purists you talk to, but a key absolutely can have meaning for the business. Some people like to use surrogate keys for various reasons, in some cases natural keys make more sense. A natural key will certainly have meaning in the business world.

Tom H. 2009-05-13 14:14:14

Purist = people who look down on "the business". Example: the finance people come to you in a panic with a 17 sheet Excel workbook that they need fixed RIGHT NOW. The purist would say "dont use Excel". He's right, and wrong, if you grab my meaning :)

Andomar 2009-05-13 21:03:41

I think I like varchar even better than char, but for some reason I have this idea char would be faster. Is there any performance difference between the two?

Olonarp 2009-05-14 05:43:46

Char can be slightly faster because it allows the database to predict the exact location of the next field, for example 4 bytes later. Unless you're using it as one of the first fields in a very large table the difference is negligible.

Andomar 2009-05-14 07:36:56

You could also use a tinyint over an int

Tom Ferguson 2009-05-13 12:11:26

+1 A:

The best thing should be a lookup table with defined values and then relate it to original table, that uses that enumeration.

Robert Koritnik 2009-05-13 12:11:26

Sure, but I think the question is, whether the lookup table will have a key like 1-2-3 or a key like 'NEW', 'OPEN', 'DONE'.

Andomar 2009-05-13 14:05:43

In this case it depends on amount of data in *enum* table. You have to consider the fact that comparing numbers is significantly faster than comparing two string character by character.

Robert Koritnik 2009-05-14 07:49:53

+2 A:

There are many pro's and con's to each method. I'm sure other arguments will come up in favour of using a char(4). My reasons for choosing an int over a char include:

I always use lookup tables. They allow for an audit trail of the value to be retained and easily examined. For example, if one of your status codes is 'MING' and a business decision is made to change it from 'MING' to 'MONG' from a certain date, my lookup table handles this.
Smaller index - if you need to index this column, it will be thinner.
Extendability - OK, I made that word up, but if you need to go from 4 chars to 5 chars for example, a lookup table would be a blessing.
Descriptions: We use a lot of TLA's here which once you know what they are is great but if I gave a business user a report that said "GDA's 2007 1001", they wouldn't necessarily twig that GDA = Good Dead on Arrival. With a lookup table, I can add this description.
Best practice: Can't find the link to hand but it might be something I read in a K.Tripp article. Aim to make your clustered primary key incrementing integers to optimise the index.

Of course if you are absolutely positive that you will never need any more than a handful of 4 characters, there is no reason not to bang it in the table.

ekoner 2009-05-13 12:12:12

This answer seems a bit vague: [1] you can add a lookup table for a char(4) status too, and updating a char(4) status is not rocket science [2] there's no smaller index than a char(4) [3] you can alter a char(4) to char(5) [4] see 1 [5] optimizing the index, what does that even mean?

Andomar 2009-05-13 14:04:04

+1 A:

Collation ambigities are one reason to say no to char 4: Does ABcD = abCD = äBCd?

If you have 12 possible values, why not tinyint/byte and a Status table? If you have to store the status for 10 million rows the 3 bytes different and the collation/string compares add up.

gbn 2009-05-13 12:12:44

I will use a lookup table even with char(4). The StatusCode will be in all uppercase. I'm not too concerned about slight performance difference, since the database itself is relatively small.

Olonarp 2009-05-14 05:45:02

+1 A:

The place where I've run into this use case is columns that would map onto things that I would typically use an Enum for when programming. Do you store the integer value of the Enum or the name of the Enum in the database column? Honestly, I've done it both ways. Usually, I ask myself if the database will be used outside the application I'm building. If so, I will choose the human readable format to store in the database. If not, then I'll choose the integer value as it saves a little time when reconstituting (it's just a cast instead of a parse operation) the Enum in code.

tvanfosson 2009-05-13 12:14:59

i always choose int's simply because they are easier to map to enums in code.

Mladen Prajdic 2009-05-13 13:48:45

That would depend on your language. In C#, it's as simple as Enum.Parse.

Andomar 2009-05-13 14:09:26

If you're dealing with huge amounts of data and high throughput then a smallint or tinyint can give better performance and a smaller footprint on the hard disk. If the data in your application is often viewed directly through applications like Access or Cognos then your business people will probably appreciate the descriptive values. I know that when I'm analyzing data as part of my Database Developer role I get tired of joining a lot of lookup tables because I can't remember if 1 = Foo and 2 = Bar or 1 = Bar and 2 = Foo.

Also, although performance will be enhanced if you have to lookup rows by these codes which can have smaller indexes, it can also be hurt (in a minor way) by having to do the joins if you are often looking up rows regardless of the code but where you have to include the text value. In most applications that's not an issue though and would probably only come into play in large data warehousing/reporting environments.

Tom H. 2009-05-13 14:20:06

ansaurus

tags:

views:

answers:

Char(4) versus int as StatusID/StatusCode column in a table

related questions