views:

118

answers:

6

What are some good rules of thumb for specifying a column length? People will arbitrarily use "VARCHAR(50)" for name fields etc... but most of this is guess work. Does anyone have any resources they use? What are the rules of thumb you follow to designate a character length? For example, for a URL - these can be lengthy especially given a maximum length querystring - but using VARCHAR(MAX) seems over the top even though making it VARCHAR will only take the space required of the actual string data.

Please advise/share -

A: 

In mysql at least, the overhead from a VARCHAR field is either one byte, for maximum widths up to 255 bytes, or two bytes, for any larger maximum width. So one good rule of thumb would be to use VARCHAR(255) when you don't expect your data to exceed 255 bytes, or VARCHAR(65535) elsewhere.

Of course, if you expect there to be a true hard limit to the data, you may want to set a limit in the column definition to act as a second level of data validation (but if it gets far enough to cause a database error, it should be considered a bug at the higher levels).

Source: http://dev.mysql.com/doc/refman/6.0/en/char.html

bdonlan
A: 

For a URL I'd consult the RFC for URLs. First and last names I generally go with 100 characters - it doesn't really matter as you said since (N)VARCHAR only takes up the space it needs.

Things I tend to do:

  • I'd generally go with NVARCHAR instead of VARCHAR for names as a lot of names need special characters.
  • I make sure that whatever length I choose in the DB, I also enforce that in the UI.
rein
A: 

Starting out cautiously, yetconservative, can allow you to later expand, starting out with huge lengths can introduce unknowns for performance and usage--and could potentially make it difficult to trim down later.

Will you have lots and lots and lots and lots of columns that you are unsure about? Will changing them later be a monstrous task? If not then play it conservative, and if you hit the limit then you can expand--just keep in mind that you don't want these constraints spread all over your code; centralize them as much as possible and enforce them where required.

Try and choose a length that is longer than you expect, but not totally out of the question. If it's a field that you think 25 is reasonable for then make it 50; if 150 seems about right then make it 200.

Typically we use nvarchar(100) for names, 255 for URLs or other fields that can get long (these are usually just domain names, not full URLs to specific pages/resources of a site). Our mainframe imposes similar limits and every year we have a few field expansions--but we expected that and it's not a big deal to modify them.

STW
+3  A: 

RFCs for things like URLs and e-mail addresses can be useful. URLs have a 2083 character limit, while e-mail addresses have a maximum length of 320 characters (64 for the name, 1 for @, and 255 for the domain).

In general, I err on the side of 'too long' for anything storing user data. For internal use, I try to be more restrictive. I usually use 50 for keys, 255 for descriptions, and so forth. You won't see much impact on performance either way unless you're doing searches or something, so a lot of these choices come down to your personal preference.

In short:

  1. A specification exists for this data (URL, e-mail address, etc): be precise
  2. User data: go long for things like names, set a reasonable limit for things like comment text
  3. Internal use: whatever floats your boat
Steven Richards
A: 

What are you trying to optimize? A regular VARCHAR doesn't have much difference in cost, as pointed out, so it's not worth a lot of effort for that. The other reason you might care is for some kind of validation rule, in which case you need to figure out what determines whether or not a field value is valid or not. Then apply that.

Charlie Martin
+1  A: 

Well usually I consider the expected lenth of the longest data entry I can think of for the field and then usually add 10-20% to that as a starting point unless I know the data has a specified limit (zip codes have a specfied length for instance) and the I use that. If the length in a hard limit (it must be that length and no other, I use a Char datatype, otherwise i use varchar or nvarchar for string data.

HLGEM