views:

3696

answers:

17

I realize that CHAR is recommended if all my values are fixed-width. But, so what? Why not just pick VARCHAR for all text fields just to be safe.

+9  A: 

Char is a little bit faster, so if you have a column that you KNOW will be a certain length, use char. For example, storing (M)ale/(F)emale/(U)nknown for gender, or 2 characters for a US state.

Jarrett Meyer
Not sure that's a GREAT answer, since an ENUM would usually make a lot more sense, although I'm not sure how widely supported that type is (outside of MySQL).
Bobby Jack
Seems to me that the set of states is not necessarily immutable, so char(2) seems much more appropriate than an enum.
Kearns
@Bobby Jack - I don't know the specific details of any particular SQL enum implementation, but keep in mind that an enum stored as a 4 byte integer might require more space than a char(1) or char(2) column with the same data. There is a sense in which enums are more logical in terms of their interpretation, and that might be compelling, but everything in an RDBMS system is abstract at some level and subject to the predicates defined for the tables.
Jeffrey L Whitledge
Bad example, ENUM is best for that case. Better example would be a 3 letter IATA airport code
Andrew G. Johnson
@Andrew, not all db's support ENUM data types. MSSQLServer, for example, does not. Also, an ENUM, stored as an int, takes 4 bytes. CHAR(1) takes 1 byte, and NCHAR(1) takes 2 bytes.
Jarrett Meyer
@Jarrett - I think ENUM is stored as TINYINT normally
Andrew G. Johnson
A: 

There is some small processing overhead in calculating the actual needed size for a column value and allocating the space for a Varchar, so if you are definitely sure how long the value will always be, it is better to use Char and avoid the hit.

Guy Starbuck
A: 

That doesn't make much sense. What you're saying is basically 'why optimize my code when it works? there may be performance benefits, but who cares about that?'

Nick Stinemates
But, HOW does it optimize my code to pick CHAR over VARCHAR?
A: 

when using varchar values SQL Server needs an additional 2 bytes per row to store some info about that column whereas if you use char it doesn't need that so unless you

SQLMenace
+62  A: 

Generally pick CHAR if all rows will have the same length. VARCHAR when it is variable length. CHAR is also a bit faster. There may be a break even point somewhere in between where most of them will be of one length, and some are just a little shorter were it still might be better to go with CHAR, but YMMV for that.

It varies by DB implementation, but generally VARCHAR uses one or two more bytes of storage (for length or termination) then CHAR. So (assuming you are using ANSI character set) to store the word "FooBar" in a CHAR(6) is 6 bytes. To store the same word in VARCHAR(10) would be 7 bytes. But to store "FooBar" in a CHAR(10) would be 10 bytes.

Bottom line is CHAR can be faster and more space efficient for fixed with data than VARCHAR.

Jim McKeeth
A: 

I stand by Jim McKeeth's comment.

Also, indexing and full table scans are faster if your table has only CHAR columns. Basically the optimizer will be able to predict how big each record is if it only has CHAR columns, while it needs to check the size value of every VARCHAR column.

Besides if you update a VARCHAR column to a size larger than its previous content you may force the database to rebuild its indexes (because you forced the database to physically move the record on disk). While with CHAR columns that'll never happen.

But you probably won't care about the performance hit unless your table is huge.

Remember Djikstra's wise words. Early performance optimization is the root of all evil.

Alvaro Rodriguez
There is a degree of speculation in your comment. I have seen time and time again assumptions like these get tested and the exact opposite turn out to be true. The problem is many engineers will take info like this as the gospel. Please folks, create test cases which reflect your real situations.
Ethan Post
Ethan is totally correct. This so depends on the implementation you're using that without references to actual (Product,Version) it's completely useless.
David Schmitt
A: 

In some SQL databases, VARCHAR will be padded out to its maximum size in order to optimize the offsets, This is to speed up full table scans and indexes.

Because of this, you do not have any space savings by using a VARCHAR(200) compared to a CHAR(200)

FlySwat
Which databases implement VARCHAR that way?
Troels Arvin
Seriously, what database implements it that way? What you describe normally applies to CHAR, not VARCHAR.
BipedalShark
mysql will convert varchar's to chars if there are char's and varchar's in the same table.
Malfist
+6  A: 

There is a difference between early performance optimization and using a best practice type of rule. If you are creating new tables where you will always have a fixed length field, it makes sense to use CHAR, you should be using it in that case. This isn't early optimization, but rather implementing a rule of thumb (or best practice).

i.e. - If you have a 2 letter state field, use CHAR(2). If you have a field with the actual state names, use VARCHAR.

Redbeard 0x0A
+8  A: 

If your working with me and your working with Oracle I would probably make you use varchar in almost every circumstance. The assumption that char uses less processing power than varchar may be true...for now...but database engines get better over time and this sort of general rule has the making of a future "myth".

Another thing, I have never seen a performance problem because someone decided to go with varchar. You will make much better use of your time writing good code (fewer calls to the database) and efficient SQL (how do indexes work, how does the optimizer make decisions, why is "exists" faster than "in" usually...

Final thought, I have seen all sorts of problems with use of CHAR, people looking for '' when they should be looking for ' ', or people looking for 'FOO' when they should be looking for 'FOO (bunch of spaces here)', or people not trimming the trailing blanks, or bugs with Powerbuilder adding up to 2000 blanks to the value it returns from an Oracle procedure.

Ethan Post
Sorry, just a pet peeve:It should start out with "If you're working with me and you're working with Oracle..."
thursdaysgeek
I disagree somewhat with your first paragraph, since char may provide a hint that could be useful to optimizers, even future ones, and it may help to communicate the intent of the column. But +1 for your third paragraph. I hate all the extra spaces. A field should just store whatever I put in it without all the [explicative] padding. Basically, I just use char if all of the data is to be exactly the same length, no more and no less, now and forever. This is very rare, of course, and is usually a char(1).
Jeffrey L Whitledge
+4  A: 

In addition to performance benefits, CHAR can be used to indicate that all values should be the same length, e.g., a column for U.S. state abbreviations.

Hank Gay
+2  A: 

I would choose varchar unless the column stores fixed value like US state code -- which is always 2 chars long and the list of valid US states code doesn't change often :).

In every other case, even like storing hashed password (which is fixed length), I would choose varchar.

Why -- char type column is always fulfilled with spaces, which makes for column my_column defined as char(5) with value 'ABC' inside comparation:

my_column = 'ABC' -- my_column stores 'ABC  ' value which is different then 'ABC'

false.

This feature could lead to many irritating bugs during development and makes testing harder.

Grzegorz Gierlik
At least in MSSQL Server, 'abc ' = 'abc'. I've never quite figured out if I like or detest that feature....
Mark Brackett
A: 

There are performance benefits, but here is one that has not been mentioned: row migration. With char, you reserve the entire space in advance.So let's says you have a varchar2(1000), and you store 10 characters, you will use up all 1000 charaters of space. In a varchar2, you will only use 10 characters. The problem comes when you modify the data. Let's say you update the column to now contain 900 characters. It is possible that the space to expand the varchar is not available in the current block. In that case, the DB engine must migrate the row to another block, and make a pointer in the original block to the new row in the new block. To read this data, the DB engine will now have to read 2 blocks.
No one can equivocally say that varchar or char are better. There is a space for time tradeoff, and consideration of whether the data will be updated, especially if there is a good chance that it will grow.

Tony BenBrahim
I think you have a typo in your post -- shouldn't varchar2(1000) be CHAR(1000) ?
Matt Rogish
A: 

It's the classic space versus performance tradeoff.

In MS SQL 2005, Varchar (or NVarchar for lanuagues requiring two bytes per character ie Chinese) are variable length. If you add to the row after it has been written to the hard disk it will locate the data in a non-contigious location to the original row and lead to fragmentation of your data files. This will affect performance.

So, if space is not an issue then Char are better for performance but if you want to keep the database size down then varchars are better.

Leo Moore
A: 

I think in your case there is probably no reason to not pick Varchar. It gives you flexibility and as has been mentioned by a number of respondants, performance is such now that except in very specific circumstances us meer mortals (as opposed to Google DBA's) will not notice the difference.

An interesting thing worth noting when it comes to DB Types is the sqlite (a popular mini database with pretty impressive performance) puts everything into the database as a string and types on the fly.

I always use VarChar and usually make it much bigger than I might strickly need. Eg. 50 for Firstname, as you say why not just to be safe.

Toby Allen
A: 

Many people have pointed out that if you know the exact length of the value using CHAR has some benefits. But while storing US states as CHAR(2) is great today, when you get the message from sales that 'We have just made our first sale to Australia', you are in a world of pain. I always send to overestimate how long I think fields will need to be rather than making an 'exact' guess to cover for future events. VARCHAR will give me more flexibility in this area.

Craig
+1  A: 

CHAR takes up less storage space than VARCHAR if all your data values in that field are the same length. Now perhaps in 2009 a 800GB database is the same for all intents and purposes as a 810GB if you converted the VARCHARs to CHARs, but for short strings (1 or 2 characters), CHAR is still a industry "best practice" I would say.

Now if you look at the wide variety of data types most databases provide even for integers alone (bit, tiny, int, bigint), there ARE reasons to choose one over the other. Simply choosing bigint every time is actually being a bit ignorant of the purposes and uses of the field. If a field simply represents a persons age in years, a bigint is overkill. Now it's not necessarily "wrong", but it's not efficient.

But its an interesting argument, and as databases improve over time, it could be argued CHAR vs VARCHAR does get less relevant.

Scott
+1  A: 

Data to store: "ABCD"


Char(4) takes 4b

varchar(4) takes 6b

or

Char(40) takes 40b

varchar(40) takes 6b

or

Char(400) takes 400b

varchar(400) takes 402b

Sunil Kumar Lalwani
hope this helps you to under stand when to use what
Sunil Kumar Lalwani
Mistake: "ABCD" stored in a VARCHAR(400) will also only take 6 bytes.
Guy