Hello!

In MySQL you can choose a length for the VARCHAR field type. Possible values are 1-255.

But what difference does it make if you use VARCHAR(255), the maximum, instead of VARCHAR(20)? As far as I know, the size of an entry depends only on the actual length of the inserted string.

size (bytes) = length+1

So if you have the word "Example" in a VARCHAR(255) field, it would have 8 bytes. If you have it in a VARCHAR(20) field, it would have 8 bytes, too. What is the difference?

I hope you can help me. Thanks in advance!

+1  A: 

Check out: Reference for Varchar

In short, there isn't much difference unless you declare a VARCHAR longer than 255, which requires another byte for the length prefix.
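To make the arithmetic concrete, here is a rough Python sketch of the storage rule as described in the question and in this answer: the stored bytes of the string plus a 1-byte length prefix, or a 2-byte prefix once the column may hold more than 255 bytes. The function name and its parameters are made up for illustration, and it assumes a single-byte character set unless you say otherwise.

```python
def varchar_storage_bytes(value: str,
                          declared_length: int,
                          encoding: str = "latin-1",
                          max_bytes_per_char: int = 1) -> int:
    """Estimate the bytes one VARCHAR value occupies (hypothetical helper).

    Assumes the rule discussed in this thread: data bytes plus a length
    prefix of 1 byte, or 2 bytes if the column can exceed 255 bytes.
    """
    if len(value) > declared_length:
        raise ValueError("value is longer than the declared column length")

    data_bytes = len(value.encode(encoding))
    # The prefix size depends on the declared maximum, not on this value.
    max_column_bytes = declared_length * max_bytes_per_char
    prefix_bytes = 1 if max_column_bytes <= 255 else 2
    return data_bytes + prefix_bytes


# "Example" costs the same in VARCHAR(20) and VARCHAR(255).
print(varchar_storage_bytes("Example", 20))
print(varchar_storage_bytes("Example", 255))
# A column declared wider than 255 bytes pays one extra prefix byte.
print(varchar_storage_bytes("Example", 256))
```

Note that the prefix size follows from the declared maximum, so with a multi-byte character set a shorter declared length can already cross the 255-byte threshold.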

The length indicates more of a constraint on the data stored in the column than anything else. This inherently constrains the MAXIMUM storage size for the column as well. IMHO, the length should make sense with respect to the data. If you're storing a Social Security number, it makes no sense to set the length to 128, even though it doesn't cost you anything in storage if all you actually store is an SSN.

RC
+1  A: 

Well, if you want to allow for a larger entry, or limit the entry size perhaps.

For example, you may have first_name as a VARCHAR(20), but perhaps street_address as a VARCHAR(50), since 20 may not be enough space. At the same time, you may want to control how large that value can get.

In other words, you have set a ceiling of how large a particular value can be, in theory to prevent the table (and potentially the index/index entries) from getting too large.

You could also just use CHAR, which is fixed width. Unlike VARCHAR, whose stored values can be shorter than the declared length, CHAR pads the values (although this makes for quicker SQL access).

OneNerd
But why should I set the length to 20 at all? If there's no difference, I can simply set it to 255 for all fields, can't I?
See my edit - I elaborate a bit more.
OneNerd
Thank you. So it doesn't make any difference concerning storage and performance, right? But it can still be useful, e.g. if you want to cut strings off to prevent overly long ones. But you could also achieve this beforehand with substr() in PHP or a similar function in another language!?
The difference has to do with data integrity. Real world example: storing computer names of Windows machines in your database. A computer running Windows cannot have a computer name longer than 63 bytes. If you define a field that contains computer names as varchar(255), you can input names that are invalid and may cause errors if you try to use these names to access computers. Defining varchar(63) makes MySQL reject (or truncate) inserts or updates that exceed 63 characters (see http://technet.microsoft.com/en-us/library/cc757496(WS.10).aspx for more on computer names in Windows 2003).
shufler
You can't always ensure other programmers will know not to enter more than 63 characters. By setting this on the field, you proactively prevent this from ever being an issue in the future.
shufler
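The point made in the comments above, that the database itself should refuse over-long values no matter which program sends them, can be tried out with a small Python sketch. Big caveat: it uses SQLite from Python's standard library, not MySQL, and SQLite does not enforce the length in VARCHAR(n), so a CHECK constraint stands in for the column length here. The table name, the helper functions and the use of character count (rather than bytes) are assumptions made for illustration.

```python
import sqlite3

MAX_NAME_LEN = 63  # limit taken from the computer-name example above


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a length-limited name column."""
    conn = sqlite3.connect(":memory:")
    # SQLite ignores the 63 in VARCHAR(63); the CHECK does the enforcing.
    conn.execute(
        f"CREATE TABLE computers ("
        f" name VARCHAR({MAX_NAME_LEN}) NOT NULL"
        f" CHECK (length(name) <= {MAX_NAME_LEN}))"
    )
    return conn


def try_insert(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if the row was stored, False if the database refused it."""
    try:
        conn.execute("INSERT INTO computers (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        # Raised when the CHECK constraint fails.
        return False


conn = make_db()
print(try_insert(conn, "a" * 63))  # fits the limit
print(try_insert(conn, "a" * 64))  # one character too long
```

Whatever application, import script or manual query sends the 64-character name, the rule lives in one place and the bad row never gets in.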
+1  A: 

From a database perspective, performance-wise, I do not believe there is going to be a difference.

However, I think a lot of the decision on the length to use comes down to what you are trying to accomplish and documenting the system to accept just the data that it needs.

Mitchel Sellers
+2  A: 

An identical question I just found. With great answers!

sixfoottallrabbit
Thank you. This question is exactly about my problem. But the best answer (selected) seems to be wrong (see comment). According to the second answer, it does only matter if your query needs a temporary table (GROUP BY etc).
The accepted answer is selected by the questioner - if you disagree with it, leave a comment on it (the questioner will be notified when they next login) and optionally vote down the incorrect one and up a correct one.
bdonlan
+5  A: 

There are many valid reasons for choosing a value smaller than the maximum that are not related to performance. Setting a size helps indicate the type of data you are storing and can also act as a last-gasp form of validation.

For instance, if you are storing a UK postcode then you only need 8 characters. Setting this limit helps make clear the type of data you are storing. If you chose 255 characters it would just confuse matters.

Dan Diplo
+1 for emphasizing clarity and intelligibility. I would certainly question the use of a buf[1024], a `new DynamicBuf(1024)` or a VARCHAR(255) to store, for example, a single, dotted-quad IP address. Did the coder know what he was doing?
pilcrow
+1 And if you decide on VARCHAR(8) for UK postcodes use CHAR(8) for performance :)
Al
A: 

There is a semantical difference (and I believe that's the only difference): if you try to fill 30 non-space characters into varchar(20), it will produce an error, whereas it will succeed for varchar(255). So it is primarily an additional constraint.

Martin v. Löwis
But isn't it clear that 30 characters don't fit into a 20-byte field?
As you say, the storage representation really doesn't care about the length: a field declared varchar(20) could just fine store 30 characters, as far as disk representation goes - I thought this observation was the core of your question (why not always use varchar(255)?) Now, I'm telling you the reason why you set varchar(20): because you *want* the error that you get if you accidentally try to put in more than 20 characters.
Martin v. Löwis
So it's only for validation issues, correct?
Correct, AFAICT.
Martin v. Löwis
+1  A: 

I don't know about MySQL, but SQL Server will let you define fields such that the total number of bytes they could use is greater than the number of bytes that can actually be stored in a record. This is a bad thing. Sooner or later you will get a row where the limit is reached and you cannot insert the data.

It is far better to design your database structure to consider row size limits.
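As a rough illustration of that kind of design check, the Python sketch below adds up the worst-case width of every column and compares it with a row-size limit. The column names, the helper functions and the limit passed in are all hypothetical, and per-column and per-row overhead is ignored, so treat it as back-of-the-envelope arithmetic and look up the actual limit for your database engine.

```python
from typing import Mapping


def max_row_bytes(columns: Mapping[str, int], bytes_per_char: int = 1) -> int:
    """Worst-case data bytes if every column is filled to its declared length."""
    return sum(length * bytes_per_char for length in columns.values())


def fits_row_limit(columns: Mapping[str, int],
                   row_limit: int,
                   bytes_per_char: int = 1) -> bool:
    """True if even a completely full row stays within row_limit bytes."""
    return max_row_bytes(columns, bytes_per_char) <= row_limit


# Hypothetical schema: lengths chosen to match the data.
sized = {"first_name": 20, "street_address": 50, "postcode": 8}
# Hypothetical schema: 40 columns, each lazily declared as 255.
lazy = {f"field_{i}": 255 for i in range(40)}

ROW_LIMIT = 8060  # assumed limit for illustration; check your engine's docs

print(max_row_bytes(sized), fits_row_limit(sized, ROW_LIMIT))
print(max_row_bytes(lazy), fits_row_limit(lazy, ROW_LIMIT))
```

With lengths that match the data, the worst case stays far below the limit; with 255 everywhere, a full row could not be stored even though the table definition was accepted.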

Additionally yes, you do not want people to put 200 characters in a field where the maximum value should be 10. If they do, it is almost always bad data.

You say, well I can limit that at the application level. But data does not get into the database just from one application. Sometimes multiple applications use it, sometimes data is imported and sometimes it is fixed manually from the query window (update all the records to add 10% to the price for instance). If any of these other sources of data don't know about the rules you put in your application, you will have bad, useless data in your database. Data integrity must be enforced at the database level (which doesn't stop you from also checking before you try to enter data) or you have no integrity. Plus it has been my experience that people who are too lazy to design their database are often also too lazy to actually put the limits into the application and there is no data integrity check at all.

They have a word for databases with no data integrity - useless.

HLGEM