In SQL Server 2005, are there any disadvantages to making all character fields nvarchar(MAX) rather than specifying a length explicitly, e.g. nvarchar(255)? (Apart from the obvious one that you aren't able to limit the field length at the database level)
The only problem I found was that we develop our applications on SQL Server 2005, and in one instance, we have to support SQL Server 2000. I just learned, the hard way that SQL Server 2000 doesn't like the MAX option for varchar or nvarchar.
Its a fair question and he did state apart from the obvious...
Disadvantages could include:
Performance implications Query optimizer uses field size to determine most efficent exectution plan
"1. The space alloction in extends and pages of the database are flexible. Thus when adding information to the field using update, your database would have to create a pointer if the new data is longer than the previous inserted. This the database files would become fragmented = lower performance in almost everything, from index to delete, update and inserts. " http://www.sqljunkies.com/WebLog/simons/archive/2006/02/28/Why_use_anything_but_varchar_max.aspx
Integration implications - hard for other systems to know how to integrate with your database Unpredictable growth of data Possible security issues e.g. you could crash a system by taking up all disk space
There is good article here: http://searchsqlserver.techtarget.com/tip/1,289483,sid87_gci1098157,00.html
Same question was asked on MSDN Forums:
From the original post (much more information there):
When you store data to a VARCHAR(N) column, the values are physically stored in the same way. But when you store it to a VARCHAR(MAX) column, behind the screen the data is handled as a TEXT value. So there is some additional processing needed when dealing with a VARCHAR(MAX) value. (only if the size exceeds 8000)
VARCHAR(MAX) or NVARCHAR(MAX) is considered as a 'large value type'. Large value types are usually stored 'out of row'. It means that the data row will have a pointer to another location where the 'large value' is stored...
I had a udf which padded strings and put the output to varchar(max). If this was used directly instead of casting back to the appropriate size for the column being adjusted, the performance was very poor. I ended up putting the udf to an arbitrary length with a big note instead of relying on all the callers of the udf to re-cast the string to a smaller size.
This will cause a performance problem, although it may never cause any actual issues if your database is small. Each record will take up more space on the hard drive and the database will need to read more sectors of the disk if you're searching through a lot of records at once. For example, a small record could fit 50 to a sector and a large record could fit 5. You'd need to read 10 times as much data from the disk using the large record.
It will make screen design harder as you will no longer be able to predict how wide your controls should be.
Interesting link: Why use a VARCHAR when you can use TEXT?
It's about PostgreSQL and MySQL, so the performance analysis is different, but the logic for "explicitness" still holds: Why force yourself to always worry about something that's relevant a small percentage of the time? If you saved an email address to a variable, you'd use a 'string' not a 'string limited to 80 chars'.
I jsut don't understand why you'd want to let someone enter in a name of 8000+ characters.
One problem is that if you are having to work with multiple versions of SQL Server, the MAX will not always work. So if you are working with legacy DB's or any other situation that involves multiple versions, you better be very careful.
Sometimes you want the data type to enforce some sense on the data in it.
Say for example you have a column that really shouldn't be longer than, say, 20 characters. If you define that column as VARCHAR(MAX), some rogue application could insert a long string into it and you'd never know, or have any way of preventing it.
The next time your application uses that string, under the assumption that the length of the string is modest and reasonable for the domain it represents, you will experience an unpredictable and confusing result.
Bad idea when you know the field will be in a set range- 5 to 10 character for example. I think I'd only use max if I wasn't sure what the length would be. For example a telephone number would never be more than a certain number of characters.
Can you honestly say you are that uncertain about the approximate length requirements for every field in your table?
I do get your point though- there are some fields I'd certainly consider using varchar(max).
Interestingly the MSDN docs sum it up pretty well:
Use varchar when the sizes of the column data entries vary considerably. Use varchar(max) when the sizes of the column data entries vary considerably, and the size might exceed 8,000 bytes.
Think of it as just another safety level. You can design your table without foreign key relationships - perfectly valid - and ensure existence of associated entities entirely on the business layer. However, foreign keys are considered good design practice because they add another constraint level in case something messes up on the business layer. Same goes for field size limitation and not using varchar MAX.
legacy system support. If you have a system that is using the data and it is expected to be a certain length then the database is a good place to enforce the length. This is not ideal but legacy systems are sometime not ideal. =P
If all of the data in a row (for all the columns) would never reasonably take 8000 or fewer characters then the design at the data layer should enforce this.
The database engine is much more efficient keeping everything out of blob storage. The smaller you can restrict a row the better. The more rows you can cram in a page the better. The database just performs better when it has to access fewer pages.
A reason NOT to use max or text fields is that you cannot perform online index rebuilds i.e. REBUILD WITH ONLINE= ON even with SQL Server Enterprise Edition.
The job of the database is to store data so that it can be used by the enterprise. Part of making that data useful is ensuring that it is meaningful. Allowing someone to enter an unlimited number of characters for their first name isn't ensuring meaningful data.
Building these constraints into the business layer is a good idea, but that doesn't ensure that the database will remain intact. The only way to guarantee that the data rules are not violated is to enforce them at the lowest level possible in the database.