Performance implication on nvarchar(4000)?

tags:

sql-server

views:

141

answers:

I have a column declared as nvarchar(4000), and 4000 is SQL Server's limit on the length of nvarchar. It will not be used as a key to sort rows.

Are there any implications I should be aware of before setting the length of the nvarchar field to 4000?

Update

I'm storing XML, an XML-serialized object to be exact. I know this isn't ideal, but we will most likely never need to query it, and this implementation dramatically decreases development time for certain features that we plan on extending. I expect the XML data to be 1500 characters long on average, but there can be exceptions where it is longer than 4000. Could it be longer than 4000 characters? It could, but only on very rare occasions, if it ever happens. Is this application mission critical? Nope, not at all.

+2 A:

Do you definitely need nvarchar, or can you go with varchar? The 4000-character limit applies mainly to SQL Server 2000, which has no nvarchar(max). Are you using 2k5 / 2k8 (SQL Server 2005 / 2008)?

JonH 2009-10-26 18:40:24

I'm using 2k5. I don't need nvarchar, but I thought nvarchar would always be better than varchar. Sorry for my noobish question; I'm not very experienced with SQL. Thanks

burnt1ce 2009-10-26 20:24:25

Ah, never mind. Yes, I can use varchar; this application does not need to support multiple languages.

burnt1ce 2009-10-26 20:26:55

No problem. You only want nvarchar if you plan on supporting a multi-language app (Unicode support). Otherwise varchar should be the right choice for you.
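To make the storage difference concrete, here is a small Python illustration (not SQL Server code). It assumes nvarchar stores UTF-16 at 2 bytes per character for ordinary characters, and that varchar uses a single-byte Windows-1252 code page, as with a Latin1_General collation. The function names are hypothetical, and the `?` substitution only approximates what SQL Server typically does with characters outside the code page.

```python
def nvarchar_bytes(text: str) -> int:
    """Approximate nvarchar storage: UTF-16, 2 bytes per BMP character."""
    return len(text.encode("utf-16-le"))


def varchar_bytes(text: str, code_page: str = "cp1252") -> int:
    """Approximate varchar storage: 1 byte per character in a single-byte code page."""
    return len(text.encode(code_page, errors="replace"))


def round_trip_varchar(text: str, code_page: str = "cp1252") -> str:
    """Mimic storing text in varchar: characters outside the code page become '?'."""
    return text.encode(code_page, errors="replace").decode(code_page)


if __name__ == "__main__":
    sample = "Grüße"
    print(nvarchar_bytes(sample), varchar_bytes(sample))  # 10 5
    print(round_trip_varchar("日本"))  # ??
```

Western European text survives varchar intact at half the size, while text outside the code page is lost. That loss is the real reason to choose nvarchar.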

JonH 2009-10-27 11:54:48

+4 A:

SQL Server has three types of storage (allocation units): in-row data, LOB data, and row-overflow data; see Table and Index Organization in Books Online. In-row storage is the fastest to access. LOB and row-overflow storage are similar to each other, and both are slightly slower than in-row.

If you have an NVARCHAR(4000) column, its values will be stored in-row if possible. If not (that is, when the whole row would exceed 8,060 bytes), they will be stored in row-overflow storage. Having such a column does not necessarily indicate future performance problems, but it begs the question: why nvarchar(4000)? Is your data likely to always be near 4000 characters long? Can it be 4001, and how will your application handle that case? Why not nvarchar(max)? Have you measured performance and found that nvarchar(max) is too slow for you?
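The placement question for a single nvarchar(4000) value can be approximated in Python as below. This is a deliberate simplification. It assumes 2 bytes per character and applies the 8,060-byte in-row limit to the value plus a hypothetical `other_row_bytes` estimate for the rest of the row. It also ignores row headers and offset arrays. The "rejected" result stands in for the truncation error SQL Server normally raises when a string is too long for the column.

```python
NVARCHAR_N_MAX_CHARS = 4000   # largest length allowed for nvarchar(n)
IN_ROW_LIMIT_BYTES = 8060     # maximum in-row size of a data row


def nvarchar4000_placement(n_chars: int, other_row_bytes: int) -> str:
    """Rough guess at where a value of n_chars characters would be stored.

    other_row_bytes is a hypothetical estimate of the rest of the row;
    row header and offset-array overhead are ignored.
    """
    if n_chars > NVARCHAR_N_MAX_CHARS:
        return "rejected"          # too long for nvarchar(4000): the insert normally fails
    value_bytes = 2 * n_chars      # 2 bytes per character
    if value_bytes + other_row_bytes <= IN_ROW_LIMIT_BYTES:
        return "in-row"
    return "row-overflow"          # moved off the row, leaving a pointer behind


if __name__ == "__main__":
    for length in (1500, 4000, 4001):
        print(length, nvarchar4000_placement(length, other_row_bytes=200))
    # 1500 in-row / 4000 row-overflow / 4001 rejected
```

Under these assumptions, an average 1,500-character document stays in-row, and a near-4,000-character one spills over. The 4,001st character is where the application needs a plan.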

My recommendation would be to use either a small nvarchar length, appropriate for the real data, or nvarchar(max) if the data is expected to be large. nvarchar(4000) smells like unjustified and untested premature optimisation.

Update

For XML, use the XML data type. It has many advantages over varchar or nvarchar: it supports XML indexes and XML methods, and it validates the XML, either against a specific schema or at least for well-formedness.
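If the data stays in an nvarchar or varchar column instead, the application has to perform the well-formedness check that the XML type does on insert. Here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. `is_well_formed` is a hypothetical helper name, and schema validation, which typed XML columns also offer, is not covered.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<order id='1'><item/></order>"))  # True
    print(is_well_formed("<order><item></order>"))          # False
```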

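XML is stored as LOB data. Note that by default SQL Server keeps such values in the row when they fit (up to 8,000 bytes) and only moves them outside the row when they don't. To always store them outside the row, enable the 'large value types out of row' table option with sp_tableoption.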

Even if the data were not XML, I would still recommend LOB storage (nvarchar(max)) for something 1500 characters long. Retrieving LOB-stored data has a cost, but narrowing the table more than compensates for it. Row width is a primary factor in performance. Wider rows mean fewer rows per page, so any operation that scans a range of rows or the entire table has to fetch more pages into memory. This shows up in the query cost; it is actually the driving factor of the overall cost. A LOB-stored column only adds an in-row pointer to the row (16 bytes when the 'large value types out of row' option is enabled), so you get much better row density per page, and hence faster queries.
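The density argument can be checked with the rows-per-page estimate used in Microsoft's table-size guidance: 8,096 usable bytes per page, plus a 2-byte row-offset entry per row. The Python sketch below compares a 1,500-character value kept in the row with the same value moved off-row behind an assumed 16-byte pointer. The 100-byte width for the rest of the row is a hypothetical figure.

```python
PAGE_BYTES_FOR_ROWS = 8096  # 8,192-byte page minus the 96-byte page header
ROW_OFFSET_BYTES = 2        # each row also takes a 2-byte slot in the offset array


def rows_per_page(row_bytes: int) -> int:
    """Estimate how many rows of row_bytes bytes fit on one data page."""
    return PAGE_BYTES_FOR_ROWS // (row_bytes + ROW_OFFSET_BYTES)


OTHER_ROW_BYTES = 100   # hypothetical: all other columns plus row overhead
LOB_POINTER_BYTES = 16  # assumed in-row pointer size for off-row LOB data

if __name__ == "__main__":
    in_row_nvarchar = rows_per_page(OTHER_ROW_BYTES + 2 * 1500)  # 1,500 chars inline
    in_row_varchar = rows_per_page(OTHER_ROW_BYTES + 1500)       # same text as varchar
    off_row = rows_per_page(OTHER_ROW_BYTES + LOB_POINTER_BYTES)
    print(in_row_nvarchar, in_row_varchar, off_row)  # 2 5 68
```

With these numbers, a scan of the off-row layout reads about 34 times fewer pages than the inline nvarchar layout. Queries that actually read the XML still pay for the extra LOB lookups.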

Remus Rusanu 2009-10-26 18:44:27

Good point. I've updated my question to respond to your questions.

burnt1ce 2009-10-26 20:21:40

+3 A:

Are you sure that you'll actually need as many as 4000 characters? If 1000 is the practical upper limit, why not set it to that? Conversely, if you're likely to get more than 4000 characters, you'll want to look at nvarchar(max).

I like to "encourage" users not to use storage space too freely. The more space required to store a given row, the fewer rows fit on a page, which potentially results in more disk I/O when the table is read or written to. Only as many bytes of data are stored as are necessary (i.e. not the full 4000 characters per row). Even so, once you get a bit more than 2000 characters of nvarchar data, you'll only have one row per page, and performance can really suffer.

This of course assumes you need to store Unicode data (two bytes per character), and that you only have one such column per row. If you don't, drop down to varchar.
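The "a bit more than 2000 characters" figure follows from simple page arithmetic. Here is a Python sketch that assumes 8,096 usable bytes per page, a 2-byte offset entry per row, 2 bytes per nvarchar character, and a hypothetical 20 bytes for the rest of the row.

```python
PAGE_BYTES_FOR_ROWS = 8096  # usable bytes on an 8 KB data page
ROW_OFFSET_BYTES = 2        # per-row slot in the page's offset array


def rows_per_page(row_bytes: int) -> int:
    """Estimate how many rows of row_bytes bytes fit on one data page."""
    return PAGE_BYTES_FOR_ROWS // (row_bytes + ROW_OFFSET_BYTES)


def one_row_per_page_threshold(other_row_bytes: int) -> int:
    """Smallest nvarchar length (in characters) at which only one row fits per page."""
    n_chars = 0
    while rows_per_page(other_row_bytes + 2 * n_chars) >= 2:
        n_chars += 1
    return n_chars


if __name__ == "__main__":
    print(one_row_per_page_threshold(other_row_bytes=20))  # 2014
```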

Philip Kelley 2009-10-26 18:47:39

I tried setting a length of 5000 in SQL Studio Express and got an error saying that 4000 is the maximum, so I assume nvarchar(max) = nvarchar(4000). I will switch from nvarchar to varchar. Thanks a lot for the tip!

burnt1ce 2009-10-26 20:28:18

I think max is actually a lot bigger than 4000: nvarchar(max) can hold up to 2 GB.
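For scale, the documented storage limit for nvarchar(max) is 2^31-1 bytes. Here is a quick Python check of what that means in characters, assuming 2 bytes per character:

```python
MAX_STORAGE_BYTES = 2**31 - 1             # nvarchar(max) storage limit in bytes
max_chars = MAX_STORAGE_BYTES // 2        # 2 bytes per character
print(max_chars, max_chars == 2**30 - 1)  # 1073741823 True
```

That is roughly a billion characters, compared with the 4000 allowed by nvarchar(n).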

recursive 2009-10-26 20:48:39
