I am building a table for tracking the history of particular objects in my database. Currently I have the following columns:

HistoryId int IDENTITY(1,1) NOT NULL
HistoryDate datetimeoffset(7) NOT NULL 
HistoryTypeId int NOT NULL
HistoryDetails nvarchar(max) NULL

For the most part, each history item will be self-explanatory through the HistoryTypeId, so HistoryDetails will either be NULL or very small. But for a couple of history types, the details data will be large. Is it OK to go with nvarchar(max) for all of the records, or should I break it apart and add an extra column for the history types that require more than 64 characters (see below)? A rough estimate is that 80%-90% of records will not need more than 64 characters of detail information, and there will be millions of records in the table.

HistoryId int IDENTITY(1,1) NOT NULL
HistoryDate datetimeoffset(7) NOT NULL 
HistoryTypeId int NOT NULL
HistoryDetails nvarchar(64) NULL
HistoryDetailsMore nvarchar(max) NULL
A: 

Since you are using nvarchar, you are already paying the variable-length record overhead. The space on disk shouldn't change for short records between nvarchar(64) and nvarchar(max): either way, a value only takes as much space as its data needs. The declared length is normally used only to constrain the data, so if you don't want to constrain it, you shouldn't pay any penalty with nvarchar(max) that you aren't already paying.
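A quick way to check this (a minimal sketch; the table variable and values are illustrative) is to compare DATALENGTH for the same short string stored in both column types:

```sql
-- Same short string stored as nvarchar(64) and nvarchar(max):
-- both report the same data size.
DECLARE @t TABLE (
    Small nvarchar(64),
    Big   nvarchar(max)
);

INSERT INTO @t VALUES (N'short details', N'short details');

SELECT DATALENGTH(Small) AS SmallBytes,  -- 26 bytes (13 characters x 2)
       DATALENGTH(Big)   AS BigBytes     -- 26 bytes as well
FROM @t;
```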

Chris J
So you are saying just go with the single HistoryDetails nvarchar(max) column?
Jeff Widmer
I would imagine that unless you are indexing on that column or editing these fields regularly, that would be a better solution than creating a separate table, yes.
Chris J
+4  A: 

You cannot make NVARCHAR(MAX) a part of a key in a plain B-Tree index (you can still use it as an included column in an index).

Otherwise, the storage will be the same as long as the data in the column does not exceed the row size threshold.

Since you're probably not going to index this field anyway, it's a good idea to create it as NVARCHAR(MAX).

Even if you still want to index it (say, to do prefix searches using LIKE), you can create a computed NVARCHAR(450) column, create an index on that column, and add it to your queries for coarse filtering.
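As a sketch of that approach (the column and index names here are illustrative, not from the original schema), NVARCHAR(450) is the widest indexable prefix because index keys are limited to 900 bytes:

```sql
-- Computed column exposing the first 450 characters; LEFT and CAST
-- are deterministic, so the column can be indexed without PERSISTED.
ALTER TABLE History
    ADD HistoryDetailsPrefix AS CAST(LEFT(HistoryDetails, 450) AS nvarchar(450));

CREATE INDEX ix_history_details_prefix
    ON History (HistoryDetailsPrefix);

-- Coarse filtering: the LIKE on the prefix column can seek the index,
-- and the second predicate keeps results exact for longer strings.
SELECT COUNT(*)
FROM History
WHERE HistoryDetailsPrefix LIKE N'some prefix%'
  AND HistoryDetails LIKE N'some prefix%';
```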

See this entry in my blog for more details:

If you are only going to do exact searches on the small values, create a computed column, index it, and query like this:

ALTER TABLE History ADD HistoryDetailsIndex AS SUBSTRING(HistoryDetails, 1, 50)

CREATE INDEX ix_history_typeid_details ON History (HistoryTypeId, HistoryDetailsIndex) INCLUDE (HistoryDetails)

SELECT  COUNT(*)
FROM    History
WHERE   HistoryTypeId = 123
        AND HistoryDetailsIndex LIKE 'string_prefix_up_to_50_characters%'
        AND HistoryDetails = 'string_prefix_up_to_50_characters_plus_everything_after_it'

This will include only the first 50 characters of HistoryDetails in the index key (which the LIKE condition can seek on), and the full value in the included column.

If you are absolutely sure you will never search for a string that is more than 50 characters long, you can omit the included column and just use this:

SELECT  COUNT(*)
FROM    History
WHERE   HistoryTypeId = 123
        AND HistoryDetailsIndex = 'string_prefix_up_to_50_characters'

This will make the index shorter.

However, an equality search against the computed column will silently miss matches for any string longer than 50 characters, so only use this variant if you are absolutely sure you will never search for long strings.

Quassnoi
What if it will exceed the row size threshold for those items that will require nvarchar(max)? Does that change anything?
Jeff Widmer
Internally, this column will be stored out of row (http://msdn.microsoft.com/en-us/library/ms186981.aspx). From the user's point of view, nothing will change: you work with the column as usual.
Quassnoi
So from what you are saying (and from your blog post) since the 80-90% of records that will have a small value in the column will need to be indexed, but the 10% that have a large value do not need to be indexed, I should go with the two column architecture. nvarchar(64) that is included in the index, and a non-indexed nvarchar(max). Can you confirm?
Jeff Widmer
What will you use the index for? If for prefix searches, just keep your data in a `NVARCHAR(MAX)` and create a computed column `NVARCHAR(450)`. This computed column will actually store and index the first `450` characters of your data, which usually is more than enough for any prefix search. If you use the index to avoid `key lookups / RID lookups`, then just create an index with `NVARCHAR(MAX)` as an included column. This index will not allow to do prefix lookups for your strings but will still fetch all data out of the index if you use only the columns the index covers in your query.
Quassnoi
I would use the index for something like this: SELECT COUNT(*) FROM History WHERE HistoryTypeId=123 AND HistoryDetails='XYZ' but only for the 80-90% of records that have small values in that column.
Jeff Widmer
A: 

First, be aware that varchar(MAX) can store up to 2 GB. Once a value exceeds the in-row size limit, it is pushed off-row to LOB storage (the same mechanism the old TEXT type used), which carries more processing overhead than varchar(8000) or less.

If you store mostly small values in a varchar(max) column, they are kept in-row and treated like a normal varchar column, unless a value exceeds the 8,000-byte in-row limit, after which it moves to LOB storage.

Is the column indexed, or do you want to index it? If so, steer clear of varchar(max).

I would just pick a higher limit, say varchar(255), and make the data fit your database design rather than the other way around.

Sres
Your comment on indexing conflicts with Quassnoi's answer. I will be indexing the column but only because of the Item Types that would have small data in that column (80-90% of the records).
Jeff Widmer