What is a maintainable way to store large text fields without sacrificing performance?

+4 Q:

I have been dancing around this issue for a while, but it keeps coming up. In our system, many of our tables start with a description that is originally stored as an NVARCHAR(150); then we get a ticket asking to expand the field size to 250, then 1000, and so on.

This cycle is repeated on every "note" and/or "description" field we add to most tables. Of course, my concern is performance and breaking the 8k limit of the page. However, my other concern is making the system less maintainable by breaking these fields out of EVERY table in the system into a lazy-loaded reference.

So here I am, faced with the same two options that have been staring me in the face (others are welcome). Please lend me your opinions.

1. Change all notes and/or descriptions to NVARCHAR(MAX) and make sure we exclude these fields from all listings. Basically, never do a SELECT * FROM [TableName] unless it is only retrieving one record.
2. Remove all notes and/or description fields and replace them with a foreign key reference to a [Notes] table.

CREATE TABLE [dbo].[Notes] ( [NoteId] [int] NOT NULL, [NoteText] [NVARCHAR](MAX) NOT NULL )
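To make option 2 concrete, here is a rough sketch of how a parent table might reference [Notes]. The [Orders] table, its columns, the IDENTITY and PRIMARY KEY on [NoteId], and the @OrderId value are all assumptions for illustration; only the [Notes] columns come from the question.

```sql
-- [Notes] as in the question, with an assumed IDENTITY and PRIMARY KEY
-- (a key is needed before another table can reference it).
CREATE TABLE [dbo].[Notes] (
    [NoteId]   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [NoteText] NVARCHAR(MAX)     NOT NULL
);

-- Hypothetical parent table: the large text is replaced by a nullable key.
CREATE TABLE [dbo].[Orders] (
    [OrderId]   INT      NOT NULL PRIMARY KEY,
    [OrderDate] DATETIME NOT NULL,
    [NoteId]    INT      NULL REFERENCES [dbo].[Notes] ([NoteId])
);

-- Listing query: never touches the note text.
SELECT [OrderId], [OrderDate]
FROM [dbo].[Orders]
ORDER BY [OrderDate] DESC;

-- Detail query: pulls the note only for the one record being displayed.
DECLARE @OrderId INT;
SET @OrderId = 42;  -- hypothetical key value

SELECT o.[OrderId], o.[OrderDate], n.[NoteText]
FROM [dbo].[Orders] AS o
LEFT JOIN [dbo].[Notes] AS n ON n.[NoteId] = o.[NoteId]
WHERE o.[OrderId] = @OrderId;
```

The LEFT JOIN keeps rows that have no note at all, which matches the nullable [NoteId] assumed here.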

Obviously I would prefer to use option 1, because going with option 2 would change so much in our system. However, if option 2 is really the only good way to proceed, then at least I can say these changes are necessary and that I have done the homework.

UPDATE: I ran several tests on a sample database with 100,000 records in it. What I found is that, because of clustered index scans, the I/O required for option 1 is roughly twice that of option 2. If I select a large number of records (1,000 or more), option 1 is twice as slow even if I do not include the large text field in the select. As I request fewer rows, the difference blurs. In a web app where page sizes of 50 or so are the norm, option 1 will work, but I will be converting all instances to option 2 in the (very) near future for scalability.

+1 A:

I'd go with Option 2.

You can create a view that joins the two tables to make the transition easier on everyone, and then go through a clean-up process that removes the view and uses the single table wherever possible.

Tom Ritter 2009-02-17 22:17:07

+1 A:

The TEXT/NTEXT data type has practically unlimited length while taking up next to nothing in your record.

It comes with a few strings attached, like special behavior with string functions, but for a secondary "notes/description" type of field these may be less of a problem.

Tomalak 2009-02-17 22:18:21

+2 A:

You want to use a TEXT field. TEXT fields aren't stored directly in the row; instead, the row stores a pointer to the text data. This is transparent to queries, though: if you ask for a TEXT field, you get back the actual text, not the pointer.

Essentially, using a TEXT field is somewhat between your two solutions. It keeps your table rows much smaller than using a varchar, but you'll still want to avoid asking for them in your queries if possible.

Xanthir 2009-02-17 22:57:47

+3 A:

Option 2 is better for several reasons:

When querying your tables, the large text fields fill up pages quickly, forcing the database to scan more pages to retrieve data. This is especially taxing when you don't actually need to return the text data.
As you mentioned, it gives you a clean break to change the data type in one fell swoop. Microsoft has deprecated TEXT in SQL Server 2008, so you should stick with VARCHAR(MAX)/NVARCHAR(MAX)/VARBINARY(MAX).
Separate filegroups. Having all your text data in a slower, cheaper storage location might be something you decide to pursue in the future. If not, no harm, no foul.
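As a rough sketch of the filegroup idea: the large-value data of a table can be pointed at its own filegroup with TEXTIMAGE_ON when the table is created. The database name, filegroup name, logical file name and file path below are all made up for illustration.

```sql
-- Hypothetical database, filegroup, and file location.
ALTER DATABASE [MyDb] ADD FILEGROUP [LOBDATA];

ALTER DATABASE [MyDb]
ADD FILE (
    NAME = N'MyDb_Lob',
    FILENAME = N'E:\SlowDisk\MyDb_Lob.ndf'
)
TO FILEGROUP [LOBDATA];

-- Row data stays on PRIMARY; large-value data that is stored
-- off-row is placed on the LOBDATA filegroup.
CREATE TABLE [dbo].[Notes] (
    [NoteId]   INT           NOT NULL PRIMARY KEY,
    [NoteText] NVARCHAR(MAX) NOT NULL
)
ON [PRIMARY]
TEXTIMAGE_ON [LOBDATA];
```

As far as I know, TEXTIMAGE_ON has to be chosen when the table is created, which is another argument for giving the notes their own table up front.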

While Option 1 is easier for now, Option 2 will give you more flexibility in the long-term. My suggestion would be to implement a simple proof-of-concept with the "notes" information separated from the main table and perform some of your queries on both examples. Compare the execution plans, client statistics and logical I/O reads (SET STATISTICS IO ON) for some of your queries against these tables.
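A proof-of-concept along those lines could look something like the following. The Poc_* table names, the row count and the 2,000-character filler text are assumptions chosen only to get a measurable difference; substitute your real schema and data. GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Option 1: large text kept inline with the listing columns.
CREATE TABLE dbo.Poc_Inline (
    ItemId      INT           NOT NULL PRIMARY KEY,
    Name        NVARCHAR(50)  NOT NULL,
    Description NVARCHAR(MAX) NOT NULL
);

-- Option 2: large text moved to a separate notes table.
CREATE TABLE dbo.Poc_Notes (
    NoteId   INT           NOT NULL PRIMARY KEY,
    NoteText NVARCHAR(MAX) NOT NULL
);
CREATE TABLE dbo.Poc_Split (
    ItemId INT          NOT NULL PRIMARY KEY,
    Name   NVARCHAR(50) NOT NULL,
    NoteId INT          NOT NULL REFERENCES dbo.Poc_Notes (NoteId)
);
GO

-- Generate 10,000 numbered rows with 2,000 characters of filler each.
WITH n AS (
    SELECT TOP (10000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.Poc_Inline (ItemId, Name, Description)
SELECT i, N'Item ' + CAST(i AS NVARCHAR(10)), REPLICATE(N'x', 2000)
FROM n;

-- Copy the same data into the split layout (NoteId reuses ItemId here).
INSERT INTO dbo.Poc_Notes (NoteId, NoteText)
SELECT ItemId, Description FROM dbo.Poc_Inline;

INSERT INTO dbo.Poc_Split (ItemId, Name, NoteId)
SELECT ItemId, Name, ItemId FROM dbo.Poc_Inline;
GO

-- Same listing query against both layouts; neither selects the large text.
SET STATISTICS IO ON;

SELECT TOP (1000) ItemId, Name FROM dbo.Poc_Inline ORDER BY ItemId;
SELECT TOP (1000) ItemId, Name FROM dbo.Poc_Split  ORDER BY ItemId;

SET STATISTICS IO OFF;
```

Compare the "logical reads" reported for each table in the Messages output. The filler is deliberately short enough to be stored in the row, which is the case where the inline layout hurts listing queries the most.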

A quick note to those suggesting the use of a TEXT/NTEXT from MSDN:

This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature. Use varchar(max), nvarchar(max) and varbinary(max) data types instead. For more information, see Using Large-Value Data Types.
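If the appeal of TEXT is that the data lives off-row behind a pointer, the (max) types can be asked to behave the same way with a table option. A minimal sketch, assuming a hypothetical table dbo.MyTable with an NVARCHAR(MAX) column named Notes:

```sql
-- Store (max) values off-row, leaving only a pointer in the data row.
EXEC sp_tableoption N'dbo.MyTable', 'large value types out of row', 1;

-- Check the current setting for the table.
SELECT name, large_value_types_out_of_row
FROM sys.tables
WHERE name = N'MyTable';

-- As I understand it, existing values are only moved when they are next
-- updated, so a one-off rewrite may be needed to convert old rows.
UPDATE dbo.MyTable
SET Notes = Notes;
```

This keeps option 1's single-table design while getting most of the row-size benefit, so it is worth including in any comparison before committing to a larger refactoring.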

Cadaeic 2009-02-17 23:01:02

+1 A:

Just to expand on Option 2

You could:

Rename existing MyTable to MyTable_V2

Move the Notes column into a joined Notes table (with 1:1 joining ID)

Create a VIEW called MyTable that joins MyTable_V2 and Notes tables

Create an INSTEAD OF trigger on MyTable view which saves the Notes column into the Notes table (IF NULL then delete any existing Notes row, if NOT NULL then Insert if not found, otherwise Update). Perform appropriate action on MyTable_V2 table

Note: We've had trouble doing this where there is a Computed column in MyTable_V2 (I think that was the problem, either way we've hit snags when doing this with "unusual" tables)

All new Insert/Update/Delete code should be written to operate directly on MyTable_V2 and Notes tables

Optionally: Have the INSTEAD OF trigger on MyTable log the fact that it was called (it can do this minimally: UPDATE a pre-existing log table row with GetDate() only if the existing row's date is more than 24 hours old, so it will only do an update once a day).

When you are no longer getting any log records you can drop the INSTEAD OF trigger on MyTable view and you are now fully MyTable_V2 compliant!
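The steps above can be sketched roughly as follows. This assumes an existing dbo.MyTable with a primary key MyTableId, one ordinary column Name and the Notes column; those column names and the MyTable_Notes table name are invented for the example. Only the UPDATE trigger is shown; INSERT and DELETE would need their own INSTEAD OF triggers, and the optional logging step is left out.

```sql
-- 1. Rename the existing table.
EXEC sp_rename N'dbo.MyTable', N'MyTable_V2';
GO

-- 2. Move the Notes column into a 1:1 table keyed on the parent ID.
CREATE TABLE dbo.MyTable_Notes (
    MyTableId INT           NOT NULL PRIMARY KEY
        REFERENCES dbo.MyTable_V2 (MyTableId),
    Notes     NVARCHAR(MAX) NOT NULL
);

INSERT INTO dbo.MyTable_Notes (MyTableId, Notes)
SELECT MyTableId, Notes
FROM dbo.MyTable_V2
WHERE Notes IS NOT NULL;

ALTER TABLE dbo.MyTable_V2 DROP COLUMN Notes;
GO

-- 3. A view with the old name and the old shape.
CREATE VIEW dbo.MyTable
AS
SELECT t.MyTableId, t.Name, n.Notes
FROM dbo.MyTable_V2 AS t
LEFT JOIN dbo.MyTable_Notes AS n ON n.MyTableId = t.MyTableId;
GO

-- 4. INSTEAD OF trigger so legacy UPDATEs against the view still work.
--    Assumes legacy code never changes MyTableId itself.
CREATE TRIGGER dbo.MyTable_InsteadOfUpdate ON dbo.MyTable
INSTEAD OF UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Ordinary columns go to the renamed base table.
    UPDATE t
    SET t.Name = i.Name
    FROM dbo.MyTable_V2 AS t
    JOIN inserted AS i ON i.MyTableId = t.MyTableId;

    -- Notes set to NULL: remove any existing notes row.
    DELETE n
    FROM dbo.MyTable_Notes AS n
    JOIN inserted AS i ON i.MyTableId = n.MyTableId
    WHERE i.Notes IS NULL;

    -- Notes not NULL and a row already exists: update it.
    UPDATE n
    SET n.Notes = i.Notes
    FROM dbo.MyTable_Notes AS n
    JOIN inserted AS i ON i.MyTableId = n.MyTableId
    WHERE i.Notes IS NOT NULL;

    -- Notes not NULL and no row yet: insert one.
    INSERT INTO dbo.MyTable_Notes (MyTableId, Notes)
    SELECT i.MyTableId, i.Notes
    FROM inserted AS i
    WHERE i.Notes IS NOT NULL
      AND NOT EXISTS (SELECT 1
                      FROM dbo.MyTable_Notes AS n
                      WHERE n.MyTableId = i.MyTableId);
END;
GO
```

Run something like this against a copy of the database first; anything unusual in the real table (computed columns, existing triggers, other foreign keys) is likely to need extra handling.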

Huge amount of hassle to implement, as you surmised.

Alternatively trawl the code for all references to MyTable and change them to MyTable_V2, put a VIEW in place of MyTable for SELECT only, and not create the INSTEAD OF trigger.

My plan would be to fix all Insert/Update/Delete statements referencing the now deprecated MyTable. For me this would be made somewhat easier because we use unique names for all tables and columns in the database, and we use the same names in all application code, so my confidence that a simple FIND had located all instances would be high.

P.S. Option 2 is also preferable if you have any SELECT * lying around. We have had clients whose application performance has gone downhill fast when they added large Text/Blob columns to existing tables - because of "lazy" SELECT * statements. Hopefully that isn't the case in your shop though!

Kristen 2009-02-18 07:58:17
