Does anyone have any best practices they can share with regard to storing xml-typed column data in SQL Server 2008? We have lots of small XML structures, but some are larger than 50 MB. We're finding that things get a little slow on DELETE. Any advice/war stories would be appreciated.
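
For context, here's a minimal sketch of the kind of table involved (names are hypothetical):

    -- Hypothetical sketch: an xml-typed column holding documents
    -- ranging from a few KB up to more than 50 MB per row.
    CREATE TABLE dbo.XmlDocuments (
        DocID   int IDENTITY(1,1) PRIMARY KEY,
        DocData xml NOT NULL
    );

    -- The DELETEs that are getting slow look like this:
    DELETE FROM dbo.XmlDocuments WHERE DocID = 42;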

+1  A: 

You may want to store the large XML documents as files on disk and keep only the path in the database, unless you plan to search the XML content as part of your SELECT statements.

I tend to prefer storing large files outside the database, as it really isn't designed, in my opinion, for storing them. If you are going to be searching, you could use DLINQ and XLINQ to facilitate searching the various XML files.
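
A minimal sketch of what I mean (table and column names are just for illustration):

    -- Only the path is stored; the XML itself lives on disk.
    CREATE TABLE dbo.XmlDocumentPaths (
        DocID    int IDENTITY(1,1) PRIMARY KEY,
        FilePath nvarchar(260) NOT NULL  -- e.g. 'D:\XmlStore\doc42.xml'
    );

Your application then loads the file at FilePath and queries it in memory with LINQ to XML.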

James Black
+2  A: 

I agree with storing the large files outside of the database.

You can store just the path to the file.

In one project I worked on, I had a separate table that kept track of all the users' uploaded data in a web app. Whenever a user uploaded a file, I would create a new row in this table and use the fileID primary key as a foreign key in the various other tables; see the sketch below.

It greatly reduced the impact of changes that came in later, such as when I had to change the root path of the upload directory.
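
Roughly like this (names are hypothetical):

    -- One central table tracks every upload. The path is stored
    -- relative to a configurable root, which is what made moving the
    -- upload directory later a one-line config change.
    CREATE TABLE dbo.UploadedFiles (
        FileID       int IDENTITY(1,1) PRIMARY KEY,
        RelativePath nvarchar(260) NOT NULL,
        UploadedAt   datetime NOT NULL DEFAULT GETDATE()
    );

    -- Other tables reference the upload by FileID only:
    CREATE TABLE dbo.UserDocuments (
        UserID int NOT NULL,
        FileID int NOT NULL REFERENCES dbo.UploadedFiles (FileID)
    );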

jsshah
+3  A: 

Another vote for outside the database.

In the past, I've used an approach similar to what James recommends, but SQL Server 2008 supports a new FILESTREAM storage option, which stores varbinary(max) column data as files on NTFS while keeping it under the database's transactional control. It might be worth looking into.

SQL Server Books Online has a lot of good information, starting with the "FILESTREAM Overview" topic.
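
A sketch of a FILESTREAM table (this assumes FILESTREAM is enabled on the instance and the database already has a FILESTREAM filegroup; names are hypothetical):

    -- FILESTREAM requires a ROWGUIDCOL column with a unique constraint;
    -- the varbinary(max) data itself is then kept as files on NTFS.
    CREATE TABLE dbo.XmlBlobs (
        DocID   int IDENTITY(1,1) PRIMARY KEY,
        RowGuid uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
        DocData varbinary(max) FILESTREAM NULL
    );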

kirkus
FILESTREAM is a good choice if you can treat the XML as a varbinary type. It's reasonably safe if you limit permissions on the FILESTREAM container. FILESTREAM data is also included in database backups. And you can full-text search it, too (http://msdn.microsoft.com/en-us/library/ms142531.aspx).
ebpower
+2  A: 

I see that most of the answers so far are for outside the database.

We did this once, storing the files on the file system and the file names in a table in the database. The main problems with this were:

  • the file system is not transactional, so it could get out of sync with the database if something went wrong
  • you had to take backups separately, and a restore would by definition be out of sync

For all new projects we have stored files in varbinary(max) columns; see the sketch below. This has worked well for us, even under loads of tens of thousands of users.
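
For illustration (hypothetical names), the files simply live in a table:

    -- Files stored in-database are covered by transactions and by the
    -- normal backup/restore, which avoids both problems listed above.
    CREATE TABLE dbo.StoredFiles (
        FileID   int IDENTITY(1,1) PRIMARY KEY,
        FileName nvarchar(260) NOT NULL,
        FileData varbinary(max) NOT NULL
    );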

Shiraz Bhaiji
+1  A: 

Store metadata!

Outside the database is how we store large datasets as well, but I strongly recommend adding some meta-information to the file itself, so that if the files ever get out of sync with the database you can semi-automatically resync them. That way you can create or update the file first and update the database afterwards, without worrying that a failed database update will leave an orphaned file.

Managing large numbers of files: most file systems are fine storing a large number of files together, but they do start to slow down over time. I highly recommend splitting the files into subfolders based on some hash of the name. For example, if all file names are integers and you want about 10,000 files per directory, calculate the directory name as (filename / 10000) * 10000 using integer division; this also makes a given file easier to locate when debugging. A sketch follows.
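
As a sketch (hypothetical names), the subfolder can even be derived in the table itself:

    -- Integer division buckets 10,000 files per directory:
    -- IDs 0-9999 map to folder '0', 10000-19999 to folder '10000', etc.
    CREATE TABLE dbo.ManagedFiles (
        FileID    int IDENTITY(1,1) PRIMARY KEY,
        SubFolder AS (FileID / 10000) * 10000 PERSISTED
    );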

Yurik