I had some questions around the FILESTREAM capability of SQL Server 2008.

  1. What would the difference in performance be of returning a file streamed from SQL Server 2008 using the FILESTREAM capability vs. directly accessing the file from a shared directory?

  2. If 100 users requested 100 files of 100 MB each (stored via FILESTREAM) within a 10-second window, would SQL Server 2008 performance slow to a crawl?

+4  A: 

If 100 users requested 100 100Mb files (stored via FILESTREAM) within a 10 second window, would SQL Server 2008 performance slow to a crawl?

On what kind of server? What kind of hardware would serve those files? What kind of disks, network, etc.? So many questions...

There's a really good blog post by Paul Randal on SQL Server 2008: FILESTREAM Performance - check it out. There's also a 25-page whitepaper on FILESTREAM available - also covering some performance tuning tips.


But also check out the Microsoft Research TechReport "To BLOB or Not To BLOB" at:

http://research.microsoft.com/apps/pubs/default.aspx?id=64525

It's a thorough, well-founded article that puts all those questions through their paces.

Their conclusion:

The study indicates that if objects are larger than one megabyte on average, NTFS has a clear advantage over SQL Server. If the objects are under 256 kilobytes, the database has a clear advantage. Inside this range, it depends on how write intensive the workload is, and the storage age of a typical replica in the system.

So judging from that: if your blobs are typically less than 1 MB, just store them as VARBINARY(MAX) in the database. If they're typically larger, then use the FILESTREAM feature.
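As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function name and return strings are made up for this example. The thresholds are the 256 KB and 1 MB figures quoted from the paper, which compared NTFS with in-database storage; mapping "NTFS wins" to "use FILESTREAM" is this answer's reading, not something the paper measured.

```python
KB = 1024
MB = 1024 * KB

def suggest_blob_storage(avg_size_bytes: int) -> str:
    """Hypothetical helper: suggest a storage option from average object size.

    Thresholds come from the quoted conclusion of the paper; treating the
    "NTFS wins" zone as "use FILESTREAM" is an assumption of this sketch.
    """
    if avg_size_bytes < 256 * KB:
        # Small objects: the database had a clear advantage in the study.
        return "VARBINARY(MAX)"
    if avg_size_bytes > 1 * MB:
        # Large objects: the file system had a clear advantage in the study.
        return "FILESTREAM"
    # In between, it depends on write intensity and storage age.
    return "depends on workload"

print(suggest_blob_storage(100 * KB))
print(suggest_blob_storage(512 * KB))
print(suggest_blob_storage(100 * MB))
```

For the 100 MB files in the question, this lands firmly on the FILESTREAM side.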

I wouldn't worry about performance so much as about the other benefits of FILESTREAM over "unmanaged" storage in an NTFS folder. If you store files outside the database without FILESTREAM, you have no control over them:

  • no access control provided by the database
  • the files aren't part of your SQL Server backup
  • the files aren't handled transactionally, e.g. you could end up with "zombie" files which aren't referenced from the database anymore, or "skeleton" entries in the database without the corresponding file on disk

Those features alone make it absolutely worthwhile to use FILESTREAM.

marc_s
+1, that's the white paper I was trying to remember: "FILESTREAM Storage in SQL Server 2008"
Remus Rusanu
Thanks for the response. If a web site accesses FILESTREAM files via the streaming API, what configuration must be done at the firewall to enable that traffic? Right now we open port 1433, but that is it.
John
+1  A: 

Reading a FILESTREAM over Win32 is quite fast. See Managing FILESTREAM Data by Using Win32. You should follow the FILESTREAM best practices though. After all, this is what powers SharePoint, and MS would not bet something as important as Office (==SharePoint) on underperforming storage. There are some case studies and white papers around FILESTREAM. I could only dig out Laren Electronics Fuels Analysis of Formula One Racing Data with SQL Server, but I know there are others with more detailed numeric data. If I recall correctly, it shows that FILESTREAM in general comes within about 90-95% of SMB performance above a certain file size. For small files, the overhead of obtaining the FILESTREAM API handle starts to show up.

I'd also second Marc in recommending the Research paper on the topic (there is also a Channel 9 interview with Catharine van Ingen, available on iTunes podcasts too, where she speaks about this work), but bear in mind that the paper was published in 2006, before FILESTREAM was officially released, so it does not consider the FILESTREAM specifics.

As for your second question, asking about performance by specifying only the load and not the capacity of the system is nonsense. A 128 CPU Superdome with a mountain of storage SANs won't even notice your load. A SQL Server running on a 256 MB laptop with a mountain of spyware won't even get to see your load...
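To put a number on "load versus capacity", here is a back-of-the-envelope Python sketch. It assumes decimal megabytes (1 MB = 10^6 bytes) and, for the first figure, that all files must be fully delivered within the 10-second window, which the question does not actually state. The function names and the 1 Gbit/s link are hypothetical, chosen only for illustration.

```python
MB = 10**6  # assumption: decimal megabytes

def required_gbit_per_s(users: int, file_mb: int, window_s: float) -> float:
    """Sustained throughput needed to deliver every file within the window."""
    total_bytes = users * file_mb * MB
    return total_bytes / window_s * 8 / 1e9

def transfer_seconds(users: int, file_mb: int, link_gbit_per_s: float) -> float:
    """Time to push all files through a link of the given capacity,
    ignoring protocol overhead and disk limits (an idealisation)."""
    total_bits = users * file_mb * MB * 8
    return total_bits / (link_gbit_per_s * 1e9)

# The scenario from the question: 100 users, 100 MB each, 10 s window.
print(required_gbit_per_s(100, 100, 10.0))   # Gbit/s needed
print(transfer_seconds(100, 100, 1.0))       # seconds on a 1 Gbit/s link
```

Under these assumptions the scenario needs about 8 Gbit/s sustained, and a single idealised 1 Gbit/s link would take roughly 80 seconds, so the network and disks decide the outcome long before the choice between FILESTREAM and a file share does.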

Remus Rusanu
+1 <hehe> you more vividly paint the picture of the difference the underlying hardware will make :-)
marc_s