I am looking at the FILESTREAM attribute in SQL Server for storing files. As I understand it, SQL Server stores the files on the hard drive, keeps the file pointer/path information in the DB, and maintains transactional consistency in the process.

There also seems to be a limitation "FILESTREAM data can be stored only on local disk volumes" for the FILESTREAM attribute.

If I anticipate my web app storing 200,000 images of 1-2 MB each, I would need roughly 200-400 GB of disk space for the images. Since FILESTREAM requires all data to be stored on a local disk, per the limitation above, it seems it would be impossible to scale to millions of files on a single hard drive, as the storage requirements would be extremely large.
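As a quick sanity check on the numbers, here is a minimal sketch of the estimate. The function name `estimate_storage_gb` is made up for illustration, and it assumes decimal units (1 GB = 1000 MB) and ignores file system and database overhead.

```python
def estimate_storage_gb(file_count: int, avg_file_mb: float) -> float:
    """Raw storage needed, in GB, assuming 1 GB = 1000 MB (no overhead)."""
    return file_count * avg_file_mb / 1000


# 200,000 images at the low (1 MB) and high (2 MB) end of the size range.
low_gb = estimate_storage_gb(200_000, 1.0)
high_gb = estimate_storage_gb(200_000, 2.0)
print(f"{low_gb:.0f}-{high_gb:.0f} GB")  # prints "200-400 GB"
```

So the 200 GB figure is the lower bound; at 2 MB per image the requirement doubles.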

Is my understanding of the limitation correct, or am I missing something here?

If this limitation is real, I would instead store the images in the DB as plain BLOBs and cluster my DB to meet the growing storage requirements, which doesn't seem to be possible with FILESTREAM.

Please share your thoughts!

UPDATED:
A few more questions regarding FILESTREAM:

  1. How do I handle data recovery if the data container gets corrupted?
  2. Can we back up just the DB without the file system data? [assuming the data is on a SAN and need not be moved]
  3. I would like to back up or restore the DB and just remap the filegroup path information [which maps to the SAN]. Is this possible?
+5  A: 

FILESTREAM does not actually require local storage; it just cannot use SMB network storage. An iSCSI or Fibre Channel SAN works fine for storing FILESTREAM data. You can also have multiple FILESTREAM filegroups per table, essentially partitioning your data. If you are strictly targeting SQL Server 2008, there is very little reason not to use FILESTREAM for large binary data. There is a Microsoft whitepaper describing FILESTREAM partitioning here.

Jeff Mc
@Jeff: Great post! It has given a lot of clarity and raised a few more questions, which I have added to the question.
pencilslate
+2  A: 

On the local disk volume requirement

Do not take "local" too literally. While MSSQL must indeed "see" the filegroup(s) associated with FILESTREAM data as local drives, this storage is often supplied by NAS or other storage technologies that make Windows treat it as local NTFS disks (by way of iSCSI and such). This is particularly true for enterprise applications with the level of space requirement you mention.

On using FILESTREAM at all...

Do weigh the pros and cons carefully. Your question mentions rather big (MB-size) images (I'm assuming graphic images, not logical images of some sort), which implies they are used as atomic units. A file server setup would require management and synchronization external to SQL Server, but this seems a relatively small price to pay to keep your freedom: not only vis-a-vis SQL Server / Microsoft, but also in your ability to move things around more easily for scaling / bandwidth purposes.

mjv
@mjv: The freedom to move things around is the chief concern. What would happen if the data container gets corrupted? Can I back up just the DB and remap the filegroup path later? These are a few more questions that cropped up based on your explanation.
pencilslate
@pencilslate: SQL Server is effectively managing the FILESTREAM (FS) datastore(s), so the backup of the FS stores is part of the SQL backup/recovery model. One can explicitly exclude the FS-related storage locations from the regular SQL backup and manage that backup externally, but doing so tends to defeat the purpose. So one has to choose between a ridiculously big backup/restore and manual management of separate recovery plans. Unless there are compelling benefits to integrating the two kinds of data, a fully external repository system may simply be preferable.
mjv
[cont.] With the non-FS solution, a possible recovery strategy for the FS-type data is to have two online repositories in distinct physical locations. These repositories are updated in parallel, minimizing the need for frequent "tape" backups. The secondary repository serves not only as a backup but also as a stand-by server. This is particularly interesting when the stored data are images, PDFs, and other content that compresses poorly, so that a formal backup would require about as much storage as this mirror setup.
mjv
+1  A: 

Using a SQL cluster doesn't give you any additional storage capacity, as clustering itself requires SAN storage. You can just as well create a LUN or LUNs for use as FILESTREAM storage on a nonclustered instance.

mrdenny
@mrdenny: Can I back up just the DB and remap the LUNs after the DB restore, thereby avoiding the need to back up the file system data?
pencilslate
If you are using FILESTREAM, then when you back up the database the files are backed up along with it.
mrdenny