views:

251

answers:

2

I have approximately 7 terabytes of various media files (pdf's, jpg's, tiff's) that currently reside on a very beefed up file server. I am looking at moving the data to SQL Server 2008 and using the Filestream attribute to help me manage the data. I want to do this because I have webpages that manage this media, and they (the webpages) are getting slower and slower as more media is added daily to the file server.

EDIT: The webpages are slow because many of them produce reports that reflect various details of the file server and what is stored on it. Essentially, the webpages comb through thousands of folders and files to generate reports about what is contained in them. Some webpages allow users to manipulate folders and files and move them to different locations. So, in a nutshell, I'm looking for a faster manner in managing these files. It would also allow me to maintain metadata about these files within the database, thus allowing me to query the database for this info instead of combing through the file server for it.

My Problems:

1) I have done a proof of concept and verified that I can create a filestream local to the SQL Server 2008 database, and I've successfully read and wrote media to it. However, I have yet to figure out how to use an UNC as a filestream. In other words, the database is hosted on MySQLDB08, and my files are stored on TheFileServer01. I've read it's possible, but I haven't gotten there yet. Any help on this would be greatly appreciated!

2) Since I have 7 terabytes (and growing) of media, will my backups be unmanageable due to their size? Is this something that could dissuade me from using Filestream?

Any suggestions or help would be greatly appreciated!

+3  A: 
  1. You can't. Afaik filestream data is stored localy and SQL will refuse to read/write from/to an UNC.
  2. Your full backups will contain the entire filestream data. Unmanageable? Definetly a very serious challenge.

My question would be what is the benefit you want to extract from the filestream? The usual benefits come from BLOB integration with database operations while keeping availability for Win32 file handle based operations:

Even though FILESTREAM technology has many attractive features, it may not be the optimal choice in all situations. As mentioned earlier, the size of the BLOB data and the access patterns are the most significant factors when deciding whether to store the BLOB data wholly within the database or by using FILESTREAM.

Size affects the following:

  • Efficiency with which the BLOB data can be accessed using either storage mechanism. As mentioned earlier, streaming access of large BLOB data is more efficient using FILESTREAM, but partial updates are (potentially much) slower.
  • Efficiency of backing up the combined structured and BLOB data using either storage mechanism. A backup that combines SQL Server database files and a large number of FILESTREAM files will be slower than a backup of just SQL Server database files of an equivalent total size. This is because of the extra overhead of backing up each NTFS file (one per FILESTREAM data value). This overhead becomes more noticeable when the FILESTREAM files are smaller (as the time overhead becomes a larger percentage of the total time to backup per MB of data).

From a pure performance point of view, there are many steps you can do on a file system level to improve performance. What is you current problem, why is your system throughput affected by the media size? It means you have a somewhere a choke point of contention, perhaps a directory enumeration, or some other barrier that causes you to scale the response time with the media size. Your access to the media should be O(1), maybe O(logn),b ut definetely not O(n).

I'd recommend you go over the SQL White Paper FILESTREAM Storage in SQL Server 2008, from where I found my quote about use cases.

Remus Rusanu
I've read through the white page numerous times, but, quite frankly, some of it goes over my head because it deals with areas of which I'm not familiar (eg - backups). Also, the white paper says *The use of a file system filter driver also allows remote access to the FILESTREAM data through a UNC path*, which indicates to me that storing filestream data on a different location other than the actual database server might be possible. Also, please see my edit in my original question; I hope it explains better why I am looking into using Filestream.
Jagd
The system driver allows access through an UNC *to* the filestream from *other* location. In other words, it allows you to share the filestream. It does not allows you to store the fielstream on a UNC. Note that at 7Tb, 'local' should probably mean some sort of SAN utility.
Remus Rusanu
About your edit: sounds to me that one aspect of the problem is that you don't have metadata in a relational format about your files, and you need to look into the file system for it. Can you keep metadata in tables, or it will quickly drift out of sync with the filesystem?
Remus Rusanu
@Remus - that is exactly the problem! I am unsure how to keep the metadata in sync with what is on the disk. I could probably come up with something if I had to though. I've lent a bit of thought to the problem last night after reading your answers, and now I'm tossing around the idea of just keeping the files on disk and storing their metadata within the database.
Jagd
A: 

I'm going to have to disagree with @RemusRusanu on the UNC issue. Although, @RemusRusanu makes some good points on why you would choose to use a filestream.

Anyway, You can uses UNC's for filestreams -- it wouldn't be of much use otherwise. Currently, I build a site that is using the UNC feature for servers in a web farm to read files from a SQL Filestream.

A few points on using UNC Filestreams ...

  • Access to UNC's is gated by SQL server. WTF? The point of the file stream is to merge the benefits of the file system (good streaming) and the benefits of SQL Server (good meta data, transactions and query~ability). How does SQL ensure the file access is transactional? You have to open the transaction and inside the transaction ask SQL Server for a file handle.

  • Said another way, you can't just navigate to the Filestream UNC from windows explorer.

  • If you're storing your binary in SQL server, then generally ~1.2MB is the break point where you should favor filestream over VarBinary. Here MS suggest 1 MB, but there is another in research paper I can't locate at the moment that suggested 1.2 is the break even point.

  • Enabling UNC access requires a distributed transaction, so both the SQL server and the consumer of the UNC path need distributed transactions enabled.

Following is a code snippet that shows how to retrieve a handle to a Filestream. There is one big caveat: the transaction isn't closed in this snippet. You will need to read the binary, then close the transaction. Leaving open transactions is clearly a no-no.

    public FileStream GetStream(string FilePath){
        FileStream FStream = null;

        Conn = new SqlConnection( MyConnectionStringHere );
        Conn.Open();
        txn = Conn.BeginTransaction();

        using (SqlCommand cmd = new SqlCommand("SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()", Conn, txn)){

            Object obj = cmd.ExecuteScalar();
            TransContext = (byte[])obj;
        }

        SafeFileHandle SHandle =  NativeSqlClient.GetSqlFilestreamHandle (FilePath, NativeSqlClient.DesiredAccess.Read, TransContext);
        FStream = new FileStream(SHandle, FileAccess.Read);

        return FStream;
    }
EBarr
I believe that you may have misunderstood my question with regards to the UNC. I was simply inquiring as to why the [Filestream White Paper](http://msdn.microsoft.com/en-us/library/cc949109.aspx) seemed to indicate that a filestream could be stored on a different server from where the SQL Server 2008 database resides. The White Paper states, "The use of a file system filter driver also allows remote access to the FILESTREAM data through an UNC path." Remus cleared this up for me by explaining that this simply means the filestream can be shared (thusly accessed) on your network by an UNC path.
Jagd
(cont.) - In other words, I never had problems accessing the filestream itself programmatically through the Win 32 API. Although, I do believe your example is somewhat unconventional. I prefer to use the SqlFileStream libraries when reading from or writing to it.
Jagd
Also, I didn't give you the -1. I'm not sure who did. I don't see anything incorrect with your answer; I just think you misunderstood the guts of what Remus and I were getting at.
Jagd

related questions