views:

1226

answers:

7

Hi,

Currently we have thousands of Microsoft Word files, Excel files, PDF's, images etc stored in folders/sub folders. These are generated by an application on a regular basis and can be accessed at any time within that application. As we look to upgrade we are now looking into storing all these documents within SQL Server 2005 instead. Reasons for this are based on being able to compress the documents, adding additional fields to store more information on those documents and applying index’s where necessary.

I suppose what I’m after is the pros and cons of using SQL Server as a document repository instead of keeping them on the file server, as well as any experience you might have in doing this.

We would be using C# and Windows Workflow to do this task.

Thanks for your comments.

Edit


How big are the files?

between 100k = 200k in size (avg. 70KB)

How many will be?

At the moment it’s around 3.1 Million files (ranging from Word/Excel and PDF's), which can grow by 2,600 a day. (The growth will also increase over time)

How many reads?

This one is difficult to quantify as our old system/application makes it hard to work this out.


Also another useful link pointed out on a similar post covers the pros and cons of both methods.

Files Stored on DB vs FileSystem - Pros and Cons

+1  A: 

What kind of documents are we talking about?

Storing documents in your SQL server might be useful because you can relate the documents to other tables and use techniques like Full-text indexing and do things like fuzzy searches.

A downside is that it might be a bit harder to create a backup of the documents. And compression is also possible with NTFS compression or other techniques.

Zyphrax
They would be Mircosoft Word Docments.
kevchadders
sry... also Excel and PDF files too.
kevchadders
JamesM
+5  A: 

I would have both.

I would keep the files renamed with an unique name, thus easier to manage, and i would keep all meta data inside the database (file name, content-type, location on file system, size, description, etcetera), so the files are accessed through the database (indirectly).

Advantages:

  • files are easy to handle; you can bring several drives in the mix
  • the database can keep any number of meta information, including file description on which you can search against.
  • keep track on file accesses and other statistic information
  • rearrange the files using various paradigms: tree (directory structure), tags, search or context

You can have compression on a drive also. You can have RAID for backup and speed.

Mercer Traieste
+1  A: 

Are these documents text based and are you planning on using SQL Server's full text search to search these documents? If not, I don't see any benefit in storing these documents on the database. Ofcourse, you can always store the meta data related to the documents including the path information to the database.

msvcyc
We would utilitse Full Text Searching as well as adding additional fields to store meta data on those docs.
kevchadders
+1  A: 

A big benefit of stroing docs in the DB is it becomes much easier to control security access to them, as you can do it all via access control in your app. Storing them on a file server requires dealing with access priveledges at the file and folder level to prevent any direct access. Also have them in a DB makes for a single point of backup, so you can more easily make a full copy and/or move it around if needed.

schooner
+7  A: 

If you upgrade all the way, to SQL Server 2008, then you can use the new FILESTREAM feature, that allows the document to appear as a column in a table, yet to reside as a file on a share, where it can be directly accessed by a program (like Word).

John Saunders
Nice, didn't know about that capability.
kenny
+7  A: 

rule of thumb for doc size is:

size < 256 kb: store in db
265 kb < size < 1 MB: test for your load
size > 1 Mb: store on file system

EDIT: this rule of thumb also applies for FILESTREAM storage in SQL Server 2008

Mladen Prajdic
+1 A pragmatic approach.
Mercer Traieste
+1  A: 

Rather than writing a custom DMS (document management system), you should probably consider buying one or using WSS / SharePoint as this will handle all the mundane details (storage, indexing, meta-data) and let you build your custom functionality on top.

marcus.greasly
Sharepoint was considered but the cost was deemed too much! :(
kevchadders
Which cost? It's free per-server.
John Saunders
My understanding was Windows SharePoint Services (WSS) is the free version which comes with a lot of limitation. The licensed version I was on about was Microsoft Office SharePoint Server (MOSS). Apologies for any confusion.
kevchadders
Nice link on WSS limitations: http://social.technet.microsoft.com/Forums/en-US/sharepointgeneral/thread/9dfa09f4-9b55-44fa-9213-1437a24d1a50
kevchadders
Maybe worth looking at open-source DMS then (mind you, MOSS is good value for what you get - if you use a lot of the connectivity / office integration).SharePoint or a DMS provides the 'heavy lifting' - storing millions of docs (reliably), indexing, metadata etc meaning the development work isn't re-inventing the wheel...
marcus.greasly