tags:
views: 45
answers: 2

Hello, I need to hand data to a data-processing Windows service (one-way, loosely coupled). I want to ensure that the service being down etc. doesn't result in 'lost' data, that restarting the Windows service simply causes it to pick up work where it left off, and I need the system to be really easy to troubleshoot, which is why I'm not using MSMQ.

So I came up with two possible solutions - either:

  • I drop text files with the processing data into a drop directory, and the Windows service waits for file-change notifications, then processes and deletes each file (see the sketch after this list)

or

  • I insert the data into a special table in the local MS SQL database, and the Windows service polls the database for changes/new items and deletes them as they are processed (a polling sketch appears after the next paragraph)
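
For the first option, here is a minimal sketch of the watcher side. The directory path, file extension, and error handling are my assumptions, not part of the question; a real Windows service would host this in OnStart/OnStop rather than Main, and would need to retry while the producer still has the file open.

```csharp
using System;
using System.IO;

class DropDirectoryWatcher
{
    // Hypothetical drop directory; adjust to your deployment.
    const string DropDir = @"C:\ProcessingDrop";

    static void Main()
    {
        Directory.CreateDirectory(DropDir);

        // Process anything already present, so that a restart of the
        // service picks up work left behind while it was down.
        foreach (var file in Directory.GetFiles(DropDir, "*.txt"))
            ProcessAndDelete(file);

        using (var watcher = new FileSystemWatcher(DropDir, "*.txt"))
        {
            watcher.Created += (s, e) => ProcessAndDelete(e.FullPath);
            watcher.EnableRaisingEvents = true;

            Console.ReadLine(); // stand-in for the service's run loop
        }
    }

    static void ProcessAndDelete(string path)
    {
        // The producer may still be writing; real code should catch
        // IOException and retry after a short delay.
        string data = File.ReadAllText(path);
        // ... hand `data` to the processing pipeline ...
        File.Delete(path); // delete only after successful processing
    }
}
```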

The MSSQL database is local on the system, not over the network, but later on I may want to move it to a different server.
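
For the second option, a rough sketch of the polling loop is below. The WorkQueue table, its Id/Payload columns, the connection string, and the poll interval are all illustrative assumptions; deleting a row only after its payload has been processed is what lets a restarted service resume where it left off.

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Threading;

class TablePoller
{
    // Hypothetical local connection string; adjust to your schema.
    const string ConnStr = "Server=.;Database=Processing;Integrated Security=true";

    static void Main()
    {
        while (true)
        {
            using (var conn = new SqlConnection(ConnStr))
            {
                conn.Open();

                // Read a batch of pending work items.
                var batch = new List<Tuple<int, string>>();
                using (var select = new SqlCommand(
                    "SELECT TOP (100) Id, Payload FROM WorkQueue ORDER BY Id", conn))
                using (var reader = select.ExecuteReader())
                {
                    while (reader.Read())
                        batch.Add(Tuple.Create(reader.GetInt32(0), reader.GetString(1)));
                }

                foreach (var item in batch)
                {
                    // ... process item.Item2 (the payload) ...

                    // Delete only after successful processing, so a crash
                    // leaves the row in place for the next run.
                    using (var delete = new SqlCommand(
                        "DELETE FROM WorkQueue WHERE Id = @id", conn))
                    {
                        delete.Parameters.AddWithValue("@id", item.Item1);
                        delete.ExecuteNonQuery();
                    }
                }
            }

            Thread.Sleep(TimeSpan.FromSeconds(5)); // poll interval
        }
    }
}
```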

Which, from a performance (or other) standpoint, is the better solution here?

+2  A: 

From a performance perspective, it's likely the filesystem will be fastest - perhaps by a large margin.

However, there are other factors to consider.

  • It doesn't matter how fast it is, generally, only whether it's sufficiently fast. Storing and retrieving small blobs is a simple task and quite possibly this will never be your bottleneck.
  • NTFS is journalled - but only the metadata. If the server should crash mid-write, a file may contain gibberish. If you use a filesystem backend, you'll need to be robust against arbitrary data in the files. Depending on the caching layer and the way the file system reuses old space, that gibberish could contain segments of other messages, so you'd best be robust even against an old message being repeated (one way to harden against this is sketched after this list).
  • If you ever want to add new features involving a richer message model (say, some sort of caching layer), a database is more easily extended.
  • The filesystem is more "open" - meaning it may be easier to debug with really simple tools (notepad), but also that you may encounter more tricky issues with local indexing services, virus scanners, poorly set permissions, or whatever else happens to live on the system.
  • Most APIs can't deal with files with paths of more than 260 characters, and perform poorly when faced with huge numbers of files. If your storage directory ever becomes too large, things like .GetFiles() will become slow - whereas a DB can be indexed on the timestamp, and the newest messages retrieved irrespective of old clutter. You can work around this, but it's an extra hurdle.
  • MS SQL isn't free and/or isn't installed on every system. There's a bit of extra system administration necessary for each new server and more patches when you use it. Particularly if your software should be trivially installable by third parties, the filesystem has an advantage.
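
One way to reduce the partial-write problem mentioned above is for the producer to write to a temporary name and rename only when the file is complete, and for the consumer to validate what it reads before trusting it. The file naming and the length-prefix validation scheme below are my assumptions, shown only as a sketch:

```csharp
using System;
using System.IO;
using System.Text;

static class SafeDrop
{
    // Producer: write to a .tmp name, then rename. The consumer only
    // watches for the final extension, so it never sees half-written files.
    public static void WriteMessage(string dropDir, string payload)
    {
        string tmp = Path.Combine(dropDir, Guid.NewGuid().ToString("N") + ".tmp");
        string final = Path.ChangeExtension(tmp, ".msg");

        // Prefix the payload with its own length so truncation is detectable.
        File.WriteAllText(tmp, payload.Length + "\n" + payload, Encoding.UTF8);
        File.Move(tmp, final); // a rename within one NTFS volume is effectively atomic
    }

    // Consumer: reject anything that doesn't match its declared length.
    public static bool TryReadMessage(string path, out string payload)
    {
        payload = null;
        string text = File.ReadAllText(path, Encoding.UTF8);

        int newline = text.IndexOf('\n');
        if (newline < 0) return false;

        int declared;
        if (!int.TryParse(text.Substring(0, newline), out declared)) return false;

        string body = text.Substring(newline + 1);
        if (body.Length != declared) return false; // truncated or corrupted

        payload = body;
        return true;
    }
}
```

A checksum (or simply re-sending old messages idempotently) serves the same purpose; the point is that the consumer never assumes a file on disk is well-formed.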

I don't know what you're building, but don't prematurely optimize. Both solutions are quite similar in terms of performance, and it's likely not to matter - so pick whichever is easiest for you. If performance ever really becomes an issue, direct communication (whether via IPC or IP or whatnot) is going to be several orders of magnitude more performant, so don't waste time micro-optimizing.

Eamon Nerbonne
A: 

My experience with SQL Server 2005 and lower is that the database approach is much slower, especially with larger files - they really mess up SQL Server memory when doing table scans.

However, the new SQL Server 2008 has better file support in the engine.

Julian de Wit