views: 407
answers: 5
Background:

I am aware of this SO question about Transactional NTFS (TxF) and this article describing how to use it, but I am looking for real-world experience with a reasonably high-volume enterprise system where lots of blob data (say documents and/or photos) need to be persisted once transactionally and read many times.

  • We are expecting a few tens of thousands of documents written per day and reads of several tens of thousands per hour.
  • We could either store indexes within the file system or in SQL Server but must be able to scale this out over several boxes.
  • We must retain the ability to back up and restore the data easily for disaster recovery.

The Question:

  • Any real-world, enterprise-grade experience with Transactional NTFS (TxF)?

Related questions:

  • Anyone tried distributed transactions using TxF where the same file is committed to two mirror servers at once?
  • Anyone tried a distributed transaction with the file system and a database?
  • Any performance concerns/reliability concerns/performance data you can share? Has anyone even done something on this scale before where transactions are a concern?

Edits: To be clear, I have researched other technologies, including SQL Server 2008's new FILESTREAM data type, but this question is specifically targeted at the transactional file system only.
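For readers unfamiliar with the API in question, the core TxF pattern looks roughly like the following. This is a minimal, Windows-only sketch (link against ktmw32.lib; the function name and file path are hypothetical, and production error handling is omitted):

```c
// Minimal TxF sketch: write a blob atomically, visible only after commit.
// Windows-only; link with ktmw32.lib. Not production-grade error handling.
#include <windows.h>
#include <ktmw32.h>

BOOL WriteBlobTransacted(LPCWSTR path, const BYTE *data, DWORD len)
{
    // Create a kernel transaction via the Kernel Transaction Manager (KTM).
    HANDLE tx = CreateTransaction(NULL, NULL, 0, 0, 0, 0, L"blob write");
    if (tx == INVALID_HANDLE_VALUE)
        return FALSE;

    // Open the file inside the transaction; other readers continue to see
    // the pre-transaction state until CommitTransaction succeeds.
    HANDLE f = CreateFileTransacted(path, GENERIC_WRITE, 0, NULL,
                                    CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL,
                                    NULL, tx, NULL, NULL);
    if (f == INVALID_HANDLE_VALUE) {
        CloseHandle(tx);
        return FALSE;
    }

    DWORD written = 0;
    BOOL ok = WriteFile(f, data, len, &written, NULL) && written == len;
    CloseHandle(f);

    // Commit makes the write durable and visible atomically;
    // rollback discards it entirely.
    if (ok)
        ok = CommitTransaction(tx);
    else
        RollbackTransaction(tx);
    CloseHandle(tx);
    return ok;
}
```

The distributed variants asked about (mirroring a file to two servers, or coordinating a file write with a database write) would layer MS DTC on top of this same KTM transaction handle, which is where the performance concerns discussed in the answers come in.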

+2  A: 

Have you considered filestream support in SQL Server 2008 (if you're using SQL Server 2008 of course)? I'm not sure about performance, but it offers transactionality and supports backup/restore.

Ronald Wildenberg
+1 for the excellent suggestion. However, I have already researched this and am specifically interested in experience with the transactional file system. I updated the question to reflect this.
Jerry Bullard
A: 

While I don't have extensive experience with TxF, I do have experience with MS DTC. TxF itself is fairly performant, but when you throw in MS DTC to coordinate multiple resource managers across multiple machines, performance takes a considerable hit.

From your description, it sounds like you are storing and indexing very large volumes of unstructured data. I assume that you also need the ability to search for this data. As such, I would highly recommend looking into something like Microsoft's Dryad or Google's MapReduce and a high performance distributed file system to handle your unstructured data storage and indexing. The best examples of high-volume enterprise systems that store and index massive volumes of blob data are Internet search engines like Bing and Google.

There are quite a few resources available for managing high-throughput unstructured data, and they would probably solve your problem more effectively than SQL Server and NTFS.

I know it's a bit farther out of the box than you were probably looking for... but you did mention that you had already exhausted all other search avenues around the NTFS/TxF/SQL box. ;)

jrista
Thanks, jrista. I appreciate the information, but I cannot officially accept your answer because it does not specifically address TxF. I updated the question again to be more explicit. Thanks again for trying to help.
Jerry Bullard
+5  A: 

Unfortunately, it appears that the answer is "No."

In nearly two weeks (one week with a 100-point bounty) and 156 views, no one has answered that they have used TxF for any high-volume application as I described. I can't say this was unexpected, and of course I cannot prove a negative, but it appears this feature of Windows is not well known or frequently used, at least by active members of the SO community at the time of writing.

If I ever get around to writing some kind of proof of concept, I'll post here what I learn.

Jerry Bullard
+2  A: 

I suppose "real-world, enterprise-grade" experience is more subjective than it sounds.

Windows Update uses TxF. So it is being used quite heavily in terms of frequency. Now, it isn't doing any multi-node work and it isn't going through DTC or anything fancy like that, but it is using TxF to manipulate file state. It coordinates these changes with changes to the registry (TxR). Does that count?

A colleague of mine presented this talk to SNIA, which is pretty frank about a lot of the work around TXF and might shed a little more light. If you're thinking of using TXF, it's worth a read.

jrtipton
It may be difficult to see this, but I had a bounty on this question and had to accept the "no" answer in order to stop the points from being awarded. You are correct that Windows Update is a good example of something real world. It is high volume as well, but not in the way I was thinking (per machine it is fairly low volume). Still +1 for this answer. Thanks.
Jerry Bullard
A: 

Ronald: FileStream is layered on top of TxF.

JR: While Windows Update uses TxF/KTM and demonstrates its utility, it is not a high-throughput application.

MJZ
@MJZ: Wait until you have enough rep to comment.
John Saunders