I have an application that receives chunks of data over the network, and writes these to disk. Once all chunks have been received, they can be decoded/recombined into the single file they actually represent.

I'm wondering if it's useful to use memory-mapped files or not - first for writing the single chunks to disk, second for the single file into which all of them are decoded.

My own feeling is that it might be useful for the second case only, anyone got some ideas on this?

Edit: It's a C# app, and I'm only planning an x64 version. (So running into the 'largest contiguous free space' problem shouldn't be relevant.)

A: 

I'd say both cases are relevant. Simply write the single chunks to their proper place in the memory-mapped file. This of course is only useful if you know where each chunk should go, like in a BitTorrent downloader. If you have to perform some extra analysis to know where the chunk should go, the benefit of a memory-mapped file might not be as large.
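To make "write each chunk to its proper place" concrete, here is a minimal sketch. It is written in Python for brevity (the question is about C#, where .NET 4 offers the equivalent types in `System.IO.MemoryMappedFiles`). The function name `assemble_with_mmap` and the `(offset, data)` chunk format are assumptions for illustration, not anything from the question.

```python
import mmap

def assemble_with_mmap(path, total_size, chunks):
    """Write (offset, data) chunks into `path` through one memory mapping.

    Hypothetical helper: `chunks` may arrive in any order, but `total_size`
    (which must be greater than zero) has to be known before mapping.
    """
    # Pre-size the file, because the mapping cannot reach past its end.
    with open(path, "wb") as f:
        f.truncate(total_size)

    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), total_size, access=mmap.ACCESS_WRITE) as view:
            for offset, data in chunks:
                # Slice assignment copies the chunk into the mapped pages.
                view[offset:offset + len(data)] = data
            view.flush()  # ask the OS to write dirty pages back to disk
```

For example, `assemble_with_mmap("out.bin", 10, [(5, b"WORLD"), (0, b"HELLO")])` would fill the file regardless of arrival order, as long as every chunk's offset is known.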

Amigable Clark Kant
A: 

Memory-mapped files are primarily used for Inter-Process Communication or I/O performance improvement.

In your case, are you trying to get better I/O performance?

Hate to point out the obvious, but Wikipedia gives a good rundown of the situation... http://en.wikipedia.org/wiki/Memory-mapped_file

Specifically...

The memory mapped approach has its cost in minor page faults - when a block of data is loaded in page cache, but not yet mapped in to the process's virtual memory space. Depending on the circumstances, memory mapped file I/O can actually be substantially slower than standard file I/O.

It sounds like you're about to prematurely optimize for speed. Why not a regular file approach, and then refactor for MM files later if needed?
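As a rough idea of what the "regular file approach" could look like for chunked data, here is a small sketch in Python (used for brevity; the same seek-then-write pattern is available on a C# `FileStream`). The name `assemble_with_seek` and the `(offset, data)` chunk format are assumptions for illustration.

```python
def assemble_with_seek(path, total_size, chunks):
    """Write (offset, data) chunks into `path` using ordinary file I/O.

    Hypothetical helper: each chunk is placed by seeking to its offset,
    so the chunks may arrive in any order.
    """
    with open(path, "wb") as f:
        f.truncate(total_size)  # reserve the final size up front
        for offset, data in chunks:
            f.seek(offset)
            f.write(data)
```

If this later turns out to be the bottleneck, only the body of the function needs to change to try a memory-mapped variant.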

kervin
I'm aiming for better I/O performance. I'm getting in around 12MB/sec of data now (but in the future this will be much more) and need to be able to process it / write it back to disk as fast as possible. I've read the Wikipedia article, and I understand the benefits when reading, but the best use and benefits when *writing* to files aren't exactly clear to me, which is why I'm asking for help understanding it :)
Pygmy
+1  A: 

Memory-mapped files are beneficial for scenarios where a relatively small portion (view) of a considerably larger file needs to be accessed repeatedly.

In this scenario, the operating system can help optimize the overall memory usage and paging behavior of the application by paging in and out only the most recently used portions of the mapped file.
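A small view over a large file might be sketched as follows, in Python for brevity (in .NET 4 the comparable step would be creating a view accessor with an offset and size). The helper name `write_through_window` is hypothetical, and the sketch assumes the file already exists, is at least `offset + len(data)` bytes long, and that `data` is non-empty.

```python
import mmap

def write_through_window(path, offset, data):
    """Map only the region around `offset` and write `data` into it.

    Hypothetical helper: the file must already be large enough.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    base = (offset // gran) * gran   # view offsets must be aligned
    delta = offset - base            # position of the data inside the view
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), delta + len(data),
                       access=mmap.ACCESS_WRITE, offset=base) as view:
            view[delta:delta + len(data)] = data
            view.flush()
```

Only the pages covered by the view are mapped into the process, which is the point of using a small view; the cost is that every new region needs a new mapping.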

In addition, memory-mapped files can expose interesting features such as copy-on-write or serve as the basis of shared-memory.

For your scenario, memory-mapped files can help you assemble the file if the chunks arrive out of order. However, you would still need to know the final file size in advance.

Also, you would be touching each part of the file only once, when writing its chunk. Thus, a performance advantage over explicitly implemented asynchronous I/O is unlikely, but it may be easier and quicker to implement your file writer correctly.

In .NET 4, Microsoft added support for memory-mapped files and there are some comprehensive articles with sample code, e.g. http://blogs.msdn.com/salvapatuel/archive/2009/06/08/working-with-memory-mapped-files-in-net-4.aspx.

I disagree that MMFs are for small views only. On 64-bit systems you can easily put a view over the whole file. Re-positioning a view is an expensive I/O operation.
Mikael Svenson
You're right. They can be used for arbitrarily large or whole-file views, particularly with a 64-bit address space. But that's not where they shine, especially when the file is being read or written only once. My point is, in such cases async I/O will be as efficient, but is harder to implement correctly.