Guys, I am trying to write to different pieces of a large file using multiple threads, just like a segmented file downloader would do.

My question is, what is the safe way to do this? Do I open the file for writing, create my threads, passing the Stream object to each thread? I don't want an error to occur because multiple threads are accessing the same object at potentially the same time.

This is C# by the way.

+12  A: 

I would personally suggest that you fetch the data in multiple threads, but actually write to it from a single thread. It's likely to be considerably simpler that way. You could use a producer/consumer queue (which is really easy in .NET 4) and then each producer would feed pairs of "index, data". The consumer thread could then just sequentially seek, write, seek, write etc.
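A minimal sketch of the single-writer idea, not taken from the answer itself: it assumes a recent C# compiler (tuples, `using` declarations), and the names `SegmentedWriter`, `WriteSegments`, and `maxQueued` are made up for illustration. `Parallel.ForEach` stands in for the download threads, and the bounded `BlockingCollection` keeps only a limited number of chunks in memory at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class SegmentedWriter
{
    // Hypothetical helper: 'segments' stands in for whatever the fetching threads produce.
    public static void WriteSegments(string path, (long Offset, byte[] Data)[] segments, int maxQueued = 16)
    {
        // Bounded queue: Add() blocks once maxQueued chunks are waiting,
        // so producers cannot run arbitrarily far ahead of the disk.
        using var queue = new BlockingCollection<(long Offset, byte[] Data)>(maxQueued);

        // The only thread that ever touches the file stream.
        Task writer = Task.Run(() =>
        {
            using var file = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write);
            foreach (var (offset, data) in queue.GetConsumingEnumerable())
            {
                file.Position = offset;            // seek
                file.Write(data, 0, data.Length);  // write
            }
        });

        // Producers: each one feeds an "index, data" pair into the queue.
        Parallel.ForEach(segments, segment => queue.Add(segment));

        queue.CompleteAdding(); // lets the consumer loop finish once the queue drains
        writer.Wait();
    }
}
```

Error handling is left out to keep this short; a real version would want to stop the producers if the writer fails.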

Jon Skeet
Only problem is, a file could be enormous, a gigabyte long even. Wouldn't that consume a lot of memory to store it in memory and pass it to a single thread?
icemanind
Read Jon's answer again. He's not suggesting that you store the entire file in memory all at once.
Stephen Cleary
Yeah, sorry. I posted my comment before he edited it.
icemanind
@icemanind: Stephen is right - I wasn't suggesting storing it in memory at all. Just use a single file handle, and a single thread using it to write. If you're worried that you'll be producing data faster than it can be written, you could always make the queue block so that there were never more than a certain number of entries waiting to be processed. (And my edit only added an extra bit of elaboration... it's not like it changed the approach.)
Jon Skeet
@icemanind - Or another way to see it is that a gigabyte long file will consume a gigabyte of memory. So how much memory is installed in the target computer? Be careful of premature optimization.
Peter M
+1  A: 

If this were Linux programming, I would recommend you look into the pwrite() system call, which writes a buffer to a file at a given offset without touching the file position. A cursory search of the C# documentation doesn't turn up anything like this, however. Does anyone know if a similar function exists?

Karmastan
@Karmastan: You'd just seek to a given section of a stream before writing. Using the Position property on Stream would be the simplest way of doing that.
Jon Skeet
Another approach from Unix-land would mmap() the file and perform the writes to the correct locations in memory. But files larger than ~2.5 GB would be very difficult to handle on 32-bit systems. (The kernel would simply save out data along the way, so even small-memory systems could do it -- the limit comes from the small amount of virtual address space.)
sarnold
@Jon: That would force you to synchronize all your writes to avoid a race condition on the file stream's position (thus defeating the purpose of having multiple writers). Suppose that two threads both modify `Position` and then both write data. One or both writes will go to the wrong location. The trick (that `pwrite()` allows) is to ignore the stream's position altogether.
Karmastan
@Karmastan: In that case there's no direct equivalent in .NET.
Jon Skeet
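A note for later readers: this thread predates it, but .NET 6 and newer ship `System.IO.RandomAccess`, whose `Write` method takes an explicit file offset and ignores any stream position, much like `pwrite()`. The sketch below assumes .NET 6+; `PositionalWriter` and `WriteSegments` are hypothetical names, and `Parallel.ForEach` stands in for the worker threads.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Win32.SafeHandles;

public static class PositionalWriter
{
    // Hypothetical helper; requires .NET 6 or later for File.OpenHandle and RandomAccess.
    public static void WriteSegments(string path, (long Offset, byte[] Data)[] segments)
    {
        // One shared handle, no FileStream and therefore no shared Position to race on.
        using SafeFileHandle handle = File.OpenHandle(
            path, FileMode.OpenOrCreate, FileAccess.Write, FileShare.None);

        Parallel.ForEach(segments, segment =>
        {
            // Each call names its own offset, so no lock around seek + write is needed.
            RandomAccess.Write(handle, segment.Data, segment.Offset);
        });
    }
}
```

On older frameworks the choices remain the ones discussed above: lock around setting `Position` and writing, or give the stream to a single writer thread.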
+1  A: 

Although one might be able to open multiple streams pointing to the same file, and use a different stream in each thread, I would second the advice of using a single thread for the writing absent some reason to do otherwise. Even if two or more threads can safely write to the same file simultaneously, that doesn't mean it's a good idea. It may be helpful to have the unified thread attempt to sequence writes in a sensible order to avoid lots of random seeking; the performance benefit from that would depend upon how effectively the OS could cache and schedule random writes. Don't go crazy optimizing such things if it turns out the OS does a good job, but be prepared to add some optimization if the OS default behavior turns out to perform poorly.
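For reference, the "one stream per thread" variant mentioned at the start might look roughly like this. It is only a sketch: `PerThreadStreams` and `WriteSegments` are made-up names, the segments are assumed not to overlap, and every open has to pass a `FileShare` value that permits the other writers.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class PerThreadStreams
{
    // Hypothetical helper: each worker opens its own stream onto the same file.
    public static void WriteSegments(string path, (long Offset, byte[] Data)[] segments)
    {
        Parallel.ForEach(segments, segment =>
        {
            // FileShare.ReadWrite allows the other workers to hold the file open for writing too.
            using var stream = new FileStream(
                path, FileMode.OpenOrCreate, FileAccess.Write, FileShare.ReadWrite);

            // The position belongs to this stream alone, so there is nothing to lock.
            stream.Seek(segment.Offset, SeekOrigin.Begin);
            stream.Write(segment.Data, 0, segment.Data.Length);
        });
    }
}
```

Whether this beats a single writer thread depends on how well the OS schedules the scattered writes, so measure before committing to it.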

supercat