views: 448
answers: 8
I have an app that moves a project and its files from preview to production using a Flex front-end and a .NET web service. Currently, the process takes about 5-10 minutes per project. Aside from latency concerns, it really shouldn't take that long. I'm wondering whether or not this is a good use case for multi-threading. Also, considering the user may want to push multiple projects, or one right after another, is there a way to queue the jobs?

Any suggestions and examples are greatly appreciated.

Thanks!

+3  A: 

Something that does heavy disk I/O typically isn't a good candidate for multithreading, since a disk can really only do one thing at a time. However, if you're pushing to multiple servers, or the servers have particularly good disk subsystems, some light threading may be beneficial.

Donnie
Since it looks like he's copying files from one server to another (usually you have different servers for production and staging), such an operation *would* benefit from asynchronous I/O or threading, in which case you would read one buffer, then while you write that buffer to the destination, you read another buffer from the source, in parallel. But before he figures out exactly why his current solution is slow, nobody can say what exactly would help the most.
Lasse V. Karlsen
A: 

If you're moving things between just two computers, the network is going to be the bottleneck, so you may want to queue these operations.

Likewise, on the same machine, the I/O is going to be the bottleneck, so you'd want to queue there, too.

R. Bemrose
What if the disk is slow? What if he's copying 1 byte at a time, and reporting progress to a GUI using a synchronous call across thread boundaries, or using what amounts to "Application.DoEvents" to ensure the GUI is updated to reflect his new progress? I agree with the principle that in an optimal solution the network is going to be a bottleneck, but you cannot say that it is *the* bottleneck in his *current* solution.
Lasse V. Karlsen
+2  A: 

As a note - regardless of whether or not you decide to queue the jobs, you will be using multi-threading. Queueing is just one way of organizing work that is ultimately carried out on multiple threads.

And yes, I'd recommend you build a queue to push out each project.
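
For example, here's a minimal sketch of what such a queue could look like - a single background worker that drains project-move jobs one at a time. The MoveProject body is a hypothetical placeholder for your actual copy logic; this is just one way to do it with plain System.Threading primitives, not a definitive implementation:

    using System.Collections.Generic;
    using System.Threading;

    public class ProjectMoveQueue
    {
        private readonly Queue<string> _jobs = new Queue<string>();
        private readonly object _sync = new object();

        public ProjectMoveQueue()
        {
            var worker = new Thread(Run) { IsBackground = true };
            worker.Start();
        }

        public void Enqueue(string projectId)
        {
            lock (_sync)
            {
                _jobs.Enqueue(projectId);
                Monitor.Pulse(_sync);          // wake the worker if it's waiting
            }
        }

        private void Run()
        {
            while (true)
            {
                string projectId;
                lock (_sync)
                {
                    while (_jobs.Count == 0)
                        Monitor.Wait(_sync);   // sleep until a job arrives
                    projectId = _jobs.Dequeue();
                }
                MoveProject(projectId);        // hypothetical: copy preview -> production
            }
        }

        private void MoveProject(string projectId)
        {
            // actual copy logic goes here
        }
    }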

Doug
A: 

You should try using the ThreadPool.

ThreadPool.QueueUserWorkItem(MoveProject, project);
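
For that line to compile, MoveProject has to match the WaitCallback delegate - roughly something like the sketch below, where the Project cast is just an assumption about whatever type "project" is:

    using System.Threading;

    static void MoveProject(object state)
    {
        var project = (Project)state;   // hypothetical type passed to QueueUserWorkItem
        // ... copy the project's files from preview to production ...
    }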
ChaosPandion
What about a FileStream, should he use that as well?
Lasse V. Karlsen
I assume you are being sarcastic. The reason I mentioned this is because it would literally make the code multi-threaded in 1 line rather than rolling your own queuing system. This means he can see if there is a benefit almost immediately.
ChaosPandion
By the way, just because you have 33k points does not give you the right to be a snobby jerk.
ChaosPandion
+1  A: 

You should compare the speed of your code to plain copying in Windows (i.e., Explorer or the command line) and to copying with something more advanced like TeraCopy. If your code is significantly slower than Windows, then look for parts of your code to optimize using a profiler. If your code is about as fast as Windows but slower than TeraCopy, then multithreading could help.

Multithreading is not generally helpful when the operation is I/O bound, but copying files here involves reading from the disk AND writing over the network. Those are two I/O operations, so if you separate them onto different threads, it could increase performance. For something like this you need a producer/consumer setup with a bounded (circular) queue: one thread reads from disk and writes to the queue, and another thread reads from the queue and writes to the network. Keep in mind that the two threads will not run at the same speed, so the producer should wait when the queue is full and the consumer should wait when it is empty. Also, the locking strategy can have a big impact on performance here and could degrade it to slower than a single-threaded implementation.
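
Here's a rough sketch of that idea, assuming the destination can be treated as a plain Stream (in practice it would be whatever network stream or upload API you're writing to); the buffer size, queue bound, and names are arbitrary:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Threading;

    public static class PipelinedCopy
    {
        private const int BufferSize = 64 * 1024;
        private const int MaxQueuedBuffers = 8;    // bound the queue so the reader can't run away

        public static void Copy(Stream source, Stream destination)
        {
            var queue = new Queue<byte[]>();
            var sync = new object();
            bool readerDone = false;
            Exception readError = null;

            // Producer: read chunks from disk and push them onto the queue.
            var reader = new Thread(() =>
            {
                try
                {
                    var buffer = new byte[BufferSize];
                    int read;
                    while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        var chunk = new byte[read];
                        Array.Copy(buffer, chunk, read);
                        lock (sync)
                        {
                            while (queue.Count >= MaxQueuedBuffers)
                                Monitor.Wait(sync);       // queue full: producer waits
                            queue.Enqueue(chunk);
                            Monitor.PulseAll(sync);
                        }
                    }
                }
                catch (Exception ex) { readError = ex; }
                finally { lock (sync) { readerDone = true; Monitor.PulseAll(sync); } }
            });
            reader.Start();

            // Consumer: drain the queue and write to the destination on this thread.
            while (true)
            {
                byte[] chunk;
                lock (sync)
                {
                    while (queue.Count == 0 && !readerDone)
                        Monitor.Wait(sync);               // queue empty: consumer waits
                    if (queue.Count == 0) break;          // producer finished and queue drained
                    chunk = queue.Dequeue();
                    Monitor.PulseAll(sync);
                }
                destination.Write(chunk, 0, chunk.Length);
            }
            reader.Join();
            if (readError != null) throw new IOException("Read side failed", readError);
        }
    }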

Sam
A: 

Agreed with everyone above about the limited performance gains from running the tasks in parallel.

If you have full control over your deployment environment, you could use Rhino Queues:

http://ayende.com/Blog/archive/2008/08/01/Rhino-Queues.aspx

This will allow you to produce a queue of jobs asynchronously (say from a WCF service being called from your Silverlight/Flex app) and consume them synchronously.

Alternatively you could use WCF and MSMQ, but the learning curve is greater.
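
For the MSMQ route, the raw System.Messaging API is enough for a first experiment. A sketch, with a hypothetical local queue path (this is plain MSMQ, not anything Rhino-Queues-specific):

    using System.Messaging;   // reference System.Messaging.dll; requires MSMQ to be installed

    public static class ProjectMoveMessages
    {
        private const string QueuePath = @".\private$\projectMoves";   // hypothetical queue name

        public static void Enqueue(string projectId)
        {
            if (!MessageQueue.Exists(QueuePath))
                MessageQueue.Create(QueuePath);
            using (var queue = new MessageQueue(QueuePath))
                queue.Send(projectId, "move project");
        }

        public static string DequeueBlocking()
        {
            using (var queue = new MessageQueue(QueuePath))
            {
                queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
                using (var message = queue.Receive())   // blocks until a job arrives
                    return (string)message.Body;
            }
        }
    }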

Chris Smith
A: 

When dealing with multiple files, using multiple threads usually IS a good idea performance-wise. The main reason is that most disks nowadays support native command queuing.

I recently wrote an article on ddj.com about reading/writing files with multiple threads.

See http://www.ddj.com/go-parallel/article/showArticle.jhtml?articleID=220300055.

Also see related question http://stackoverflow.com/questions/1033065/will-using-multiple-threads-with-a-randomaccessfile-help-performance/1254378#1254378

In particular, my experience has been that when dealing with very many files it IS a good idea to use a number of threads. In fact, using many threads in many cases does not slow applications down as much as is commonly expected.

Having said that, I'd say there is no other way to find out than trying the different approaches. It depends on very many conditions: hardware, OS, drivers, etc.

RED SOFT ADAIR
A: 

The very first thing you should do is point any kind of profiling tool towards your software. If you can't do that (like, if you haven't got such a tool), insert logging code.

What you need to figure out is what is taking a long time to complete, and then why it is taking that long. Knowing that your "copy" operation as a whole takes a long time isn't good enough; you need to pinpoint the reason down to a method or a set of methods.

Until you do that, all the other things you can do to your code will likely be guesswork. My experience has taught me that when it comes to performance, 9 out of 10 reasons for things running slow come as a surprise to the guy(s) that wrote the code.

So measure first, then change.

For instance, you might discover that you're in fact reporting the progress of copying the file on a byte-by-byte basis, to a GUI, using a synchronous call to the UI, in which case it wouldn't matter how fast the actual copying could run; you'd still be bound by message handling speed.

But that's just conjecture until you know, so measure first, then change.
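
As a sketch of the kind of logging I mean (the step names are hypothetical stand-ins for whatever your move operation actually does):

    using System.Diagnostics;

    void MoveToProduction(Project project)
    {
        var sw = Stopwatch.StartNew();
        CopyFilesToProduction(project);                    // hypothetical step
        Debug.WriteLine("copy files: " + sw.Elapsed);

        sw = Stopwatch.StartNew();
        UpdateProjectRecords(project);                     // hypothetical step
        Debug.WriteLine("update records: " + sw.Elapsed);
    }

Once you know which step dominates, drill into that step the same way.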

Lasse V. Karlsen
You're right, I don't have a profiling tool. I use VS Professional, not the Team Development version. What do you mean by logging code? Breakpoints? Thanks!
Like Debug.WriteLine("point #1: " + DateTime.Now);
Lasse V. Karlsen