OK, this is more of a conceptual question, but I hope to get some pointers in the right direction. First, the desired scenario:

  • I want to query an SFTP server for directory and file lists
  • I want to upload or download files simultaneously

Both things are pretty easy using an SFTP class provided by Tamir.SharpSsh, but with only one thread it is rather slow. The recursion into subdirectories in particular blocks the UI badly, because we are talking about 10,000 or more directories.

My basic approach is simple: create some kind of "pool" in which I keep 10 open SFTP connections. Then ask the first worker for a list of directories. Once this list has been obtained, send the next free workers (e.g. 1-10; the first one is free again by then) to get the subdirectory details. As soon as a worker is free, send it for the sub-subdirectories. And so on...

I know the ThreadPool and plain Threads, and I have done some tests. What confuses me a little is the following: I basically need...

  • A list of threads I create, say 10
  • Connect all threads to the server
  • If a connection drops, create a new thread / sftp client
  • If there is work to do, take the first free thread and handle the work

I am currently not sure about the implementation details, especially the "work to do" and the "maintain list of threads" parts.

Is it a good idea to:

  • Enclose the work in an object, containing a job description (path) and a callback
  • Send the threads into an infinite loop that sleeps 100 ms at a time while waiting for work
  • If SFTP is dead, either revive it, or kill the whole thread and create a new one
  • As for encapsulating this: do I write my own "10ThreadsManager", or are there some out there already?

Ok, so far...

Btw, I could also use PRISM events and commands, but I think the problem is unrelated. Perhaps the event model could be used to signal that a "work package" has been processed...

Thanks for any ideas or criticism. Chris

+2  A: 

A bunch of minor notes:

If you are using some .NET API that internally uses the ThreadPool, then you cannot do the infinite wait, because those threads belong to the pool: they are meant to be used "briefly" and then returned to it, which is the most robust behavior. Sure, the pool can grow as necessary if you end up hogging its threads with long-running processing, but the better design is to avoid that behavior.

If you are running on XP, you might also want to avoid the ThreadPool (OS level, and hence .NET), since it was fixed/redesigned in Vista and later; the XP version is considered to be less robust.

If you do use the ThreadPool, then you simply queue async work items to it, since it is already sitting there waiting for work to do.

Writing your own ThreadManager is fairly easy to do and you can find lots of examples of it, but as always this sort of thing should be kept as simple as possible.
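For what it's worth, a minimal sketch of such a manager might look like the following. It assumes .NET 4's `BlockingCollection<T>` is available, so the workers block on the queue rather than polling it every 100 ms. `ListJob`, `ListDirectory` and `SftpWorkerPool` are made-up names, and the `connect` delegate is a placeholder for whatever opens a connection with your SFTP library; none of this is Tamir.SharpSsh API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for "list this remote directory on one open connection".
public delegate IList<string> ListDirectory(string path);

// Hypothetical job: a remote path plus a callback that receives the listing.
public sealed class ListJob
{
    public string Path { get; set; }
    public Action<string, IList<string>> OnDone { get; set; }
}

// Hypothetical manager: a fixed number of long-running worker threads
// that all consume jobs from one shared blocking queue.
public sealed class SftpWorkerPool : IDisposable
{
    private readonly BlockingCollection<ListJob> _jobs = new BlockingCollection<ListJob>();
    private readonly List<Thread> _workers = new List<Thread>();

    // connect is called once per worker, so every worker owns its own connection.
    public SftpWorkerPool(int workerCount, Func<ListDirectory> connect)
    {
        for (int i = 0; i < workerCount; i++)
        {
            var worker = new Thread(() =>
            {
                ListDirectory list = connect();
                // Blocks while the queue is empty; the loop ends once
                // CompleteAdding() has been called and the queue is drained.
                foreach (ListJob job in _jobs.GetConsumingEnumerable())
                {
                    job.OnDone(job.Path, list(job.Path));
                }
            });
            worker.IsBackground = true;
            worker.Start();
            _workers.Add(worker);
        }
    }

    public void Enqueue(ListJob job)
    {
        _jobs.Add(job);
    }

    // Stops accepting jobs, lets the workers finish the queue, then waits for them.
    public void Dispose()
    {
        _jobs.CompleteAdding();
        foreach (Thread worker in _workers)
        {
            worker.Join();
        }
        _jobs.Dispose();
    }
}
```

Note that the callback runs on the worker thread, so anything touching the UI still has to be marshalled back to the UI thread. A callback may call `Enqueue` for each subdirectory it finds, but only while the pool has not been disposed; `Add` throws once `CompleteAdding()` has been called. An exception thrown by a job is not handled here and would take the worker down.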

For your third bullet point, it is better to revive the SFTP connection than to kill off your whole thread. If you kill off a thread (assuming your ThreadManager can handle that; never kill threads from the OS ThreadPool, of course), then it will first have to return the unprocessed job to some queue, which feels like too much work.
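A rough sketch of the "revive the connection" idea, under the assumption that a connection can simply be recreated by calling some factory again. `Reconnecting.Run` is a hypothetical helper, not part of any SFTP library, and catching every `Exception` is a simplification; real code would catch only whatever the library throws for a dropped connection.

```csharp
using System;

public static class Reconnecting
{
    // Runs work(connection) on the calling (worker) thread. If it throws,
    // the connection is dropped and a fresh one is opened for the next attempt.
    // After maxAttempts failed attempts the last exception is rethrown.
    public static TResult Run<TConn, TResult>(
        ref TConn connection,
        Func<TConn> connect,
        Func<TConn, TResult> work,
        int maxAttempts)
        where TConn : class
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                if (connection == null)
                {
                    connection = connect();
                }
                return work(connection);
            }
            catch (Exception)
            {
                // Assumption: any failure means the connection is unusable.
                connection = null;
                if (attempt >= maxAttempts)
                {
                    throw;
                }
            }
        }
    }
}
```

The worker thread keeps its job and its place in the pool the whole time; only the connection is replaced, so nothing has to be pushed back onto a queue.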

Chris O
+1: Regarding your last paragraph - it is a lot more work to kill off the thread. Much better to kill the connection. Otherwise you are not only spending time reviving the connection but also spinning up a new thread to do the work. Good notes on the ThreadPool, too.
dboarman
OK, thanks for your feedback... I think I will use my own thread pool manager. In addition, I will have it monitor general wrapper objects, each containing its "job description", a thread to carry out the work, and whatever is needed to manage internal details (like reviving the SFTP connection). Basically the thread manager will only make sure there are 10 worker bees alive, keep a list of all jobs, distribute the jobs and collect the "results".
Christian
If I run into any particular problems, I might come back... (There should not be too many surprises; I have already done a lot of testing, and once you get the hang of it, it's not that hard. Only observable collections or windows in combination with threads give you some headaches, but those seem not to be best-practice things to do in threads anyway.) Chris
Christian
A: 

An alternate approach is to look at using FtpDlx from WeOnlyDo (http://www.weonlydo.com/FtpDLX.NET/ftp.sftp.ftps.ssl.net.component.asp). It's only $229 and is a fully managed .NET 2.0 library. It includes methods to recursively download a directory. It uses events to indicate progress and errors, allowing you to skip files, redirect where you write them, and examine and ignore errors. It's robust and works as advertised; we use it in multi-threaded production code.

You can run it in a separate thread and use the events to update your UI without blocking. You may even find that a single thread doing the download makes good use of your bandwidth, but it will also work fine with separate connections in separate threads. I would recommend creating your own threads instead of using the ThreadPool, as you'll want to use callbacks and the threads will tend to be long-running anyway.

ebpower
A: 

edtFTPnet/PRO is another commercial solution. It supports recursive directory transfers, but also provides methods for directory synchronization which can be very useful.

The most recent release (7.1.0) also supports connection pools, which can significantly improve transfer times.

FTP, FTPS and SFTP are supported in the one component.

Bruce Blackshaw