I have an application written in .NET 3.5 that uses FTP to upload/download files from a server. The app works fine but there are performance issues:

  1. It takes a long time to establish a connection to the FTP server. The FTP server is on a different network and runs Windows Server 2003 (IIS FTP). When multiple files are queued for upload, switching from one file to the next creates a new connection via FtpWebRequest, which takes a long time (around 8-10 seconds).

  2. Is it possible to re-use the connection? I am not very sure about the KeepAlive property. Which connections are kept alive and reused?

  3. The IIS FTP service on Windows Server 2003 does not support SSL, so anyone can easily see the username/password with a packet sniffer such as Wireshark. I found that Windows Server 2008 supports FTP over SSL in its new version of IIS, IIS 7.0.

I basically want to improve the upload/download performance of my application. Any ideas will be appreciated.

** Please note that point 3 is not an issue, but I would like comments on it.

+1  A: 

Personally, I have migrated all of our apps away from FTP for file upload/download and instead rolled a solution based on XML Web Services in ASP.NET.

Performance is much improved, security is as much or as little as you want to code (and you can use the stuff built into .NET), and it can all go over SSL with no issues.

Our success rate getting our clients' connections out through their own firewalls is FAR better than running FTP.

tomfanning
A: 

KeepAlive does work. FtpWebRequest caches connections internally, so they can be reused for some time afterwards. For details and an explanation of this mechanism, look at ServicePoint.

Another good source of information is the FtpWebRequest source itself (you can step through it in VS2008).
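
For illustration, a small sketch, assuming a placeholder URI, showing where those cached-connection settings surface:

    using System;
    using System.Net;

    class ServicePointInspection
    {
        static void Main()
        {
            // Placeholder URI; any FtpWebRequest exposes its ServicePoint.
            var request = (FtpWebRequest)WebRequest.Create("ftp://ftp.example.com/file.txt");
            request.Method = WebRequestMethods.Ftp.DownloadFile;
            request.KeepAlive = true; // ask for the control connection to be cached

            ServicePoint sp = request.ServicePoint;
            Console.WriteLine("Idle time before close: {0} ms", sp.MaxIdleTime);
            Console.WriteLine("Connection limit: {0}", sp.ConnectionLimit);
        }
    }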

arbiter
+1  A: 

You should definitely check out BITS, which is a big improvement over FTP. Clear-text passwords aren't FTP's only weakness: there's also the issue of predicting the port it will open for a passive upload or download, and the overall difficulty when clients are behind NAT or firewalls.

BITS works over HTTP/HTTPS using IIS extensions and supports queued uploads and downloads that can be scheduled at low priority. It's overall just a lot more flexible than FTP if you are using Windows on the client and server.

BITS for PowerShell

BITS for .NET

Josh Einstein
+5  A: 

It doesn't matter if the individual connections take long to connect, as long as you can launch many in parallel. If you have many items to transfer (say hundreds), then it makes sense to launch tens or even hundreds of WebRequests in parallel, using the asynchronous methods like BeginGetRequestStream and BeginGetResponse. I've worked on projects that faced similar problems (long connect/authenticate times), but by issuing many calls in parallel the overall throughput was actually very good.

Also, it makes a huge difference whether you use the async methods or the synchronous ones once you have many (tens, hundreds) of requests in flight. This applies not only to the WebRequest methods but also to the Stream read/write methods you'll use after obtaining the upload/download stream. The Improving .NET Performance and Scalability book is a bit outdated, but much of its advice still stands, and it is free to read online.

One thing to consider is that the ServicePointManager class sits there lurking in the Framework with one sole purpose: to ruin your performance. Make sure you obtain the ServicePoint of your URL and change the ConnectionLimit to a reasonable value (at least as high as the number of concurrent requests you intend to make).
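
To make this concrete, here is a minimal sketch of launching many uploads in parallel with the async methods; the server URI, credentials, and local folder are placeholders, and error handling is omitted:

    using System;
    using System.IO;
    using System.Net;
    using System.Threading;

    class ParallelFtpUpload
    {
        static void Main()
        {
            Uri baseUri = new Uri("ftp://ftp.example.com/incoming/"); // placeholder server
            string[] files = Directory.GetFiles(@"C:\outgoing");      // placeholder folder
            int pending = files.Length;
            var allDone = new ManualResetEvent(false);

            // Raise the per-host connection limit before issuing requests;
            // the default of 2 would serialize everything past two transfers.
            ServicePointManager.FindServicePoint(baseUri).ConnectionLimit = 8;

            foreach (string file in files)
            {
                string localPath = file; // copy for the closure (C# 3 foreach capture)
                var request = (FtpWebRequest)WebRequest.Create(
                    new Uri(baseUri, Path.GetFileName(localPath)));
                request.Method = WebRequestMethods.Ftp.UploadFile;
                request.Credentials = new NetworkCredential("user", "pass"); // placeholder
                request.KeepAlive = true; // let the control connection be reused

                request.BeginGetRequestStream(ar =>
                {
                    using (Stream ftpStream = request.EndGetRequestStream(ar))
                    using (Stream fileStream = File.OpenRead(localPath))
                    {
                        byte[] buffer = new byte[8192];
                        int read;
                        while ((read = fileStream.Read(buffer, 0, buffer.Length)) > 0)
                            ftpStream.Write(buffer, 0, read);
                    }
                    request.GetResponse().Close(); // completes the transfer
                    if (Interlocked.Decrement(ref pending) == 0)
                        allDone.Set();
                }, null);
            }

            allDone.WaitOne(); // block until every upload has finished
        }
    }

Note that the ConnectionLimit is raised before the first request goes out; the ServicePoint is created on first use and throttles everything issued against that host.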

Remus Rusanu
What does the ServicePointManager have to do with it when I am just using a single connection? My application uploads multiple files one by one. What should I set the ConnectionLimit to? Also, I observed that it takes a lot of time to make a connection when another connection is already active. Generally it takes 4.5 seconds for the first FTP GetRequestStream() call to return; subsequent connections take 1.3 seconds, but if the connections overlap it takes 12 seconds to create one.
A9S6
If a connection is already active, the second one will not connect until the first one finishes; that's the very thing the ServicePointManager controls (it throttles connections on your behalf). If you are set on using one connection and serializing all requests, then the ServicePointManager will make no difference. My point was about doing all requests in parallel.
Remus Rusanu
This is a very good answer as long as the sequence in which the files are retrieved is not too important. The requests should be processed in the order they are made, but obviously they can finish at different times depending on the size of the transfers. If you require that certain transfers finish before others (such as to control the order in which retrieved files are processed), this may not be the ideal method.
Martin Robins
A: 

AFAIK, each FtpWebRequest has to set up a new connection, including logging on to the server. If you want to speed up the FTP transfers, I would recommend using an alternative FTP client instead. Some of these alternative clients can log in once and then perform multiple actions over the same command connection.

Examples of such clients include http://www.codeproject.com/KB/IP/FtpClient.aspx, which also includes a good explanation of why these libraries can operate faster than the standard FtpWebRequest, and http://www.codeproject.com/KB/macros/ftp_class_library.aspx, which looks like a simple enough implementation as well.

Personally, I rolled my own implementation of FTP back in the .NET 1.1 days before the FtpWebRequest was introduced and this still works well for .NET 2.0 onwards.

Martin Robins
+1  A: 

I'd recommend switching to rsync.

Pros:

  • Optimized to reduce transfer time.
  • Supports SSH for secure transfer.
  • Uses a single TCP connection, which makes your IT dept/firewall guys happier.

Cons:

  • No native .NET support.
  • Geared towards Linux server installations, though there are decent Windows ports like DeltaCopy.

Overall, though, it's a much better choice than FTP.

zebrabox
A: 

I have had good results with EDT's ftp library:

http://www.enterprisedt.com/products/edtftpnet/overview.html

Peter Recore
+1  A: 

Look at this page - http://www.ietf.org/rfc/rfc959.txt

It mentions using a different port when connecting so that the connection can be reused.
Does that work?

shahkalpesh
+2  A: 

Debug Network

A few tricks for simple network debugging:

  1. Check the response times when you ping the FTP server from the application server.
  2. Check the response times for a trace route (tracert from a DOS shell).
  3. Transfer a file from the command-line using the ftp command.
  4. Connect to the FTP server via Telnet: telnet server 21.

The results will provide clues to solving the problem.

Network Hardware

For a slow trace route:

  • Determine why the two computers are having network issues.
  • Upgrade the network hardware at the slowest link.

Network Configuration

For a slow ping:

  • Check the network configuration on each machine.
  • Ensure the settings are optimal.

Validate API

A slow command-line FTP session will tell you that the problem is not isolated to the FTP API you are using. It does not eliminate the API as a potential problem, but certainly makes it less likely.

Network Errors

If packets are being dropped between the source and destination, ping will tell you. You might have to increase the packet size to 1500 bytes to see any errors.
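
If you want to automate that check from .NET, a small sketch (the host name is a placeholder):

    using System;
    using System.Net.NetworkInformation;

    class PingCheck
    {
        static void Main()
        {
            using (var ping = new Ping())
            {
                // A 1500-byte payload helps surface fragmentation/drop issues.
                byte[] payload = new byte[1500];
                PingReply reply = ping.Send("ftp.example.com", 5000, payload);
                Console.WriteLine("{0}: {1} ms", reply.Status, reply.RoundtripTime);
            }
        }
    }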

FTP Queue Server

If you have no control over the destination FTP server, have an intermediary server receive uploaded files. The intermediary then sends the files to the remote server at whatever speed it can. This gives the illusion that the files are being sent quickly. However, if the files must exist on the remote server as soon as they are uploaded, then this solution might not be viable.

FTP Server Software

Use a different FTP daemon on the FTP server, such as ProFTPd as a Windows service. (ProFTPd has plug-ins for various databases that allow authentication using SQL queries.)

FTP Server Operating System

A Unix-based operating system might be a better option than a Microsoft-based one.

FTP Client Software

There are a number of different APIs for sending and receiving files via FTP. It might take some work to make your application modular enough that you can simply plug in a new file transfer service. A few different APIs are listed as answers here.

Alternate Protocol

If FTP is not an absolute requirement, try:

  • a Windows network drive
  • HTTPS
  • scp, rsync, or similar programs (Cygwin might be required)

Dave Jarvis
+1  A: 

I strongly suggest Starksoft FTP/FTPS Component for .NET and Mono. It has a connection object that you can cache and reuse.

Max Toro
+2  A: 

I have done some experimentation (uploading about 20 files of various sizes) with FtpWebRequest, varying the following factors:

KeepAlive = true/false

e.g. ftpRequest.KeepAlive = isKeepAlive;

Connection Group Name = user-defined or null

e.g. ftpRequest.ConnectionGroupName = "MyGroupName";

Connection Limit = 2 (default) or 4 or 8

e.g. ftpRequest.ServicePoint.ConnectionLimit = ConnectionLimit;

Mode = Synchronous or Async

e.g. see [this example][1]

My results:

  1. ConnectionGroupName, KeepAlive=true: 21188.62 msec

  2. ConnectionGroupName, KeepAlive=false: 53449.00 msec

  3. No ConnectionGroupName, KeepAlive=false: 40335.17 msec

  4. ConnectionGroupName, KeepAlive=true, async=true, connections=2: 11576.84 msec

  5. ConnectionGroupName, KeepAlive=true, async=true, connections=4: 10572.56 msec

  6. ConnectionGroupName, KeepAlive=true, async=true, connections=8: 10598.76 msec

Conclusions

1) FtpWebRequest has been designed to support an internal connection pool. To ensure the connection pool is used, make sure ConnectionGroupName is set.

2) Setting up a connection is expensive. If you are connecting to the same FTP server with the same credentials, setting the KeepAlive flag to true will minimise the number of connections set up.

3) Asynchronous is the recommended approach if you have a lot of files to FTP.

4) The default number of connections is 2. In my environment, a connection limit of 4 gave me the best overall performance gain. Increasing the number of connections further may or may not improve performance. I would recommend making the connection limit a configuration parameter so that you can tune it in your environment.
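
A minimal sketch combining these settings for a single synchronous upload, assuming a placeholder URI and credentials:

    using System;
    using System.IO;
    using System.Net;

    class FtpUploadSettings
    {
        static void Upload(string localPath, Uri remoteUri)
        {
            var request = (FtpWebRequest)WebRequest.Create(remoteUri);
            request.Method = WebRequestMethods.Ftp.UploadFile;
            request.Credentials = new NetworkCredential("user", "pass"); // placeholder

            request.KeepAlive = true;                    // reuse the control connection
            request.ConnectionGroupName = "MyGroupName"; // opt in to the connection pool
            request.ServicePoint.ConnectionLimit = 4;    // best value in my tests; tune it

            using (Stream ftpStream = request.GetRequestStream())
            using (Stream fileStream = File.OpenRead(localPath))
            {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = fileStream.Read(buffer, 0, buffer.Length)) > 0)
                    ftpStream.Write(buffer, 0, read);
            }
            ((FtpWebResponse)request.GetResponse()).Close();
        }
    }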

Hope you find this useful.

Syd
A: 

This link describes how ConnectionGroupName and KeepAlive affect connections: WebRequest ConnectionGroupName

Salar Khalilzadeh