I have a process that is supposed to FTP a file to a remote location every 5 minutes.

It seems to have become stuck for a number of hours and hasn't been sending files.

I took a thread dump to see what was going on and this is the state of my thread:

"SPPersister" prio=6 tid=0x03782400 nid=0x16c4 runnable [0x0468f000..0x0468fd14]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(Unknown Source)
        at sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
        at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
        at sun.nio.cs.StreamDecoder.read(Unknown Source)
        - locked <0x239ebea0> (a java.io.InputStreamReader)
        at java.io.InputStreamReader.read(Unknown Source)
        at java.io.BufferedReader.fill(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        - locked <0x239ebea0> (a java.io.InputStreamReader)
        at java.io.BufferedReader.readLine(Unknown Source)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:294)
        at org.apache.commons.net.ftp.FTP._connectAction_(FTP.java:364)
        at org.apache.commons.net.ftp.FTPClient._connectAction_(FTPClient.java:540)
        at org.apache.commons.net.SocketClient.connect(SocketClient.java:178)
        at org.apache.commons.net.SocketClient.connect(SocketClient.java:268)
        ...

I am using the following code to connect:

FTPClient client = new FTPClient();
client.setConnectTimeout(10000);
client.connect(host); // <-- stuck here
client.setDataTimeout(20000);
client.setSoTimeout(20000);
client.login(user, pass);
client.changeWorkingDirectory(dir);

Shouldn't the connection attempt have timed out within 10 seconds?

+2  A: 

Yes, and no.

The connect call would have timed out within ten seconds if the connection itself had failed. However, the connection probably succeeded, and the client is now trying to read from the socket, most likely waiting for the server's initial FTP greeting[1]. Indeed, according to the javadoc for _connectAction_(), which is where your stack trace is stuck, that is exactly what it is doing.

You could try setting the data timeout before you call connect; that way it might actually fail in the way that you expect. If this doesn't work, you will most likely need to raise a bug with Apache Commons Net. The bug linked in the comments below is almost certainly the issue that you are seeing.
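To make the difference between the connect timeout and the read timeout concrete, here is a minimal sketch using plain java.net sockets rather than Commons Net. The class name GreetingTimeoutDemo, the readGreeting helper, and the silent local server are all invented for illustration. The server accepts the TCP connection (via the listen backlog) but never sends a greeting, so the connect succeeds and only the read timeout can end the wait.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class GreetingTimeoutDemo {

    // Hypothetical helper: connect, then wait for one line of greeting.
    // connectTimeoutMs only bounds the TCP handshake;
    // readTimeoutMs (SO_TIMEOUT) bounds each blocking read afterwards.
    static String readGreeting(String host, int port,
                               int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            // Without this line, readLine() below can block indefinitely.
            socket.setSoTimeout(readTimeoutMs);
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(),
                                          StandardCharsets.US_ASCII));
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // A server socket that never writes anything: the handshake
        // completes, but no greeting ever arrives.
        try (ServerSocket silent = new ServerSocket(0)) {
            try {
                String greeting = readGreeting(
                        "127.0.0.1", silent.getLocalPort(), 10000, 500);
                System.out.println("greeting: " + greeting);
            } catch (SocketTimeoutException e) {
                System.out.println("connected, but timed out waiting for the greeting");
            }
        }
    }
}
```

In Commons Net itself, the setter documented as applying to sockets opened by connect() is setDefaultTimeout, so calling it before connect may be worth trying as well. Treat that as a suggestion to verify against your library version, not as a confirmed fix.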

[1] According to RFC 959:

One important group of informational replies is the connection greetings. Under normal circumstances, a server will send a 220 reply, "awaiting input", when the connection is completed. The user should wait for this greeting message before sending any commands. If the server is unable to accept input right away, a 120 "expected delay" reply should be sent immediately and a 220 reply when ready. The user will then know not to hang up if there is a delay.

That is why the FTPClient class is awaiting input from the foreign side.
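The greeting rule quoted above can be sketched roughly as follows. This is not how FTPClient implements it; the class FtpGreeting and the method awaitGreeting are hypothetical names, and the handling of multi-line replies is deliberately simplified (any line that is not a three-digit code followed by a space or end of line is skipped).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class FtpGreeting {

    // Reads reply lines until the server says it is ready (220).
    // A 120 "expected delay" reply means: keep waiting.
    // Any other final reply code is treated as a failure.
    static int awaitGreeting(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            // Simplification: only lines of the form "NNN" or "NNN text" count.
            if (line.length() < 3 || (line.length() > 3 && line.charAt(3) != ' ')) {
                continue;
            }
            int code;
            try {
                code = Integer.parseInt(line.substring(0, 3));
            } catch (NumberFormatException e) {
                continue; // not a reply code, skip it
            }
            if (code == 120) {
                continue; // server asked us to wait for the 220
            }
            if (code == 220) {
                return code; // ready for commands
            }
            throw new IOException("Unexpected greeting: " + line);
        }
        throw new IOException("Connection closed before greeting arrived");
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new StringReader("120 Service ready in 2 minutes\r\n220 Service ready\r\n"));
        System.out.println("greeting code: " + awaitGreeting(in));
    }
}
```

Note that the loop blocks in readLine() for as long as the server stays silent, which is why a read timeout on the underlying socket matters.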

Paul Wagland
sounds like this bug: http://www.mail-archive.com/[email protected]/msg55067.html
pstanton
@pstanton: Thanks, I have updated the answer to include a link to the bug.
Paul Wagland
i remember i put the `setDataTimeout` call after the `connect` call because i thought it might be like the `setSoTimeout` call whose doc states to only use after `connect`. have moved `setDataTimeout` up hopefully this resolves. thx.
pstanton
@Paul: do you think interrupting the thread will un-block it?
pstanton
@pstanton: Probably… but you would need to test it, since I cannot answer for sure.
Paul Wagland
+2  A: 

We had some Java code that was trying to FTP from a device, and it would inexplicably hang using commons-net/ftp, just like what you are seeing. After considerable searching, I found a bug report somewhere indicating that it is a flaw in commons-net/ftp. The flaw occurs when you're waiting on a response and the network goes down (we had flaky wireless). Once that happened, the client got itself into a wait that never returned.

The solution we found was, unfortunately, to use a different library. There are plenty out there, but this is the one we used: http://www.enterprisedt.com/products/edtftpj/overview.html

karoberts
You can probably get around this by setting the data timeout explicitly… or by using a different library ;-)
Paul Wagland