
I'm working on a huge legacy Java application with a lot of hand-written code that nowadays you'd let a framework handle.

The problem I'm facing right now is that we are running out of file handles on our Solaris server. What is the best way to track open file handles? Where should I look, and what can cause open file handles to run out?

I cannot debug the application under Solaris, only in my Windows development environment. Is it even reasonable to analyze the open file handles under Windows?

A: 

It could certainly give you an idea. Since it's Java, the file open/close mechanics should be implemented similarly (unless one of the JVMs is implemented incorrectly). I would recommend using File Monitor on Windows.

C. Ross
+1  A: 

I would start by asking my sysadmin for a listing of all open file descriptors for the process. Different systems do this in different ways: Linux, for example, has the /proc/PID/fd directory. I recall that Solaris has a command (maybe pfiles?) that will do the same thing -- your sysadmin should know it.

However, unless you see a lot of references to the same file, an fd list isn't going to help you. If it's a server process, it probably has lots of files (and sockets) open for a reason. The only way to resolve the problem is to adjust the system limit on open files -- you can also check the per-user limit with ulimit, but in most current installations that equals the system limit.
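
If you would rather watch the count from inside the JVM itself, Sun/Oracle JDKs expose it through a platform-specific MXBean. A minimal sketch, assuming a Sun/Oracle JDK on a Unix box (on other platforms the instanceof check simply fails and nothing is printed):

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

public class FdCount {
  public static void main(String[] args) {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    // On Sun/Oracle JDKs running on Unix the bean reports descriptor counts.
    if (os instanceof UnixOperatingSystemMXBean) {
      UnixOperatingSystemMXBean unixOs = (UnixOperatingSystemMXBean) os;
      System.out.println("open fds: " + unixOs.getOpenFileDescriptorCount()
          + " / max: " + unixOs.getMaxFileDescriptorCount());
    }
  }
}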

kdgregory
+2  A: 

Hi there,

On Windows you can look at open file handles using Process Explorer:

http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx

On Solaris you can use "lsof" to monitor the open file handles.

Benj
Thanks! I used lsof; unfortunately there's a lot going on and I don't really know how to narrow down the lsof output to what's relevant and what isn't.
david
Results from Windows should not be extrapolated to *nix systems; they have different mechanisms for opening files.
St.Shadow
+1  A: 

Not a direct answer to your question, but these problems could be the result of releasing file resources incorrectly in your legacy code. For example, if you're working with FileOutputStream instances, make sure the close methods are called in a finally block, as in this example:

FileOutputStream out = null;
try {
  // Your file-handling code
} catch (IOException e) {
  // Handle the exception
} finally {
  if (out != null) {
    try { out.close(); } catch (IOException e) { /* ignore */ }
  }
}
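
If the code base can move to Java 7 or later at some point, try-with-resources does the same with less room for mistakes (the file name below is just a placeholder):

// Java 7+ equivalent: the stream is closed automatically,
// even if an exception is thrown inside the block.
try (FileOutputStream out = new FileOutputStream("out.bin")) {
  // Your file-handling code
} catch (IOException e) {
  // Handle the exception
}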
NickDK
What he said. Sounds like the file handles are never being released.
ChadNC
Thanks for the general advice, but I've searched for all occurrences of java.io.* and made sure they are in a try-catch-finally block.
david
+3  A: 

One good thing I've found for tracking down unclosed file handles is FindBugs:

http://findbugs.sourceforge.net/

It checks many things, but one of the most useful is resource open/close operations. It's a static analysis tool that runs over your compiled code, and it's also available as an Eclipse plugin.

Benj
As a personal testimonial, I've experienced a problem similar to the OP's (my app was throwing exceptions because it couldn't open any more files, as I had too many open file descriptors). Running the code through FindBugs helped identify all the places where files weren't being closed. Problem solved!
tthong
Yes, it once helped me find a whole slew of places where close() hadn't been called in an appropriate finally block.
Benj
Although it didn't solve my problem directly, it was a great hint!
david
+1  A: 

To answer the second part of the question:

what can cause open file handles to run out?

Opening a lot of files, obviously, and then not closing them.

The simplest scenario is that the references to whatever objects hold the native handles (e.g., FileInputStream) are thrown away before being closed, which means the files remain open until the objects are finalized.

The other option is that the objects are stored somewhere and not closed. A heap dump might be able to tell you what lingers where (jmap and jhat are included in the JDK, or you can use jvisualvm if you want a GUI). You're probably interested in looking for objects owning FileDescriptors.
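
As an illustration of the first scenario, this hypothetical loop drops each stream reference without closing it; the descriptors stay open until the objects happen to be finalized, so a busy process can run out of handles long before the collector catches up:

import java.io.FileInputStream;
import java.io.IOException;

public class DescriptorLeak {
  public static void main(String[] args) throws IOException {
    for (int i = 0; i < 100000; i++) {
      // Opened but never closed: the reference is dropped at the end of
      // each iteration and the fd is only released on finalization.
      FileInputStream in = new FileInputStream(args[0]);
      in.read();
    }
  }
}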

gustafc
A: 

This little script helps me keep an eye on the number of open files when I need to test the descriptor count. It was used on Linux, so for Solaris you may have to adapt it (maybe :) ).

#!/bin/bash
COUNTER=0
HOW_MANY=0
MAX=0
# COUNTER is just a flag that tells us whether to continue or not
while [ $COUNTER -lt 10 ]; do
    # run while the process with the given pid is alive
    if [ -r "/proc/$1" ]; then
        # count how many files are open
        HOW_MANY=`/usr/sbin/lsof -p $1 | wc -l`
        # output for live monitoring
        echo `date +%H:%M:%S` $HOW_MANY
        # uncomment if you want to save the statistics
        #/usr/sbin/lsof -p $1 > ~/autocount/config_lsof_`echo $HOW_MANY`_`date +%H_%M_%S`.txt

        # track the maximum value
        if [ $MAX -lt $HOW_MANY ]; then
            let MAX=$HOW_MANY
            echo new max is $MAX
        fi
        # test every second; if you don't need to test so frequently, increase this value
        sleep 1
    else
        echo max count is $MAX
        echo Process was finished
        let COUNTER=11
    fi
done

You can also try playing with the JVM option -Xverify:none - it should disable JAR verification (if most of the open files are JARs...). For leaks through unclosed FileOutputStreams you can use FindBugs (mentioned above), or try to find the article on how to patch the standard Java FileOutputStream/FileInputStream so you can see who opened files and forgot to close them. Unfortunately I can't find that article right now, but it does exist :) Also think about increasing the file limit - up-to-date *nix kernels have no problem handling more than 1024 fds.
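
As a rough sketch of that patching idea (a hypothetical class, not the one from the article), you can subclass FileOutputStream, remember the stack trace at construction time, and complain about streams that get finalized without ever being closed. This relies on FileOutputStream still having a finalize() method that closes the descriptor, which is true for the JDKs of that era:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

// Drop-in replacement that remembers who opened the stream and warns
// if it is finalized without close() ever being called.
public class TrackedFileOutputStream extends FileOutputStream {

  private final Throwable openedAt = new Throwable("opened here");
  private volatile boolean closed = false;

  public TrackedFileOutputStream(File file) throws FileNotFoundException {
    super(file);
  }

  public TrackedFileOutputStream(String name) throws FileNotFoundException {
    super(name);
  }

  @Override
  public void close() throws IOException {
    closed = true;
    super.close();
  }

  @Override
  protected void finalize() throws IOException {
    if (!closed) {
      System.err.println("Leaked file handle, opened at:");
      openedAt.printStackTrace();
    }
    super.finalize();
  }
}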

St.Shadow
+2  A: 

This may not be practical in your case, but what I did once when I had a similar problem with open database connections was to override the "open" function with my own. (Conveniently, I already had this function because we had written our own connection pooling.) In my function I then added an entry to a table recording the open. I made a stack trace call and saved the identity of the caller, along with the time of the call and I forget what else. When the connection was released, I deleted the table entry. Then I had a screen where we could dump the list of open entries. You could then look at the timestamps and easily see which connections had been open for unlikely amounts of time, and which functions had done those opens.

From this we were able to quickly track down the couple of functions that were opening connections and failing to close them.
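
A stripped-down version of that bookkeeping might look like the sketch below (the class and method names are just illustrative; hook the register calls into whatever "open" and "release" wrappers you already have):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Records who opened each resource and when, so that long-lived
// entries can be dumped and inspected later.
public class OpenResourceRegistry {

  private static class Entry {
    final long openedAt = System.currentTimeMillis();
    final Throwable caller = new Throwable("opened here");
  }

  private final Map<Object, Entry> open = new ConcurrentHashMap<Object, Entry>();

  // Call from your own "open" wrapper.
  public void registerOpen(Object resource) {
    open.put(resource, new Entry());
  }

  // Call from your own "close"/"release" wrapper.
  public void registerClose(Object resource) {
    open.remove(resource);
  }

  // Dump everything still open; the oldest entries are the usual suspects.
  public void dump() {
    long now = System.currentTimeMillis();
    for (Map.Entry<Object, Entry> e : open.entrySet()) {
      System.err.println(e.getKey() + " open for "
          + (now - e.getValue().openedAt) + " ms, opened by:");
      e.getValue().caller.printStackTrace();
    }
  }
}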

If you have lots of open file handles, the odds are that you're failing to close them somewhere when you're done. You say you've checked for proper try/finally blocks, but I'd suspect that somewhere in the code you either missed a bad one, or you have a function that hangs and never makes it to the finally. I suppose it's also possible that you really are doing proper closes every time you open a file, but you're opening hundreds of files simultaneously. If that's the case, I'm not sure what you can do other than a serious program redesign to manipulate fewer files, or a serious program redesign to queue your file accesses. (At this point I add the usual "without knowing the details of your application, etc.")

Jay
+1  A: 

I would double-check the environment settings on your Solaris box. I believe that by default Solaris only allows 256 file handles per process. For a server application, especially one running on a dedicated server, that is very low. Figure 50 or more descriptors for the JRE and library JARs, then at least one descriptor for each incoming request and database query, probably more, and you can see how this just won't cut the mustard for a serious server.

Have a look at the /etc/system file for the values of rlim_fd_cur and rlim_fd_max to see what your system has set. Then consider whether this is reasonable (you can see how many file descriptors are open while the server is running with the lsof command, ideally with the -p [process ID] parameter).

Andrzej Doyle
A: 

It's worth bearing in mind that open sockets also consume file handles on Unix systems, so it could very well be something like a database connection pool leak (e.g. open database connections not being closed and returned to the pool) that is leading to this issue. I have certainly seen this error caused by a connection pool leak before.

alasdairg
A: 

Google for an app called FileMon from Sysinternals.

BTW, to track this down you may be able to use something like AspectJ to log all calls that open and close files, along with where they occur.
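
A rough sketch of such an aspect, using the annotation style (the class name and the streams being watched are just examples; it assumes the AspectJ weaver is wired into your build):

import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;

// Logs every place in the woven code that constructs or closes a file stream.
@Aspect
public class FileHandleLoggingAspect {

  @Before("call(java.io.FileInputStream.new(..)) || call(java.io.FileOutputStream.new(..))")
  public void logOpen(JoinPoint jp) {
    System.err.println("OPEN  at " + jp.getSourceLocation());
  }

  @Before("call(* java.io.FileInputStream.close()) || call(* java.io.FileOutputStream.close())")
  public void logClose(JoinPoint jp) {
    System.err.println("CLOSE at " + jp.getSourceLocation());
  }
}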

vickirk
And that was voted down because?
vickirk