Our build is annoyingly slow. It's a Java system built with Ant, and I'm running mine on Windows XP. Depending on the hardware, it can take between 5 and 15 minutes to complete.

Watching overall performance metrics on the machine, as well as correlating hardware differences with build times, indicates that the process is I/O bound. It also shows that the process does a lot more reading than writing.

However, I haven't found a good way to determine which files are being read or written, and how many times. My suspicion is that with our many subprojects and subsequent invocations of the compiler, the build is re-reading the same commonly used libraries many times.

Please suggest some profiling tools that will tell me what a given process is doing with which files. Free is nice, but not essential.


Using ProcessMonitor, as suggested by Jon Skeet, I was able to confirm my suspicion: almost all of the disk activity was reading and re-reading of libraries, with the JDK's copies of "rt.jar" and other libraries at the top of the list. I can't make a RAM disk large enough to hold all the libraries I used, but mounting the "hottest" libraries on a RAM disk cut build times by about 40%; clearly, Windows file system caching isn't doing a good enough job, even though I've told Windows to optimize for that.

One interesting thing I noticed is that the typical 'read' operation on a JAR file is just a few dozen bytes; usually there are two or three of these, followed by a skip several kilobytes further on in the file. This access pattern appears ill-suited to bulk reads.

I'm going to do more testing with all of my third-party libraries on a flash drive, and see what effect that has.

+5  A: 

If you only need it for Windows, SysInternals Process Monitor should show you everything you need to know. You can select the process, then see each operation as it happens and get a summary of file operations as well.
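
To answer "which files, and how many times" from a Process Monitor capture, one option is to save the trace as CSV and tally it with a small script. The following is a minimal sketch, not part of Process Monitor itself. It assumes the export has `Operation`, `Path`, and `Detail` columns and that `ReadFile` rows carry a `Length: N` field in `Detail`; check those names against your own export and adjust the constants if they differ.

```python
import csv
import re
from collections import defaultdict

# Assumed column names in the Process Monitor CSV export (adjust if yours differ).
OPERATION_COL = "Operation"
PATH_COL = "Path"
DETAIL_COL = "Detail"

# Assumed Detail format, e.g. "Offset: 4,096, Length: 1,024".
LENGTH_RE = re.compile(r"Length:\s*([\d,]+)")


def summarize_reads(csv_file):
    """Return {path: (read_count, total_bytes_requested)} for ReadFile rows."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for row in csv.DictReader(csv_file):
        if row.get(OPERATION_COL) != "ReadFile":
            continue
        path = row.get(PATH_COL) or ""
        counts[path] += 1
        match = LENGTH_RE.search(row.get(DETAIL_COL) or "")
        if match:
            # Strip thousands separators before converting to an integer.
            sizes[path] += int(match.group(1).replace(",", ""))
    return {path: (counts[path], sizes[path]) for path in counts}


def hottest(summary, top=10):
    """Return the `top` most frequently read paths, most-read first."""
    ranked = sorted(summary.items(), key=lambda item: item[1][0], reverse=True)
    return ranked[:top]
```

A hypothetical usage would be to open the exported file with `open("build.csv", newline="", encoding="utf-8-sig")`, pass it to `summarize_reads`, and print the result of `hottest`. The file name and encoding here are assumptions. Files near the top of the list with many small reads are the likeliest candidates for a RAM disk.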

Jon Skeet
Thanks Jon. I've used Process Explorer in the past. Is this a successor to that product, or something completely separate?
erickson
Process Explorer is a sort of Task Manager alternative. Process Monitor shows you every I/O operation, such as opening a file, writing to the registry, etc.
lacop
A: 

I used to build a massive Java webapp (JSP frontend) using Ant on Windows and it would take upwards of 3 minutes. I wiped my computer and installed Linux, and suddenly the builds took 18 seconds. Those are real numbers, albeit about 3 years old. I can only assume that Java prefers the Linux memory management and threading models to the Windows equivalents, as all Java programs appear to run better under Linux in my experience (especially Eclipse). Linux seems a lot better about preventing extra reads from the disk when you're doing a lot of reading of files that haven't changed (i.e. executables and libraries). This may be a property of the disk cache or the filesystem, I'm not sure which.

One of the great things about Java is that it's cross-platform, so setting up a Linux-based build server is actually an option for you. Being something of a Linux evangelist, I'd of course prefer to see you switch your dev environment to Linux, but I know that a lot of people don't want to do that (or can't for practical reasons).

If you're not willing to even set up a Linux build server to see if it runs faster, you could at least try defragmenting your Windows machine's hard drive. That makes a huge difference for C++ builds on my work computer. Try JkDefrag, which seems a lot better than the defragmenter that comes with Windows.

EDIT: I'd assume I got a downvote because my answer doesn't address the exact question asked. It is, however, in the tradition of StackOverflow to help people fix their real problem, not just treat the symptoms. I'm not one of those people for whom the answer to every question is "use linux". In this instance, however, I have very real, measured performance gains in exactly the situation the OP is asking about, so I thought it worth sharing my experiences.

rmeador
while I don't doubt switching to linux would improve performance, this is hardly an answer to a question regarding profiling IO on windows
sgibbons
Thanks rmeador. A lot of our developers do run Linux, and it does help. Its file system cache seems to be much better than Windows'. There's also some suspicion that Microsoft has deliberately hobbled performance of kernel calls by non-M$ code. ;) However, even Linux builds are too slow.
erickson
+1  A: 

Back when I still used Windows, I used to get good results speeding up my build by having all build output written to a separate partition, maybe 3GB in size, and formatting it once a week at night via a scheduled task. It's just build output, so it doesn't matter if it gets unilaterally flattened occasionally.

But honestly, since moving to Linux, disk fragmentation is something I never worry about any more.

Another reason to try your build on Linux, at least once, is so that you can run strace (grepped for calls to open) to see what files your build is touching.
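
As a rough sketch of that idea, you could capture a trace with something like `strace -f -e trace=open,openat -o build.trace ant` and then count how often each file was opened successfully. The script below is only an illustration: `build.trace` is a hypothetical file name, and the regular expression assumes the usual `open("path", flags) = fd` and `openat(dirfd, "path", flags) = fd` line shapes. Calls that strace splits across lines as `<unfinished ...>` are skipped.

```python
import re
from collections import Counter

# Matches open("path", ...) = N and openat(DIRFD, "path", ...) = N,
# with or without a leading pid prefix added by strace -f.
OPEN_RE = re.compile(r'\bopen(?:at)?\((?:[^,"]+,\s*)?"([^"]*)".*\)\s*=\s*(-?\d+)')


def count_opens(lines):
    """Count successful open/openat calls per path in strace output lines."""
    opened = Counter()
    for line in lines:
        match = OPEN_RE.search(line)
        # A negative return value means the call failed (e.g. ENOENT).
        if match and int(match.group(2)) >= 0:
            opened[match.group(1)] += 1
    return opened


def report(trace_path, top=20):
    """Print the most frequently opened files from a saved trace."""
    with open(trace_path, errors="replace") as trace:
        for path, count in count_opens(trace).most_common(top):
            print(f"{count:6d}  {path}")
```

Calling `report("build.trace")` would list the most frequently opened files first, which is a quick way to spot libraries that the build re-opens on every compiler invocation.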

Ben Hardy
Procmon/Filemon actually give information similar to strace's. I was able to see every open, meta-data query, read, and write operation.
erickson
+1  A: 

An oldie but a goodie: create a RAM disk and compile your files from there.

Jeffrey Fredrick
My goal with profiling the IO is to figure out what would benefit most from being on a RAM disk.
erickson
A: 

Actually, FileMon is a more direct tool than ProcMon. It is available here:

In general, when running performance analysis for disk IO, consider the following two:

  • throughput (speed of read/write of bytes per second)
  • latency (how long each read/write waits in the queue)

Once you evaluate the performance of your system in terms of the above, it is easy to identify the bottleneck and take corrective action: get faster disks or change your code (whichever works out cheaper).
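
To get a feel for the two numbers from application code, here is a minimal sketch that reads a file in fixed-size chunks and reports bytes per second plus the mean time per read call. The function name `measure_reads` is made up for this example, and the mean time per call is only a proxy for latency: it is measured in the process, not in the disk queue, and the OS file cache will skew repeated runs.

```python
import time


def measure_reads(path, chunk_size):
    """Read `path` in `chunk_size` pieces.

    Returns (bytes_read, bytes_per_second, mean_seconds_per_read).
    """
    total = 0
    reads = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffering so each read() call
    # maps to a request to the operating system.
    with open(path, "rb", buffering=0) as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            reads += 1
    elapsed = time.perf_counter() - start
    throughput = total / elapsed if elapsed > 0 else float("inf")
    mean_latency = elapsed / reads if reads else 0.0
    return total, throughput, mean_latency
```

Comparing a run with a chunk size of a few dozen bytes against one with, say, 64 KB on the same JAR gives a rough idea of how much the small-read pattern costs relative to bulk reads.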

Sesh