
I am trying to create 100 files using FileOutputStream/BufferedOutputStream. I can see CPU utilization at 100% for 5 to 10 seconds. The directory I am writing to is empty. I am creating PDF files through iText, and each file is around 1 MB. I am running on Linux.

How can I rewrite the code to minimize CPU utilization?

+4  A: 

Is this in a directory which already contains a lot of files? If so, you may well just be seeing the penalty for having a lot of files in a directory - this varies significantly by operating system and file system.

Otherwise, what are you actually doing while you're creating the files? Where does the data come from? Are they big files? One thing you might want to do is try writing to a ByteArrayOutputStream instead - that way you can see how much of the activity is due to the file system and how much is just how you're obtaining/writing the data.
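A minimal sketch of that experiment, assuming the PDF generation can be pointed at any OutputStream. The `renderDocument` method below is a hypothetical stand-in for the iText code and just writes 1 MB of filler bytes; swap in the real rendering to get meaningful numbers. Running the same renderer once into a ByteArrayOutputStream and once into a buffered file stream separates the cost of producing the data from the cost of handing it to the file system.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

class SinkComparison {

    // Hypothetical stand-in for the iText rendering: writes exactly 1 MB of
    // filler bytes, in 8 KB chunks, to whatever stream it is handed.
    static void renderDocument(OutputStream out) throws IOException {
        byte[] chunk = new byte[8192];
        for (int i = 0; i < chunk.length; i++) {
            chunk[i] = (byte) ('A' + i % 26);
        }
        for (int written = 0; written < 1 << 20; written += chunk.length) {
            out.write(chunk, 0, chunk.length);
        }
    }

    // Renders `count` documents into memory only: no file system involved.
    static long timeInMemory(int count) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            renderDocument(new ByteArrayOutputStream(1 << 20));
        }
        return System.nanoTime() - start;
    }

    // Renders `count` documents into files under `dir`, as in the question.
    static long timeOnDisk(int count, Path dir) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            Path file = dir.resolve("doc-" + i + ".pdf");
            try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file.toFile()))) {
                renderDocument(out);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sink-comparison");
        System.out.printf("in memory: %d ms%n", timeInMemory(100) / 1_000_000);
        System.out.printf("on disk:   %d ms%n", timeOnDisk(100, dir) / 1_000_000);
        // Clean up the generated files.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                Files.delete(f);
            }
        }
        Files.delete(dir);
    }
}
```

If the in-memory run alone accounts for most of the 5 to 10 seconds, the file system is not the bottleneck. Run each variant more than once, since the first pass also pays for JIT warm-up.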

Jon Skeet
The directory I am writing to is empty. I am creating PDF files through iText; each file is around 1 MB. I am running on Linux. And I found a big difference using ByteArrayOutputStream.
Niger
A: 

You're unlikely to be able to reduce the CPU load for your task, especially on a Windows system. Java on Linux does support asynchronous file I/O; however, this can seriously complicate your code. I suspect you are running on Windows, as file I/O generally takes much more time on Windows than it does on Linux. I've even heard of improvements from running Java in a Linux VM on Windows.

Take a look at Task Manager while the process is running, and turn on Show Kernel Times. CPU time spent in user space can generally be optimized, but CPU time spent in kernel space can usually only be reduced by making more efficient calls.
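Task Manager is Windows-only, and the asker is on Linux. A rough in-process equivalent of Show Kernel Times can be sketched with java.lang.management.ThreadMXBean: getCurrentThreadCpuTime() reports the current thread's total (user plus system) CPU time and getCurrentThreadUserTime() only the user-mode part, so their difference approximates time spent in the kernel. This assumes a JVM that supports per-thread CPU time, and it only covers work done on the calling thread; the 100 MB write loop is a hypothetical workload.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class KernelTimeProbe {

    // CPU time consumed by one task, split into user and (approximate) kernel parts.
    static final class CpuSplit {
        final long userNanos;
        final long systemNanos;

        CpuSplit(long userNanos, long systemNanos) {
            this.userNanos = userNanos;
            this.systemNanos = systemNanos;
        }
    }

    // Runs `task` on the calling thread; total CPU time minus user time
    // approximates the time the thread spent inside the kernel.
    static CpuSplit measure(Runnable task) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time is not supported by this JVM");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        long cpuStart = mx.getCurrentThreadCpuTime();
        long userStart = mx.getCurrentThreadUserTime();
        task.run();
        long cpu = mx.getCurrentThreadCpuTime() - cpuStart;
        long user = mx.getCurrentThreadUserTime() - userStart;
        // The two counters can have different resolutions, so clamp small negative noise.
        return new CpuSplit(user, Math.max(0, cpu - user));
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("kernel-probe", ".bin");
        file.deleteOnExit();
        byte[] chunk = new byte[8192];
        CpuSplit split = measure(() -> {
            // Hypothetical workload: 100 MB written in 8 KB chunks.
            try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file))) {
                for (int i = 0; i < 100 * 128; i++) {
                    out.write(chunk, 0, chunk.length);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        System.out.printf("user: %d ms, kernel: %d ms%n",
                split.userNanos / 1_000_000, split.systemNanos / 1_000_000);
    }
}
```

Treat small kernel figures as noise; the split is only informative for workloads that run for a noticeable fraction of a second.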

  • Update -

JSR 203 specifically addresses the need for asynchronous, multiplexed, scatter/gather file IO:

The multiplexed, non-blocking facility introduced by JSR-51 solved much of that problem for network sockets, but it did not do so for filesystem operations.

Until JSR-203 becomes part of Java, you can get true asynchronous IO with the Apache MINA project on Linux.
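For reference, JSR 203 did eventually ship as NIO.2 in Java SE 7, which added java.nio.channels.AsynchronousFileChannel. A minimal sketch of a Future-based asynchronous write with it follows; the payload and temporary file are placeholders. Whether the JDK backs this with kernel-level asynchronous I/O is an implementation detail, so it is not guaranteed to lower CPU usage.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

class AsyncWriteSketch {

    // Writes `data` to `target` through an AsynchronousFileChannel and
    // returns the number of bytes written once every write has completed.
    static long writeAsync(Path target, byte[] data)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel channel = AsynchronousFileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            long position = 0;
            // A single write may be partial, so keep going until the buffer is drained.
            while (buffer.hasRemaining()) {
                Future<Integer> pending = channel.write(buffer, position);
                // The caller could do other work here (e.g. render the next
                // document) before blocking on the result.
                position += pending.get();
            }
            return position;
        }
    }

    public static void main(String[] args) throws Exception {
        Path target = Files.createTempFile("async-demo", ".bin");
        byte[] payload = new byte[1 << 20]; // 1 MB placeholder payload
        System.out.println("wrote " + writeAsync(target, payload) + " bytes");
        Files.delete(target);
    }
}
```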

The original Java NIO (from JSR 51) allows you to do channel-based I/O. This is an improvement in performance, but you're only handling one buffer of data at a time, not true asynchronous, multiplexed I/O.

brianegge
Stu, please show me where in the FileChannel spec (http://java.sun.com/j2se/1.5.0/docs/api/java/nio/channels/FileChannel.html) you see anything about asynchronous IO. I didn't say Java can't do DMA on Windows. That's a different thing altogether.
brianegge
+2  A: 

It's a long-shot guess, but even if you're using buffered streams, make sure you're not writing out a single byte at a time.

The single-byte .read() and .write(int) methods are CPU killers. You should definitely be using .read(byte[], int, int) and .write(byte[], int, int) instead.
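To make the difference concrete, here is a sketch of both copy loops. In-memory streams stand in for the real files, and the 8 KB buffer size is an arbitrary but common choice; the actual gap depends on the JVM and the stream classes involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class BulkCopy {

    // Slow: one read() and one write(int) call per byte, even on buffered streams.
    static long copyByteAtATime(InputStream in, OutputStream out) throws IOException {
        long total = 0;
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
            total++;
        }
        return total;
    }

    // Preferred: move data in 8 KB blocks with read(byte[], int, int) / write(byte[], int, int).
    static long copyInBlocks(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MB placeholder payload
        long t0 = System.nanoTime();
        copyByteAtATime(new ByteArrayInputStream(data), new ByteArrayOutputStream());
        long t1 = System.nanoTime();
        copyInBlocks(new ByteArrayInputStream(data), new ByteArrayOutputStream());
        long t2 = System.nanoTime();
        System.out.printf("byte-at-a-time: %d us, blocks: %d us%n",
                (t1 - t0) / 1000, (t2 - t1) / 1000);
    }
}
```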

Ry4an
+6  A: 

Don't guess: profile your application.

If the numbers show that a lot of time is spent in write calls, then look at ways to do faster I/O. But if most of the time is spent formatting the output (e.g. iText rendering), then that's where you need to focus your efforts.
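Short of a full profiler, a crude first cut is to time the two phases separately. The sketch below assumes each PDF can first be rendered into a byte array and then written out; `renderPdf` is a hypothetical stand-in for the iText code, so only the structure carries over. It accumulates wall-clock time per phase across 100 files.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class PhaseTimer {

    long renderNanos; // wall-clock time spent producing document bytes
    long writeNanos;  // wall-clock time spent handing them to the file system

    // Hypothetical stand-in for iText rendering: builds 1 MB of filler bytes.
    static byte[] renderPdf(int index) {
        byte[] doc = new byte[1 << 20];
        for (int i = 0; i < doc.length; i++) {
            doc[i] = (byte) ((index * 31 + i) % 127);
        }
        return doc;
    }

    // Produces one file and charges the elapsed time to the matching phase.
    void produce(Path dir, int index) throws IOException {
        long t0 = System.nanoTime();
        byte[] doc = renderPdf(index);
        long t1 = System.nanoTime();
        Path file = dir.resolve("doc-" + index + ".pdf");
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file.toFile()))) {
            out.write(doc, 0, doc.length);
        }
        long t2 = System.nanoTime();
        renderNanos += t1 - t0;
        writeNanos += t2 - t1;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("phase-timer");
        PhaseTimer timer = new PhaseTimer();
        for (int i = 0; i < 100; i++) {
            timer.produce(dir, i);
        }
        System.out.printf("render: %d ms, write: %d ms%n",
                timer.renderNanos / 1_000_000, timer.writeNanos / 1_000_000);
        // Clean up the generated files.
        for (int i = 0; i < 100; i++) {
            Files.delete(dir.resolve("doc-" + i + ".pdf"));
        }
        Files.delete(dir);
    }
}
```

This only tells you which half to look at; a sampling profiler will then show where inside that half the time goes.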

Stephen C
+1 - Profile before guessing what might be the problem.
James Black
A: 

A 1 MB file is large enough to use a java.nio FileChannel and see large performance improvements over java.io. Rewrite your code, and measure it against the old version. I predict a 2x improvement, at a minimum.
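A minimal sketch of such a comparison, writing the same 1 MB placeholder payload to 100 files once through a BufferedOutputStream and once through a FileChannel. The file names and payload are placeholders, and no particular speed-up is implied; the only way to know is to run it on the real data.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelVsStream {

    // Strategy for writing one file, so both variants share the timing loop.
    interface WriteStrategy {
        void write(Path target, byte[] data) throws IOException;
    }

    // java.io version: buffered stream over FileOutputStream.
    static void writeWithStream(Path target, byte[] data) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(target.toFile()))) {
            out.write(data, 0, data.length);
        }
    }

    // java.nio version: hand the whole buffer to a FileChannel.
    static void writeWithChannel(Path target, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            // write() may not drain the buffer in one call, so loop until it is empty.
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        }
    }

    // Writes `count` files with the given strategy and returns elapsed nanoseconds.
    static long time(WriteStrategy writer, Path dir, String prefix, int count, byte[] data)
            throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            writer.write(dir.resolve(prefix + i + ".pdf"), data);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("channel-vs-stream");
        byte[] data = new byte[1 << 20]; // 1 MB placeholder payload
        long stream = time(ChannelVsStream::writeWithStream, dir, "stream-", 100, data);
        long channel = time(ChannelVsStream::writeWithChannel, dir, "channel-", 100, data);
        System.out.printf("stream: %d ms, channel: %d ms%n",
                stream / 1_000_000, channel / 1_000_000);
        // Clean up the generated files.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                Files.delete(f);
            }
        }
        Files.delete(dir);
    }
}
```

Run both variants several times and in both orders, since JIT warm-up and the page cache can favour whichever runs second.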

Stu Thompson