Hi,

I am trying to pass a byte[] containing ASCII characters to log4j, to be logged into a file using the obvious representation. When I simply pass in the byte[] it is of course treated as an object and the logs are pretty useless. When I try to convert it to a String using new String(byte[] data), the performance of my application is halved.

How can I pass them in efficiently, without incurring the approximately 30us time penalty of converting them to Strings?

Also, why does it take so long to convert them?

Thanks.

Edit

I should add that I am optimising for latency here - and yes, 30us does make a difference! Also, these arrays vary from ~100 bytes all the way up to a few thousand bytes.

+1  A: 

Take a look here: Faster new String(bytes, cs/csn) and String.getBytes(cs/csn)

Rubens Farias
+3  A: 

What you want to do is delay processing of the byte[] array until log4j decides that it actually wants to log the message. That way you could log it at DEBUG level while testing, then disable it in production. For example:

final byte[] myArray = ...;
// Wrap the array in an object whose toString() does the conversion;
// log4j only calls toString() if the message actually gets logged.
Logger.getLogger(MyClass.class).debug(new Object() {
    @Override public String toString() {
        return new String(myArray);
    }
});

Now you don't pay the conversion penalty unless you actually log the data, because toString isn't called until log4j decides it will actually log the message!

I'm not sure what you mean by "the obvious representation", so I've assumed you mean reinterpreting the bytes as a String in the default character encoding. If you are dealing with binary data, that is obviously worthless; in that case I'd suggest using Arrays.toString(byte[]) to create a formatted string along the lines of

[54, 23, 65, ...]
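
For the binary case, the same lazy trick applies: have the wrapper's toString() call Arrays.toString instead. A minimal sketch (the class and method names here are illustrative):

import java.util.Arrays;
import org.apache.log4j.Logger;

class BinaryLogExample {
    private static final Logger LOG = Logger.getLogger(BinaryLogExample.class);

    static void logPayload(final byte[] myArray) {
        LOG.debug(new Object() {
            @Override public String toString() {
                // Only evaluated if DEBUG is enabled; renders e.g. "[54, 23, 65]"
                return Arrays.toString(myArray);
            }
        });
    }
}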
Steven Schlansker
Nice - combined with an asynchronous logger, this moves the conversion away from the critical path.
jwoolard
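
For reference, a minimal sketch of wiring up log4j 1.2's AsyncAppender programmatically (the file path, layout and buffer size are illustrative; note that, depending on the log4j version, the message may still be rendered on the calling thread, while the file I/O happens on the background thread):

import org.apache.log4j.AsyncAppender;
import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.SimpleLayout;

class AsyncLoggingSetup {
    static void configure() throws java.io.IOException {
        // An ordinary synchronous file appender
        FileAppender file = new FileAppender(new SimpleLayout(), "app.log");

        // AsyncAppender queues events and writes them from a background
        // thread, keeping the appender I/O off the caller's critical path
        AsyncAppender async = new AsyncAppender();
        async.addAppender(file);
        async.setBufferSize(1024); // events held before the caller blocks
        Logger.getRootLogger().addAppender(async);
    }
}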
+2  A: 

ASCII is one of the few encodings that can be converted to/from UTF-16 with no arithmetic or table lookups, so it's possible to convert manually:

String convert(byte[] data) {
    // Presize the builder; each ASCII byte maps to exactly one char
    StringBuilder sb = new StringBuilder(data.length);
    for (int i = 0; i < data.length; ++i) {
        // Bytes >= 0x80 are not ASCII; reject rather than produce garbage
        if (data[i] < 0) throw new IllegalArgumentException();
        sb.append((char) data[i]);
    }
    return sb.toString();
}

But make sure it really is ASCII, or you'll end up with garbage.

finnw
Thanks - this brought it down by about 60%...
jwoolard
+1  A: 

If your data is in fact ASCII (i.e. 7-bit data), then you should be using new String(data, "US-ASCII") instead of depending on the platform default encoding. It may also be faster, because decoding the platform default encoding (which could be UTF-8) requires more inspection of the bytes.

You could also speed this up by avoiding the charset lookup on each call: cache the Charset instance and call new String(data, charset) instead.
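
A minimal sketch of that caching (new String(byte[], Charset) is available from Java 6 onwards; the class name is illustrative):

import java.nio.charset.Charset;

final class AsciiStrings {
    // Look up the Charset once instead of on every conversion
    private static final Charset US_ASCII = Charset.forName("US-ASCII");

    static String decode(byte[] data) {
        // Avoids the per-call lookup done by new String(data, "US-ASCII")
        return new String(data, US_ASCII);
    }
}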

Having said that: it's been a very, very long time since I've seen real ASCII data in a production environment.

Joachim Sauer
+1  A: 

Halved performance? How large is this byte array? If it's, say, 1MB, then there are certainly more factors to take into account than just "converting" from bytes to chars (which should be fast enough). Writing 1MB of data to a log file, instead of "just" the ~100 bytes that the byte[]'s default toString() may generate, is obviously going to take some time. The disk file system is not as fast as RAM.

You'll need to change the string representation of the byte array, perhaps to something more meaningful, e.g. the name associated with it (a filename?), its length, and so on. After all, what does that byte array actually represent?

Edit: I don't remember seeing the "approximately 30us" figure in your question; maybe you edited it in within 5 minutes of asking. Still, this is micro-optimization, and it should certainly not cause "halved performance" in general - unless you write these arrays a million times per second (and even then: why would you want to do that? aren't you overusing logging?).

BalusC
These arrays vary hugely, from about 150 bytes all the way up to 4000 bytes. Re: your last point - I am optimizing for latency rather than throughput, so I either need to move this conversion away from the critical path, or speed it up...
jwoolard
Also, there sadly is a requirement to log all this data - and yes, it is a LOT of data...
jwoolard
Then your bottleneck is more in the disk IO than in the Java code, as I expected.
BalusC