views:

547

answers:

2

Does anybody know a faster way to do what java.nio.charset.Charset.decode(..)/encode(..) does?

It's currently one of the bottlenecks of a technology that I'm using.

[EDIT] Specifically, in my application, I changed one segment from a Java solution to a JNI solution (because there was a C++ technology more suitable for my needs than the Java technology I was using).

This change brought about a significant decrease in speed (and a significant increase in CPU and memory usage).

Looking deeper into the JNI solution, the Java application communicates with the C++ application via byte[]. These byte[] are produced by Charset.encode(..) on the Java side and passed to the C++ side. Then, when the C++ side responds with a byte[], it gets decoded on the Java side via Charset.decode(..).
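
For reference, the round trip currently looks roughly like this (class and method names are made up for illustration; they're not the actual ones from my application):

 import java.nio.ByteBuffer;
 import java.nio.CharBuffer;
 import java.nio.charset.Charset;

 public class Bridge {
     private static final Charset CHARSET = Charset.forName("UTF-8");

     // hypothetical JNI entry point into the C++ side
     private static native byte[] nativeCall(byte[] request);

     public static String call(String request) {
         // Java -> C++: encode the chars into a byte[]
         ByteBuffer encoded = CHARSET.encode(request);
         byte[] raw = new byte[encoded.remaining()];
         encoded.get(raw);

         byte[] responseBytes = nativeCall(raw);

         // C++ -> Java: decode the returned byte[] back into chars
         CharBuffer decoded = CHARSET.decode(ByteBuffer.wrap(responseBytes));
         return decoded.toString();
     }
 }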

Running this against a profiler, I see that Charset.decode(..) and Charset.encode(..) both take a significant amount of time compared to the whole execution time of the JNI solution (I profiled only the JNI solution because it's something I could whip up quite quickly; I'll profile the whole application at a later date once I free up my schedule :-) ).

Upon reading further into my problem, it seems this is a known issue with Charset.encode(..) and decode(..), and that it is being addressed in Java 7. However, moving to Java 7 is not an option for me (for now) due to some constraints.

Which is why I'm asking here if somebody knows a Java 5 solution / alternative to this (sorry, I should have mentioned sooner that this is for Java 5) :-)

+2  A: 

The javadocs for encode() and decode() make it clear that these are convenience methods. For example, for encode():

Convenience method that encodes Unicode characters into bytes in this charset.

An invocation of this method upon a charset cs returns the same result as the expression

 cs.newEncoder()
   .onMalformedInput(CodingErrorAction.REPLACE)
   .onUnmappableCharacter(CodingErrorAction.REPLACE)
   .encode(bb); 

except that it is potentially more efficient because it can cache encoders between successive invocations.

The language is a bit vague there, but you might get a performance boost by not using these convenience methods. Create and configure the encoder once, and then re-use it:

 // create and configure the encoder once ...
 CharsetEncoder encoder = cs.newEncoder()
   .onMalformedInput(CodingErrorAction.REPLACE)
   .onUnmappableCharacter(CodingErrorAction.REPLACE);

 // ... then re-use it across calls instead of paying the setup cost each time
 encoder.encode(...);
 encoder.encode(...);
 encoder.encode(...);
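
Spelled out as a complete (single-threaded) example, that reuse pattern might look something like this; the charset and the buffer handling are illustrative:

 import java.nio.ByteBuffer;
 import java.nio.CharBuffer;
 import java.nio.charset.CharacterCodingException;
 import java.nio.charset.Charset;
 import java.nio.charset.CharsetEncoder;
 import java.nio.charset.CodingErrorAction;

 public class ReusedEncoder {
     // configured once; note that CharsetEncoder is NOT thread-safe, so this
     // instance must be confined to one thread (or held in a ThreadLocal)
     private final CharsetEncoder encoder = Charset.forName("UTF-8")
         .newEncoder()
         .onMalformedInput(CodingErrorAction.REPLACE)
         .onUnmappableCharacter(CodingErrorAction.REPLACE);

     public byte[] encode(String s) throws CharacterCodingException {
         // encode(CharBuffer) is a complete operation that resets the encoder
         // first, so the same instance can be reused call after call
         ByteBuffer bb = encoder.encode(CharBuffer.wrap(s));
         byte[] bytes = new byte[bb.remaining()];
         bb.get(bytes);
         return bytes;
     }
 }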

It always pays to read the javadoc, even if you think you already know the answer.

skaffman
In Java 1.6 (at least) the implementation of `Charset.encode(...)` uses an encoder that is cached in a thread local, and repeats the setup calls (`onMalformedInput(...)` etc.) each time. By doing your own caching, you would only save the overhead of a thread-local fetch and the setup calls. This is probably insignificant ... though the profiler should tell you that.
Stephen C
Fair point. There is a multi-threaded use case here, though.
skaffman
Actually, I've read the javadoc ;).
Franz See
A: 

There are very few reasons to "squeeze" a string into a byte array. I would recommend writing the C functions to take UTF-16 strings as parameters. That way there is no need for any conversion.
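
For instance, if the native method takes a java.lang.String directly, the C++ side can read the UTF-16 code units via JNI's GetStringChars (and build the response with NewString), so no Charset.encode/decode is needed on either side. A rough sketch of the Java side, with illustrative names:

 public class NativeBridge {
     static {
         System.loadLibrary("bridge"); // illustrative library name
     }

     // The String crosses the JNI boundary as-is; on the C++ side,
     // GetStringChars exposes its UTF-16 code units (a jchar*) directly.
     public static native String process(String input);
 }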

Mihai Nita
Ok, I will try that one.
Franz See