What's the most efficient way to calculate the byte length of a character, taking the character encoding into account? The encoding is only known at runtime. In UTF-8, for example, characters have a variable byte length, so the length has to be determined for each character individually. So far I've come up with this:
char c = getCharSomehow();
String encoding = getEncodingSomehow();
// ...
int length = new String(new char[] { c }).getBytes(encoding).length;
But this is clumsy and inefficient in a loop, since a new String needs to be created every time. I can't find a more efficient way in the Java API. There's String#valueOf(char), but according to its source it does basically the same as above. I imagine that this can be done with bitwise operations like bit shifting, but that's my weak point and I'm unsure how to take the encoding into account here :)
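For UTF-8 specifically, the length follows directly from the numeric range of the code point, so no String or byte array is needed. Here is a minimal sketch of that idea. The class and method names (`Utf8Length`, `utf8Length`) are made up for illustration, and it only covers UTF-8, so it doesn't solve the "encoding only known at runtime" part:

```java
final class Utf8Length {

    /**
     * Number of bytes UTF-8 uses for a single Unicode code point.
     * Note: a lone surrogate (0xD800..0xDFFF) is not encodable on its own;
     * this sketch does not reject it and simply reports 3 for that range.
     */
    static int utf8Length(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("Not a Unicode code point: " + codePoint);
        }
        if (codePoint < 0x80) return 1;     // 7 bits: ASCII
        if (codePoint < 0x800) return 2;    // up to 11 bits
        if (codePoint < 0x10000) return 3;  // up to 16 bits (rest of the BMP)
        return 4;                           // supplementary planes
    }

    public static void main(String[] args) {
        System.out.println(utf8Length('a'));      // 1
        System.out.println(utf8Length('\u00E9')); // 2 (e with acute accent)
        System.out.println(utf8Length('\u20AC')); // 3 (euro sign)
        System.out.println(utf8Length(0x1F600));  // 4 (an emoji outside the BMP)
    }
}
```

Other encodings have different rules (or none that can be expressed this simply), so for the general case some charset-aware API is still needed.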
If you question the need for this, check this topic.
Update: the answer from @Bkkbrad is technically the most efficient:
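import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
// ...
char c = getCharSomehow();
String encoding = getEncodingSomehow();
// Create the encoder once and reuse it inside the loop.
CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
// ...
int length = encoder.encode(CharBuffer.wrap(new char[] { c })).limit();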
However, as @Stephen C pointed out, there are more problems with this. For example, there may be combining characters or surrogate pairs, which need to be taken into account as well. But that's a separate problem which needs to be solved in the step before this one.
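To illustrate the surrogate part of that caveat, here is a rough sketch that walks a string by code point instead of by `char`, reusing one `CharsetEncoder` for the whole loop. The class and method names (`EncodedLength`, `byteLengths`) are hypothetical. It assumes the charset has no per-call prefix: with a charset like "UTF-16" the encoder writes a byte order mark on every `encode` call, which would inflate each count. It also does nothing about combining characters.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EncodedLength {

    /** Returns the encoded byte length of each code point in text, in order. */
    static int[] byteLengths(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder(); // created once, reused below
        int[] lengths = new int[text.codePointCount(0, text.length())];
        int n = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            int units = Character.charCount(codePoint); // 1 char, or 2 for a surrogate pair
            try {
                // encode(CharBuffer) resets the encoder, encodes the whole input
                // and returns a buffer that is ready for reading.
                ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text, i, i + units));
                lengths[n++] = bytes.remaining();
            } catch (CharacterCodingException e) {
                // Lone surrogate, or a character the charset cannot represent.
                throw new IllegalArgumentException("Cannot encode code point at index " + i, e);
            }
            i += units;
        }
        return lengths;
    }

    public static void main(String[] args) {
        // 'a', euro sign, and an emoji stored as a surrogate pair
        String text = "a\u20AC\uD83D\uDE00";
        System.out.println(Arrays.toString(byteLengths(text, StandardCharsets.UTF_8))); // [1, 3, 4]
    }
}
```

This still allocates a small ByteBuffer per code point, so it is not allocation-free; it just avoids the intermediate String and keeps a surrogate pair together as one unit.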