So I know about String#codePointAt(int), but it's indexed by the char offset, not by the codepoint offset.
I'm thinking about trying something like:
- using String#charAt(int) to get the char at an index
- testing whether the char is in the high-surrogates range
  - if so, use String#codePointAt(int) to get the codepoint, and increment the index by 2
  - if not, use the given char value as the codepoint, and increment the index by 1
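
To make the idea concrete, here is a rough sketch of the loop described above. The class name CodePointWalk and the method name codePoints are made up for illustration. One deliberate deviation from the steps as written: instead of always incrementing by 2 after a high surrogate, the sketch advances by Character.charCount(cp), so an unpaired high surrogate only consumes one char.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for illustration only.
public class CodePointWalk {

    // Collects the code points of s by walking its char indices.
    static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // codePointAt combines a valid surrogate pair; for an
                // unpaired high surrogate it returns the char value itself.
                int cp = s.codePointAt(i);
                result.add(cp);
                // 2 for a real pair, 1 for an unpaired surrogate.
                i += Character.charCount(cp);
            } else {
                // Any other char is used directly as the code point.
                result.add((int) c);
                i += 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        for (int cp : codePoints(s)) {
            System.out.println(Integer.toHexString(cp));
        }
    }
}
```

The explicit surrogate test is not strictly needed, since codePointAt already returns the plain char value when there is no pair at that index; it is kept here only to mirror the steps as I described them.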
But my concerns are:
- I'm not sure whether codepoints which are naturally in the high-surrogates range will be stored as two char values or one
- this seems like an awfully expensive way to iterate through characters
- someone must have come up with something better.