views:

153

answers:

3

The following Java code does exactly what is expected:

1      String s = "♪♬♪♪♬♪♪♬♪♪♬♪♪♬♪♪♬♪";
2      for(int i=0; i < s.length(); i++)
3      {
4         System.out.print(s.substring(i,i+1));
5         //System.out.print("\r");
6         Thread.currentThread().sleep(500);
7      }

But when I add a carriage return by uncommenting line 5, it prints ?s instead. Why is that, and how can I fix it?

(I also tried "\u240d" for the carriage return, with the same result.)

EDIT: The output goes to a bash shell on Mac OS X.

+4  A: 

Please also print s.length(); I bet it is more than 18. Java's string representation is UTF-16, and String.substring just extracts the char values. The musical notes start at 0x1D000, so they don't fit in a single char. To extract complete code points/glyphs from a string, use something like the ICU project's UCharacterIterator.
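A minimal sketch of the check suggested here, using only JDK methods (String.codePointAt and Character.charCount) rather than ICU. The class name CodePointDemo and the helper splitByCodePoint are made up for illustration, and the strings use \u escapes so the result does not depend on the source file encoding:

```java
import java.util.ArrayList;
import java.util.List;

class CodePointDemo {
    // Splits a string into one substring per Unicode code point,
    // so a surrogate pair is never cut in half.
    static List<String> splitByCodePoint(String s) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int next = i + Character.charCount(cp); // 1 char in the BMP, 2 above it
            parts.add(s.substring(i, next));
            i = next;
        }
        return parts;
    }

    public static void main(String[] args) {
        String bmp = "\u266A\u266C";    // U+266A and U+266C, one char each
        String astral = "\uD834\uDD1E"; // U+1D11E, stored as a surrogate pair

        System.out.println(bmp.length() + " chars, "
                + splitByCodePoint(bmp).size() + " code points");    // 2 chars, 2 code points
        System.out.println(astral.length() + " chars, "
                + splitByCodePoint(astral).size() + " code points"); // 2 chars, 1 code points
    }
}
```

If s.length() equals the number of code points, as it does for the first string, then substring(i, i+1) is not splitting any character.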

PS: I don't know whether your terminal session can display those characters at all.

sascha
Assuming the characters pasted into Firefox are the same in the app, they are U+266A and U+266C, both in the basic multilingual plane.
McDowell
+3  A: 

I expect it is due to how your terminal is interpreting the output.

As has been pointed out above, all of the note glyphs are multibyte characters. Additionally, Java chars are just 16 bits wide, so a single char cannot reliably represent a single Unicode character on its own, and consequently the String.substring method isn't wholly multibyte friendly.

Thus what is likely happening is that on each iteration through the loop, Java prints out half a character, as it were. When the first byte of a pair is printed out, the terminal realises it's the first half of a multibyte character and doesn't display it. When the next byte is printed, the terminal sees the full character corresponding to the note and displays it.

What happens when you uncomment the print("\r") is that you're inserting a carriage return in the middle of the two halves of each character. Thus the terminal never gets the byte sequence, e.g. 0x26, 0x6C, representing the note, but instead gets 0x26, 0x0D, 0x6C, 0x0D, so the note is not rendered.
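One way to check the byte-level picture is to dump what a single note and a carriage return encode to. This is only a sketch: the class name NoteBytes and the helper hex are invented for illustration, and which encoding the JVM actually uses for System.out depends on its configuration.

```java
import java.nio.charset.StandardCharsets;

class NoteBytes {
    // Formats bytes as space-separated two-digit uppercase hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String note = "\u266C"; // beamed sixteenth notes, a single char
        System.out.println(hex(note.getBytes(StandardCharsets.UTF_16BE))); // 26 6C
        System.out.println(hex(note.getBytes(StandardCharsets.UTF_8)));    // E2 99 AC
        System.out.println(hex("\r".getBytes(StandardCharsets.UTF_8)));    // 0D
    }
}
```

A terminal set to UTF-8 would expect the three-byte form, so comparing these values with what the program really emits (for example by piping its output through a hex dump tool) shows whether the bytes are being split or replaced.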

Andrzej Doyle
This is wrong. Java chars are 16-bit values.
Jason Orendorff
So it is. I guess I just remembered it was narrower than `int`, would hold an ASCII character but not many exotic Unicode characters, and then just failed to think/check. Thanks for the somewhat embarrassing correction!
Andrzej Doyle
+1  A: 

Java doesn't know that your source file is UTF-8.

If you compile with

javac -encoding utf8 MyClass.java

and run with

java -Dfile.encoding=utf8 MyClass

it will work.
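If changing the command line is not an option, a code-side variant might look like the sketch below. It is an assumption-laden alternative, not part of the fix above: the notes are written as \u escapes so the compiler's source encoding no longer matters, and the output goes through a PrintStream that is explicitly set to UTF-8. The names Utf8Notes, utf8 and play are hypothetical.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Notes {
    // Wraps a stream so text is always encoded as UTF-8,
    // whatever the JVM's default encoding happens to be.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM must support UTF-8
        }
    }

    // Prints each char followed by a carriage return, pausing between notes.
    static void play(String s, PrintStream out, long delayMillis) {
        for (int i = 0; i < s.length(); i++) {
            out.print(s.substring(i, i + 1));
            out.print("\r");
            out.flush();
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop
                return;
            }
        }
    }

    public static void main(String[] args) {
        String s = "\u266A\u266C\u266A"; // escapes instead of literal notes
        play(s, utf8(System.out), 500);
    }
}
```

This still assumes the terminal itself is set to UTF-8, and the per-char loop is only safe because these notes each fit in a single char.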

(Does anyone know why UTF-8 isn't the default?)

Jason Orendorff
Thanks also to dtsazza and sascha for their answers. Even though they were (mostly) right and made it possible to program a workaround, Jason gets the point for a simple solution without code changes.
Kai