In a Java application, I print characters from a wide Unicode range to standard output. My console is configured for UTF-8. My problem is that sometimes, when I print 10 characters for example, I see fewer than 10 characters on screen.

I think this is because the console interprets some of the characters. Are there Unicode characters that the console interprets as an instruction, such as "erase the previous character"? Is it possible to exclude them from the output (what are the code points of these characters)?

A: 

The obvious one is backspace (U+0008).

Andrey
+1  A: 

Carriage return and backspace can both produce the result you describe. This little test program, for instance...

public class Test {
    public static void main(String... args) {
        System.out.println("abc\rdef\u0008g");
    }
}

...prints the following in my terminal (Ubuntu):

$ java Test
deg
$

\r is carriage return, and \u0008 represents the backspace character. (Carriage return sends the cursor back to the first column, and backspace sends it back one column.)
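
To find out which characters in a string the terminal might be interpreting, it can help to make them visible instead of printing them raw. Here is a minimal sketch; the class and method names (ShowControls, showControls) are made up for illustration, and it only flags what Character.isISOControl reports, i.e. U+0000-U+001F and U+007F-U+009F:

```java
public class ShowControls {
    // Hypothetical helper: replaces each ISO control character
    // with a visible <U+XXXX> tag and leaves everything else untouched.
    static String showControls(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (Character.isISOControl(cp)) {
                sb.append(String.format("<U+%04X>", cp));
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: abc<U+000D>def<U+0008>g
        System.out.println(showControls("abc\rdef\u0008g"));
    }
}
```

Running the suspicious output through something like this shows the code points that are being swallowed.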


To remove all of these so-called "control characters", you could do:

myString = myString.replaceAll("\\p{Cntrl}", "");

From the docs:

\p{Cntrl}      A control character: [\x00-\x1F\x7F]
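
Note that the range above is ASCII-only and also contains \n and \t, so the replaceAll call strips line breaks and tabs too. If that is not wanted, or if the input may also contain the Unicode control characters U+0080-U+009F or invisible format characters such as U+200B, a filter based on Character.getType is one possible alternative. This is only a sketch: the names (StripControls, stripControls) are invented here, and the choice to keep \n and \t is an assumption about what you want to preserve.

```java
public class StripControls {
    // Hypothetical helper: drops code points in the Unicode categories
    // Cc (control) and Cf (format), but keeps newline and tab.
    static String stripControls(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            int type = Character.getType(cp);
            boolean keep = cp == '\n' || cp == '\t'
                    || (type != Character.CONTROL && type != Character.FORMAT);
            if (keep) {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // \r, backspace, U+0085 and U+200B are removed; the trailing \n is kept.
        String raw = "abc\rdef\u0008g\u0085h\u200Bi\n";
        System.out.print(stripControls(raw)); // prints: abcdefghi
    }
}
```

This does not deal with combining marks, which are legitimate text but also make the visible character count smaller than the number of code points.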

aioobe
I removed these characters from the output with myString.replaceAll("[\r\u0008]", ""). However, I still get some truncated output, so I think there must be other such characters :(
Laurent
Updated my answer.
aioobe
It works. Thanks a lot :)
Laurent