I'm doing a simple exercise from a book and I'm a little confused about how the Java method Integer.parseInt works. I have read a line from an input file, used the StringTokenizer to split it, and now I want to parse each part as an integer.

I have checked in the watch window that the input to parseInt is indeed a string that looks like a valid integer (e.g. "35"). However, when I call str.charAt on my variable str holding the value "35", I get strange results:

str.charAt(0) == ""
str.charAt(1) == "3"
str.charAt(2) == ""
str.charAt(3) == "5"

This looks like an encoding problem, so I tried to fix it by reading the file this way:

InputStreamReader reader = new InputStreamReader(new FileInputStream(inputfile), "UTF-8");

(I have explicitly saved the file using UTF-8 encoding in my editor), but this didn't help. Any ideas what the problem could be and how to fix it?

EDIT: My sample

        InputStreamReader reader = new InputStreamReader(new FileInputStream(inputfile), "UTF-8");
        BufferedReader bfreader = new BufferedReader(reader);

        line = bfreader.readLine();
        while (line != null)
        {
                String[] valueStrings = line.split(" ");
                String hole = valueStrings[0];

                int[] values = new int[4];
                for (int i = 0; i < values.length; i++) {

                    String nr = valueStrings[i + 1].trim();
                    values[i] = Integer.parseInt(nr);
                }

                // it breaks at the parseInt here, the rest is not even executed...

                // advance to the next line so the loop can terminate
                line = bfreader.readLine();
         }
+8  A: 

My guess is that it's actually:

str.charAt(0) == '\0'
str.charAt(1) == '3'
str.charAt(2) == '\0'
str.charAt(3) == '5'

It sounds like the file is probably saved in UTF-16 rather than UTF-8: when UTF-16 text is decoded as UTF-8, the zero byte that accompanies each ASCII character comes out as a separate '\0' character. It would also fit if your text editor somehow thought it was meant to save "null" characters. Try looking at the text file in a binary hex editor - I suspect you'll find that every other byte is 0.
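
To illustrate the guess above, here is a minimal sketch that reproduces the symptom without any file: it encodes "35" as UTF-16 and then decodes those bytes as UTF-8. The class and method names (Utf16AsUtf8Demo, misdecode) are made up for this example, and big-endian byte order without a BOM is an assumption chosen to match the charAt pattern in the question; a real file may use the other byte order or start with a BOM.

```java
import java.nio.charset.StandardCharsets;

public class Utf16AsUtf8Demo {
    // Hypothetical helper: encodes text as UTF-16 (big-endian, no BOM; an
    // assumption) and then decodes those bytes as if they were UTF-8.
    static String misdecode(String text) {
        byte[] utf16Bytes = text.getBytes(StandardCharsets.UTF_16BE);
        return new String(utf16Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String str = misdecode("35");

        // Print each char as a number so the '\0' characters are visible.
        for (int i = 0; i < str.length(); i++) {
            System.out.println("charAt(" + i + ") = " + (int) str.charAt(i));
        }

        // parseInt rejects the string because '\0' is not a digit.
        try {
            Integer.parseInt(str);
        } catch (NumberFormatException e) {
            System.out.println("parseInt failed");
        }
    }
}
```

The string ends up four characters long, with '\0' at positions 0 and 2, which is why it can look like "35" in a debugger and still fail to parse.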

If that doesn't help, please post a short but complete program which demonstrates the problem - so far we've only seen one line of your code.

Jon Skeet
You're correct, actually: I have checked it in a hex editor and every second byte is zero. Anyway, how can I read from such a file?
Kate M
Setting the charset to UTF-16 should read it in correctly; alternatively, re-save the file as either UTF-8 or the system default.
Ninefingers
The file is read in just fine; the problem is that parseInt fails when it tries to parse the String. Is there some way to fix that?
Kate M
@KateM: No, if your string has every other character as a Unicode character '\0' then it's *not* reading in just fine.
Jon Skeet
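
For reference, a minimal sketch of the UTF-16 suggestion from the comments, assuming the file really is UTF-16 and that each line looks like "<label> n1 n2 ..." as in the question's sample. The class name ReadUtf16Ints, the method readValues and the sample line are invented for illustration; only the charset passed to the reader changes compared with the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ReadUtf16Ints {
    // Hypothetical helper: reads lines of the form "<label> n1 n2 ..." from a
    // UTF-16 file and returns the integers found on each line.
    static List<int[]> readValues(Path file) throws IOException {
        List<int[]> rows = new ArrayList<>();
        // The UTF-16 charset honours a byte-order mark if the file has one
        // and falls back to big-endian otherwise.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_16)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.trim().split("\\s+");
                int[] values = new int[parts.length - 1];
                for (int i = 0; i < values.length; i++) {
                    values[i] = Integer.parseInt(parts[i + 1]);
                }
                rows.add(values);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Write a made-up sample line as UTF-16, then read it back.
        Path file = Files.createTempFile("holes", ".txt");
        Files.write(file, "hole1 35 4 12 7\n".getBytes(StandardCharsets.UTF_16));
        int[] values = readValues(file).get(0);
        System.out.println(values[0] + " " + values[3]);
        Files.delete(file);
    }
}
```

If the editor can re-save the file as UTF-8 instead, the original "UTF-8" reader should work as it is.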