ansaurus

Question

Answer 1

+3 A:

Is it possible that you're using something other than normal ascii blanks in "STRING TOKENIZER CLASS"? Maybe you held down the shift key and got a shifted-space in there instead?

Paul Tomblin 2010-03-22 18:15:16

I was thinking the same thing. But the original String was all in lowercase, and I changed some of the words to uppercase. After changing that part, some of the spaces seem to go undetected, which is very weird to me. Any idea why?

Mr CooL 2010-03-22 18:17:31

Did you change them to uppercase by hitting "caps lock" or by holding down the "shift" key as you typed? If the latter, Paul's point would seem right.

Jim Kiley 2010-03-22 18:19:11

Answer 2

+6 A:

There: the answer is in the snippet that you added. The integers listed show that the space after the word STRING is character 160, the non-breaking space (`&nbsp;` in HTML), instead of character 32, the ordinary ASCII space. Edit your original string, replacing the spaces within STRING TOKENIZER CLASS with actual spaces instead of shift-spaces.

Just a side comment, from the 1.4.2 Javadoc:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
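As a rough illustration of the `split` route the Javadoc recommends, here is a minimal sketch. The class name `SplitOnAnySpace`, the helper `words`, and the sample sentence are made up for the example. It assumes the goal is to treat the non-breaking space (U+00A0) like any other whitespace; Java's `\s` does not match U+00A0 by default, so it is listed explicitly in the character class.

```java
import java.util.Arrays;

public class SplitOnAnySpace {
    // Hypothetical helper: splits on runs of ordinary whitespace
    // or non-breaking spaces (U+00A0, character 160).
    static String[] words(String text) {
        return text.trim().split("[\\s\\u00A0]+");
    }

    public static void main(String[] args) {
        // Sample input is an assumption: character 160 sits between the uppercase words.
        String a = "this is the\u00A0STRING\u00A0TOKENIZER\u00A0CLASS";
        System.out.println(Arrays.toString(words(a)));
        // prints [this, is, the, STRING, TOKENIZER, CLASS]
    }
}
```

Note that `trim()` only strips characters up to U+0020, so a string that starts with a non-breaking space would still yield an empty first element.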

Jim Kiley 2010-03-22 18:15:59

It's the same....the space was not recognized...

Mr CooL 2010-03-22 18:35:21

Thanks Jim Kiley

Mr CooL 2010-03-22 19:48:51

Answer 3

+1 A:

If you copy/pasted the sentence from a web page or a Word document, chances are you got some special characters instead of spaces (ex: non-breaking spaces, etc.). Try again by typing the sentence in your Java editor.

Olivier Croisier 2010-03-22 18:16:10

Yeah. If I type it in, there is no problem; the problem only shows up when the string comes out of some processing.

Mr CooL 2010-03-22 18:22:48

Answer 4

+2 A:

Do us all a favor and copy and paste the output of this snippet:

    for (int i : a.toCharArray()) {
        System.out.print(i + " ");
    }

OK, now looking at the output, it confirms what we've all been suspecting: those "spaces" are character 160, the `&nbsp;` non-breaking space (U+00A0, which lies outside 7-bit ASCII). It's a different character from the regular ASCII 32 space.

You can tell the tokenizer (which is obsolete, as others have said) to treat character 160 as a delimiter, or you can filter it out of the input string if it's not supposed to be there in the first place.

For now, a = a.replace((char) 160, (char) 32); before tokenizing is a quick-fix.
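To show the quick fix in context, here is a small self-contained sketch. The class name `ReplaceThenTokenize`, the helper `tokens`, and the sample string are assumptions made for illustration; only the `replace` call comes from the line above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class ReplaceThenTokenize {
    // Hypothetical helper: normalizes character 160 to an ordinary space,
    // then tokenizes with StringTokenizer's default delimiters.
    static List<String> tokens(String a) {
        a = a.replace((char) 160, (char) 32);
        StringTokenizer st = new StringTokenizer(a);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample input is an assumption; (char) 160 stands in for the pasted "spaces".
        String a = "the" + (char) 160 + "STRING" + (char) 160 + "TOKENIZER CLASS";
        System.out.println(tokens(a)); // prints [the, STRING, TOKENIZER, CLASS]
    }
}
```

The cast matters because `String` has no `replace(int, int)` overload; without it the call does not compile.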

polygenelubricants 2010-03-22 18:24:05

Okay...thanks,,,

Mr CooL 2010-03-22 18:40:11

Sorry polygenelubricants, how do I actually replace ASCII 160 with the ASCII 32 regular space? The code you pasted, a = a.replace(160, 32);, didn't work.

Mr CooL 2010-03-22 18:58:35

Sorry, I forgot to add the cast `(char)`.

polygenelubricants 2010-03-22 19:14:17

Thanks polygenelubricants~! ;)

Mr CooL 2010-03-22 19:59:32

Answer 5

+3 A:

Looking at the character codes, the 'space' in question is 0xA0, which is intended to be a non-breaking space. My guess is that it was entered deliberately so that 'STRING TOKENIZER CLASS' is treated as one word.

The solution (if you indeed deem it correct to break up 'STRING TOKENIZER CLASS' into three words) would be to add the non-breaking space as a delimiter when constructing the StringTokenizer (or in the pattern passed to the String.split() method). E.g.

  new StringTokenizer(string, " \t\n\r\f\240")
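A runnable sketch of the same idea, comparing the default delimiters with the extended set. The class name `NbspDelimiter`, its two helpers, and the sample string are hypothetical. The octal escape `\240` from the line above denotes character 160, the same character as U+00A0.

```java
import java.util.StringTokenizer;

public class NbspDelimiter {
    // StringTokenizer's default delimiters plus the non-breaking space (octal 240 = decimal 160).
    static final String DELIMS = " \t\n\r\f\240";

    // Token count using only the default delimiters.
    static int countDefault(String s) {
        return new StringTokenizer(s).countTokens();
    }

    // Token count when character 160 is also treated as a delimiter.
    static int countWithNbsp(String s) {
        return new StringTokenizer(s, DELIMS).countTokens();
    }

    public static void main(String[] args) {
        // Sample input is an assumption: character 160 joins the three uppercase words.
        String s = "the STRING\240TOKENIZER\240CLASS";
        System.out.println(countDefault(s));  // prints 2
        System.out.println(countWithNbsp(s)); // prints 4
    }
}
```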

Lars 2010-03-22 19:02:58

Thanks man....the code works to remove the funny space!

Mr CooL 2010-03-22 19:09:54


StringTokenizer problem of tokenizing
