String a ="the STRING TOKENIZER CLASS ALLOWS an APPLICATION to BREAK a STRING into TOKENS. ";
StringTokenizer st = new StringTokenizer(a);
while (st.hasMoreTokens()){
    System.out.println(st.nextToken());
}
Given the code above, the output is as follows:
the
STRING TOKENIZER CLASS
ALLOWS
an
APPLICATION
to
BREAK
a
STRING
into
TOKENS.
My only question is: why has "STRING TOKENIZER CLASS" been combined into one token?
When I run this code,
System.out.println("STRING TOKENIZER CLASS".contains(" "));
it prints a strange result:
false
That doesn't seem logical, right? I have no idea what went wrong.
I found the reason: somehow the space is not recognized as a valid space by Java. However, I don't know how it ended up like that on the way from the earlier processing to the code I posted.
I should point out that the code below runs before the code above:
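To make the symptom reproducible, here is a minimal sketch. It assumes the odd character is a no-break space (U+00A0), which I write as an explicit `\u00A0` escape so it cannot be lost when copying. The class name `NbspCheck` and its helper methods are made up for illustration only.

```java
class NbspCheck {
    // Assumption: the "space" between the words is U+00A0 (no-break space),
    // written as an escape so it survives copy and paste.
    static final String PHRASE = "STRING\u00A0TOKENIZER\u00A0CLASS";

    // True only if the text contains an ordinary space (U+0020).
    static boolean hasPlainSpace(String s) {
        return s.contains(" ");
    }

    // True if the text contains a no-break space (U+00A0).
    static boolean hasNbsp(String s) {
        return s.indexOf('\u00A0') >= 0;
    }

    public static void main(String[] args) {
        System.out.println(hasPlainSpace(PHRASE));            // false
        System.out.println(hasNbsp(PHRASE));                  // true
        // Java does not treat U+00A0 as whitespace, although it is a Unicode space character.
        System.out.println(Character.isWhitespace('\u00A0')); // false
        System.out.println(Character.isSpaceChar('\u00A0'));  // true
    }
}
```

If `hasNbsp` returns true for the real text, that would explain why `contains(" ")` reports false even though the string looks like it has spaces.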
if (!suspectedContentCollector.isEmpty()) {
    // Assumes suspectedContentCollector holds String elements
    Iterator<String> i = suspectedContentCollector.iterator();
    String temp = "";
    while (i.hasNext()) {
        temp += i.next().toLowerCase() + " ";
    }
    StringTokenizer st = new StringTokenizer(temp);
    while (st.hasMoreTokens()) {
        temp = st.nextToken();
        temp = StopWordsRemover.remove(temp);
        analyzedSentence = analyzedSentence.replace(temp, temp.toUpperCase());
    }
}
So once the text has been changed to uppercase, something seems to go wrong somewhere, and I noticed that only certain spaces are not recognized. Could the cause be the way the text is retrieved from the document?
The following code,
String a ="the STRING TOKENIZER CLASS ALLOWS an APPLICATION to BREAK a STRING into TOKENS. ";
for (int i : a.toCharArray()) {
    System.out.print(i + " ");
}
produced the following output:
116 104 101 32 83 84 82 73 78 71 160 84 79 75 69 78 73 90 69 82 160 67 76 65 83 83 32 65 76 76 79 87 83 32 97 110 32 65 80 80 76 73 67 65 84 73 79 78 32 116 111 32 66 82 69 65 75 32 97 32 83 84 82 73 78 71 32 105 110 116 111 32 84 79 75 69 78 83 46 160 32
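The value 160 in that output is U+00A0 (no-break space), while an ordinary space is 32. As far as I can tell from the documentation, the default delimiter set of `StringTokenizer` is `" \t\n\r\f"`, which does not include U+00A0, and that would explain the merged token. Below is a minimal sketch of two possible workarounds. The class `NbspTokenizer`, its methods, and the shortened sample string are made up for illustration; this is not code from my project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class NbspTokenizer {
    // The default StringTokenizer delimiters plus U+00A0 (no-break space).
    static final String DELIMS = " \t\n\r\f\u00A0";

    // Workaround 1: tokenize with an explicit delimiter set that includes U+00A0.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, DELIMS);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Workaround 2: replace every no-break space with an ordinary space first.
    static String normalize(String text) {
        return text.replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        // Shortened sample with the no-break spaces written as escapes.
        String a = "the STRING\u00A0TOKENIZER\u00A0CLASS ALLOWS";
        System.out.println(tokenize(a)); // [the, STRING, TOKENIZER, CLASS, ALLOWS]
        System.out.println(new StringTokenizer(normalize(a)).countTokens()); // 5
    }
}
```

Normalizing the text once, right after it is read from the document, is probably the simpler option, because every later step (`contains`, `replace`, tokenizing) then sees ordinary spaces.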