Hello
I'm using a Java StreamTokenizer to extract the various words and numbers of a String but have run into a problem where numbers which include commas are concerned, e.g. 10,567 is being read as 10.0 and ,567.
I also need to remove all non-numeric characters from numbers where they might occur, e.g. $678.00 should be 678.00 or -87 should be 87.
I believe these can be achieved via the whiteSpace and wordChars methods but does anyone have any idea how to do it?
The basic streamTokenizer code at present is:
BufferedReader br = new BufferedReader(new StringReader(text));
StreamTokenizer st = new StreamTokenizer(br);
st.parseNumbers();
st.wordChars(44, 46); // ASCII comma, - , dot.
st.wordChars(48, 57); // ASCII 0 - 9.
st.wordChars(65, 90); // ASCII upper case A - Z.
st.wordChars(97, 122); // ASCII lower case a - z.
while (st.nextToken() != StreamTokenizer.TT_EOF) {
if (st.ttype == StreamTokenizer.TT_WORD) {
System.out.println("String: " + st.sval);
}
else if (st.ttype == StreamTokenizer.TT_NUMBER) {
System.out.println("Number: " + st.nval);
}
}
br.close();
Or could someone suggest a REGEXP to achieve this? I'm not sure if REGEXP is useful here given that any parding would take place after the tokens are read from the string.
Thanks
Mr Morgan.