ansaurus

Question

Answer 1

+1 A:

If the set of delimiters you need is fixed, you can do something silly like:

new StringTokenizer(v, " .,?!:;()\b\t\n\f\r\"\'\\");

Or you could search and replace every character whose value falls outside 65-90 and 97-122 (that is, anything other than ASCII A-Z and a-z).
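A minimal sketch of the search-and-replace idea, assuming plain ASCII English text. The class name `LetterOnlyTokens` and the helper `tokenize` are made up for illustration. It swaps every non-letter for a space and then lets StringTokenizer split on its default whitespace delimiters:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class LetterOnlyTokens {
    // Hypothetical helper: replace every character outside A-Z / a-z
    // (ASCII 65-90 and 97-122) with a space, then split on whitespace
    // using StringTokenizer's default delimiters.
    static List<String> tokenize(String text) {
        String lettersOnly = text.replaceAll("[^A-Za-z]", " ");
        List<String> words = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(lettersOnly);
        while (st.hasMoreTokens()) {
            words.add(st.nextToken());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [Hello, world, It, s]
        System.out.println(tokenize("Hello, world! (It's 2010.)"));
    }
}
```

Note that apostrophes are replaced too, so "It's" comes out as "It" and "s"; add `'` to the character class if you want to keep contractions together.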

tathamr 2010-09-12 17:18:17

Answer 2

+2 A:

Yes, the default delimiters are whitespace characters, but you can specify your own using the two-argument constructor:

StringTokenizer st = new StringTokenizer(recordText, ".,! ()[]");
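To see what that does in practice, here is a small sketch that wraps the same call in a loop. The class name `CustomDelimiters` and the sample input are invented for illustration. Each character in the second argument is treated as its own delimiter, and anything not listed stays in the tokens:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class CustomDelimiters {
    // Hypothetical helper: collect all tokens produced by the
    // two-argument constructor with the delimiter set from above.
    static List<String> tokenize(String recordText) {
        StringTokenizer st = new StringTokenizer(recordText, ".,! ()[]");
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // '?' is not in the delimiter string, so it stays attached.
        // prints [Wait, what?, Yes, ok]
        System.out.println(tokenize("Wait, what? (Yes!) [ok]"));
    }
}
```

Keep in mind that supplying your own delimiter string replaces the defaults, so tabs and newlines are no longer treated as delimiters unless you list them as well.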

Alan Moore 2010-09-12 17:22:18

What's the best way to make StringTokenizer return only the English words needed for a full-text search?