ansaurus

Question

Java regex confusion

Answer 1

+1 A:

This:

replaceAll("\\[\\]"," ")

Should probably be:

replaceAll("(\\[|\\])"," ")

You were trying to replace instances of the two-character sequence [] with a space, instead of replacing a single [ or a single ] with a space.
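A minimal sketch of the difference, in case it helps. The class name, method names, and sample input are made up for illustration; only the two patterns come from the answer above.

```java
class BracketReplaceDemo {
    // Original pattern: matches only a "[" immediately followed by a "]".
    static String replacePair(String s) {
        return s.replaceAll("\\[\\]", " ");
    }

    // Suggested pattern: matches a single "[" or a single "]".
    static String replaceEither(String s) {
        return s.replaceAll("(\\[|\\])", " ");
    }

    public static void main(String[] args) {
        String input = "a[0] b[] c"; // hypothetical sample input
        // Angle brackets make the inserted spaces visible.
        System.out.println("<" + replacePair(input) + ">");   // only "[]" is touched, "[0]" survives
        System.out.println("<" + replaceEither(input) + ">"); // every bracket becomes a space
    }
}
```

The character class "[\\[\\]]" should behave the same as the alternation here and is arguably easier to read.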

jjnguy 2010-07-28 19:54:14

Thanks. That explains why the second two replaceAll expressions didn't work as expected, but what about the first? `[` and `]` are not in the set `a-zA-z_\\x2D` correct?

Doug 2010-07-28 20:01:32

[ and ] are in the set A-z, see my answer :)

Affe 2010-07-28 20:06:10

Answer 2

+2 A:

Your first try didn't work because of this

replaceAll("[^a-zA-z_\\x2D]+", " ")

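The range A-z actually includes [ and ]. In ASCII (and therefore in Unicode and the common Western character sets), the six characters [ \ ] ^ _ ` sit between Z and a, so A-z covers them along with all the letters. That makes A-z a tempting shortcut for "any letter", but it is also a pitfall, as it was here: because the negated class treats [ and ] as allowed characters, they were never replaced.

You probably meant A-Z.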
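A quick way to see the effect, sketched with a made-up class name and sample input; the two patterns are the one from the question and the same pattern with the A-Z fix:

```java
class RangeTypoDemo {
    // Pattern from the question: A-z spans 'A' (0x41) through 'z' (0x7A),
    // which includes [ \ ] ^ _ and the backtick.
    static String withTypo(String s) {
        return s.replaceAll("[^a-zA-z_\\x2D]+", " ");
    }

    // Same pattern with the upper bound corrected to Z.
    static String fixed(String s) {
        return s.replaceAll("[^a-zA-Z_\\x2D]+", " ");
    }

    public static void main(String[] args) {
        System.out.println("[".matches("[A-z]")); // true
        System.out.println("[".matches("[A-Z]")); // false

        String input = "foo[bar] baz-qux"; // hypothetical sample input
        System.out.println(withTypo(input)); // foo[bar] baz-qux
        System.out.println(fixed(input));    // foo bar baz-qux
    }
}
```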

Affe 2010-07-28 19:59:50

Ahh, the subtle capitalization typo. Sometimes you just need a second pair of regex-comprehending eyes. Thanks.

Doug 2010-07-28 21:07:26

Answer 3

A:

It looks like what you really want is to remove all non-word characters except the hyphen from the string and then split it into tokens. There is a simpler way to do that:

String[] tokens = s.replaceAll("[^\\w\\s-]+", "").replaceAll("\\s+", " ").trim().split(" ");

This will leave digits in your string alone, though. Is that a problem?
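For reference, here is a small sketch of that one-liner in use. The class name, method names, and sample input are hypothetical, and tokenizeNoDigits is only one assumed way to drop digits as well (by spelling out the allowed characters instead of using \w); it is not part of the original answer.

```java
import java.util.Arrays;

class TokenizeDemo {
    // The one-liner from above: \w keeps letters, digits and underscore.
    static String[] tokenize(String s) {
        return s.replaceAll("[^\\w\\s-]+", "")
                .replaceAll("\\s+", " ")
                .trim()
                .split(" ");
    }

    // Assumed variant: list the allowed characters explicitly so digits are removed too.
    static String[] tokenizeNoDigits(String s) {
        return s.replaceAll("[^a-zA-Z_\\s-]+", "")
                .replaceAll("\\s+", " ")
                .trim()
                .split(" ");
    }

    public static void main(String[] args) {
        String input = "Hello, [world]!  x-ray 42 foo_bar"; // hypothetical sample input
        System.out.println(Arrays.toString(tokenize(input)));
        // [Hello, world, x-ray, 42, foo_bar]
        System.out.println(Arrays.toString(tokenizeNoDigits(input)));
        // [Hello, world, x-ray, foo_bar]
    }
}
```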

Tim Pietzcker 2010-07-28 20:04:02
