I'm trying to strip out the street number from a mailing address.

I have a regex in Java:

address = address.replaceAll("^\\s*[0-9]+\\s+", "");

It works on this address:

301 West 23rd Street

making it:

West 23rd Street

But when I apply it to this address, the address is unchanged:

70-50 69th Place

Instead it needs to be:

69th Place

Any ideas?

+1  A: 

That regex only strips a single leading group of digits, and only when that group is followed directly by whitespace, so the hyphen in "70-50" stops it. If you want to strip every leading group of digits, including groups that contain hyphens, do something like this:

address = address.replaceAll("^\\s*([0-9-]+\\s+)+", "");
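Here is a runnable sketch of this answer. The class and method names are made up for illustration. It uses `replaceAll` because `String.replace` treats its first argument as literal text, not as a regex:

```java
public class StripStreetNumber {
    // Pattern from the answer: at the start of the string, optional whitespace,
    // then one or more groups of "digits/hyphens followed by whitespace".
    private static final String STREET_NUMBER = "^\\s*([0-9-]+\\s+)+";

    // Hypothetical helper: removes the leading street number, if any.
    public static String strip(String address) {
        // replaceAll compiles STREET_NUMBER as a regular expression.
        return address.replaceAll(STREET_NUMBER, "");
    }

    public static void main(String[] args) {
        String[] samples = {"301 West 23rd Street", "70-50 69th Place"};
        for (String s : samples) {
            System.out.println(s + " -> " + strip(s));
        }
    }
}
```

Because of the outer `+`, an address such as "12 34 Main St" loses both numeric tokens and becomes "Main St". Whether that is what you want depends on your data.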
Swordgleam
+1  A: 

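Your regex says: at the start of the string, match optional whitespace, then one or more digits, then at least one whitespace character, and replace all of that with nothing.

Your "bad" string doesn't fit that shape. After the digits "70" comes a dash, not whitespace, so the match fails and nothing is replaced.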

If you want to include the dash in the street number, try this: "^\\s*[0-9-]+\\s+"
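To check the difference directly, here is a small sketch that tests both patterns against the hyphenated address. The class, constant, and method names are hypothetical:

```java
import java.util.regex.Pattern;

public class HyphenCheck {
    // Original pattern from the question: digits only, then whitespace.
    static final Pattern DIGITS_ONLY = Pattern.compile("^\\s*[0-9]+\\s+");
    // Suggested pattern: the hyphen is added to the character class.
    static final Pattern DIGITS_OR_DASH = Pattern.compile("^\\s*[0-9-]+\\s+");

    // True if the pattern finds a street number at the start of the address
    // (the ^ anchor keeps find() from matching anywhere else).
    static boolean hasStreetNumber(Pattern p, String address) {
        return p.matcher(address).find();
    }

    public static void main(String[] args) {
        String address = "70-50 69th Place";
        System.out.println("digits only:    " + hasStreetNumber(DIGITS_ONLY, address));    // false
        System.out.println("digits or dash: " + hasStreetNumber(DIGITS_OR_DASH, address)); // true
        System.out.println(DIGITS_OR_DASH.matcher(address).replaceFirst(""));              // 69th Place
    }
}
```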

Ned Batchelder
+4  A: 

Your regular expression doesn't match that string. Here is how it is evaluated, token by token:

^      Start of string. Matches successfully.
\\s*   Zero or more whitespace. Matches the empty string.
[0-9]+ One or more digits. Matches "70".
\\s+   One or more whitespace. Fails to match.
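You can reproduce this trace in code by matching only the prefix `^\\s*[0-9]+` and looking at where it stops. This is a sketch; `TraceMatch` and `endOfDigits` are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TraceMatch {
    // Returns the index where the prefix "^\\s*[0-9]+" stops matching,
    // or -1 if even that prefix fails at the start of the string.
    static int endOfDigits(String address) {
        Matcher m = Pattern.compile("^\\s*[0-9]+").matcher(address);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String address = "70-50 69th Place";
        int end = endOfDigits(address);
        System.out.println("digits matched: \"" + address.substring(0, end) + "\""); // "70"
        char next = address.charAt(end);
        // The next token, \\s+, needs whitespace here but finds '-', so the full pattern fails.
        System.out.println("next char: '" + next + "', whitespace? " + Character.isWhitespace(next));
    }
}
```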

The character after "70" is a hyphen and a hyphen is not a whitespace character so the match fails and no replacement is made. To fix it you can put a hyphen in the character class:

address = address.replaceAll("^\\s*[0-9-]+\\s+", "");
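One more pitfall worth checking: `String.replace` does plain text substitution, while `String.replaceAll` interprets its first argument as a regex. The sketch below (class and method names are hypothetical) shows the difference on the same input:

```java
public class ReplaceVsReplaceAll {
    static final String STREET_NUMBER = "^\\s*[0-9-]+\\s+";

    // String.replace(CharSequence, CharSequence) searches for the literal
    // characters ^\s*[0-9-]+\s+, which never occur in an address.
    static String withReplace(String address) {
        return address.replace(STREET_NUMBER, "");
    }

    // String.replaceAll(String, String) compiles its first argument as a regex.
    static String withReplaceAll(String address) {
        return address.replaceAll(STREET_NUMBER, "");
    }

    public static void main(String[] args) {
        String address = "70-50 69th Place";
        System.out.println(withReplace(address));    // 70-50 69th Place
        System.out.println(withReplaceAll(address)); // 69th Place
        // Strings are immutable, so keep the result by assigning it back.
        address = withReplaceAll(address);
        System.out.println(address);                 // 69th Place
    }
}
```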

Inside a character class, a hyphen normally has a special meaning (it denotes a range of characters, as in 0-9), except in two cases:

  • when it is at the beginning or the end of the character class
  • when it is escaped with a backslash (but note that two backslashes are required in a Java string literal).
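A quick sketch confirming that these spellings all mean "a digit or a literal hyphen", while a hyphen between two characters still forms a range. `HyphenInClass` and `matchesAll` are hypothetical names:

```java
import java.util.regex.Pattern;

public class HyphenInClass {
    // True if the whole input consists of characters from the given class.
    static boolean matchesAll(String charClass, String input) {
        return Pattern.matches(charClass + "+", input);
    }

    public static void main(String[] args) {
        String[] classes = {
            "[0-9-]",   // hyphen at the end of the class
            "[-0-9]",   // hyphen at the beginning of the class
            "[0-9\\-]"  // hyphen escaped: one backslash in the regex, two in Java source
        };
        for (String c : classes) {
            System.out.println(c + " matches \"70-50\": " + matchesAll(c, "70-50")); // true for all three
        }
        // Between two characters the hyphen forms a range: [0-9a-c] means
        // digits plus a, b and c, and does not include '-' itself.
        System.out.println(matchesAll("[0-9a-c]", "70-50")); // false
    }
}
```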
Mark Byers