Hello! I need to extract the zip code from each line of a file. Each line contains an address, and the formatting varies from line to line, e.g. "Großen Haag 5c, DE-47559 Kranenburg" or "Lange Ruthe 7b, 55294 Bodenheim".

The zip code is always a five-digit number and sometimes follows "DE-". I am using Java. Thanks a lot!

+3  A: 
\b\d{5}\b

will match 5 digits if they are "on their own", i.e. surrounded by word boundaries (to ensure we're not matching substrings of a longer sequence of numbers, although those will probably be rare in an address file).

Remember that you'll need to escape the backslashes in a Java string ("\\b\\d{5}\\b").
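
As a minimal sketch of the escaping described above, the following compiles the pattern from a Java string literal and checks it against the two sample addresses from the question. The class name `ZipPatternDemo`, the helper `containsZip`, and the third input `"Ref 123456"` are made up for illustration.

```java
import java.util.regex.Pattern;

class ZipPatternDemo {
    // \b\d{5}\b as a Java string literal: every backslash is doubled.
    static final Pattern ZIP = Pattern.compile("\\b\\d{5}\\b");

    // True if the line contains a run of exactly five digits
    // that is delimited by word boundaries on both sides.
    static boolean containsZip(String line) {
        return ZIP.matcher(line).find();
    }

    public static void main(String[] args) {
        // \u00df is the letter ß, escaped so the file compiles under any source encoding.
        System.out.println(containsZip("Gro\u00dfen Haag 5c, DE-47559 Kranenburg")); // true
        System.out.println(containsZip("Lange Ruthe 7b, 55294 Bodenheim"));          // true
        // Hypothetical input: six digits in a row contain no standalone five-digit run.
        System.out.println(containsZip("Ref 123456"));                               // false
    }
}
```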

Tim Pietzcker
Thanks! But they are not always "alone". Sometimes I have "DE-12345".
tzippy
Yes, and this works since the position between the `-` and the number counts as a word boundary. It just won't match a number like `123456` because it contains more than five digits.
Tim Pietzcker
Ah okay, I didn't get the meaning of word boundaries. Thanks! Another question concerning Java: matcher.matches() gives me a boolean, but how do I return the zip code itself?
tzippy
I think you shouldn't rely on the word boundary if the data is entered by users - maybe someone enters D12345. Five digits won't appear anywhere else, so it's a good bet that it's the zip code anyway. That is, unless there are house numbers with five digits, of course, but I'm not aware of any :)
OregonGhost
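
The trade-off raised in this comment can be sketched by comparing three patterns on the `D12345` case. The third pattern, which uses digit lookarounds instead of `\b`, is not from the thread; it is one possible middle ground, and the class and field names are hypothetical.

```java
import java.util.regex.Pattern;

class BoundaryComparison {
    // Pattern from the answer: fails when a letter touches the digits.
    static final Pattern WORD_BOUNDED = Pattern.compile("\\b\\d{5}\\b");
    // No boundaries at all: also matches inside longer digit runs.
    static final Pattern UNBOUNDED = Pattern.compile("\\d{5}");
    // Assumption, not from the thread: only require that no digit is adjacent.
    static final Pattern DIGIT_BOUNDED = Pattern.compile("(?<!\\d)\\d{5}(?!\\d)");

    static boolean found(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found(WORD_BOUNDED, "D12345"));  // false
        System.out.println(found(UNBOUNDED, "D12345"));     // true
        System.out.println(found(DIGIT_BOUNDED, "D12345")); // true
        System.out.println(found(UNBOUNDED, "123456"));     // true
        System.out.println(found(DIGIT_BOUNDED, "123456")); // false
    }
}
```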
We don't have five-digit house numbers in Germany =)
tzippy
@tzippy: `matcher.group()` should give you the match result. And of course you can drop the `\b` if you don't need them.
Tim Pietzcker
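
A minimal sketch of the `matcher.group()` suggestion follows. It uses `find()` rather than `matches()`, because `matches()` only succeeds when the whole input matches the pattern, while `find()` scans for a match inside the line. The class `ZipExtractor`, the method `extractZip`, and the choice of `Optional` as return type are assumptions made for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZipExtractor {
    private static final Pattern ZIP = Pattern.compile("\\b\\d{5}\\b");

    // Returns the first five-digit run in the line, or an empty Optional if none exists.
    static Optional<String> extractZip(String line) {
        Matcher matcher = ZIP.matcher(line);
        // group() may only be called after a successful find() or matches();
        // otherwise it throws IllegalStateException.
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractZip("Lange Ruthe 7b, 55294 Bodenheim")); // Optional[55294]
        System.out.println(extractZip("no zip code here"));                // Optional.empty
    }
}
```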
Yeah, thanks, I just found that one out myself. But there's a big flaw: a leading zero is recognized as a word boundary and then cut off! But leading zeros are part of the zip code!
tzippy
Oh, sorry, my mistake. I shouldn't convert to int if I want to keep the leading zero =)
tzippy
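
The leading-zero issue can be reproduced in a few lines: the regex returns the zero intact, and it only disappears once the string is parsed as a number. The sample address `"01067 Dresden"` and the names `LeadingZeroDemo`, `extract`, and `repad` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingZeroDemo {
    private static final Pattern ZIP = Pattern.compile("\\b\\d{5}\\b");

    // Returns the zip code as a String so that leading zeros survive; null if absent.
    static String extract(String line) {
        Matcher matcher = ZIP.matcher(line);
        return matcher.find() ? matcher.group() : null;
    }

    // If a zip code has already been stored as an int, zero-padding restores five digits.
    static String repad(int zip) {
        return String.format("%05d", zip);
    }

    public static void main(String[] args) {
        String zip = extract("01067 Dresden");
        System.out.println(zip);                   // 01067
        System.out.println(Integer.parseInt(zip)); // 1067 (the zero is lost here, not in the regex)
        System.out.println(repad(1067));           // 01067
    }
}
```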
@tzippy For your last mistake, see also http://thedailywtf.com/Comments/Jan48.aspx#188154 "Thank you for invalidating my input before validating it." ;-)
Christian Semrau
+1  A: 

Pattern.compile("[0-9]{5}").matcher(line)

Daniel