Is there a library that specializes in parsing such data?
You could use something like Google Maps. Geocode the address and, if successful, Google's API will return an XML representation of the address with all of the elements separated (and corrected or completed).
EDIT:
I'm being voted down and not sure why. Parsing addresses can be a little difficult. Here's an example of using Google to do this:
I'm not saying this is the only way or necessarily the best way. Just a way to parse addresses on a web site.
This is not really an answer but pointers to options that can be useful.
Dive into Python has some info on address parsing. http://www.diveintopython.org/regular_expressions/street_addresses.html
Similar questions on SO:
http://stackoverflow.com/questions/518210/where-is-a-good-address-parser
http://stackoverflow.com/questions/16413/parse-usable-street-address-city-state-zip-from-a-string
There are 2 parts to this: extract the complete address from the page, and parse that address into something you can use (store the various parts in a DB for example).
For the first part you will need a heuristic, most likely country-dependant: for US addresses [A-Z][A-Z],?\s*\d\d\d\d\d
should give you the end of an address, provided the 2 letters turn out to be a state. Finding the beginning of the string is left as an exercise.
The second part can be done either through a call to Google maps, or as usual in Perl, using a CPAN module: Lingua::EN::AddressParse (test it on your data to see if it works well enough for you).
In any case this is a difficult task, and you will most likely never get it 100% right, so plan for manually checking the addresses before using them.
Here is a link to my street address parser, written in Python using pyparsing.
You don't need regular expressions (yet) or a general parser like pyparsing (at all). Look at something like Beautiful Soup, which will parse even bad HTML into something like a tree of tags. From there, you can look at the source of the page, and find out what tags to drill down through to get to the data. Then, from Beautiful Soup's tree, you can search for these nodes using XPath (in recent versions), and directly loop over the tags you're interested in, getting to the actual data easily. From there, you can parse the data using a quick regex or something. This will be more flexible and more future proof, and also possibly less head-exploding, than just trying to do it in pure regular expressions.