I want to capture addresses given on "contact us" pages. Is there any PHP script to do so?

I am stuck on this. My client wants to store the addresses that websites list on their "contact us" pages. I am able to fetch the content of the page, but I am unsure how to extract only the address from it.

For example: www.abc.com/contactus
contains

Office Address:

  • X Road,
  • X City
  • pin xxxxx
  • X Country

How can I get this address?

A: 

You will probably have to use string manipulation techniques to extract the string you need from the full content of the page.

You say you are able to get the content of the "contact us" page. Store that content in a string, look for patterns in it to see how the address is organized, and cut out that part using the string functions available in PHP.
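As a rough illustration of that approach, here is a minimal sketch for the one layout shown in the question: a label ending in "Address:" followed by the address on separate lines. The function name extractAddressLines, the label pattern, the stop words and the line limit are all assumptions made for this example; other pages will need other patterns, and many will not fit any pattern at all.

```php
<?php
// Hypothetical helper, not a general solution: it only handles pages where the
// address follows a label such as "Office Address:" on separate lines.
function extractAddressLines(string $html, int $maxLines = 6): array
{
    // Turn common line-ending tags into newlines before stripping the markup.
    $text = preg_replace('/<(br|\/li|\/p|\/div|\/h[1-6])\b[^>]*>/i', "\n", $html) ?? '';
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES, 'UTF-8');

    $address = [];
    $collecting = false;

    foreach (preg_split('/\R/u', $text) ?: [] as $line) {
        // Drop leading bullets/whitespace and trailing commas/whitespace.
        $line = preg_replace('/^[\s\x{2022}*-]+|[\s,]+$/u', '', $line) ?? '';
        if ($line === '') {
            continue;
        }
        if (!$collecting) {
            // Assumed label pattern; real pages use many other wordings.
            $collecting = (bool) preg_match('/address\s*:?$/i', $line);
            continue;
        }
        // Assumed stop condition: another labelled field, or the line limit.
        if (preg_match('/^(phone|tel|fax|e-?mail)\b/i', $line) || count($address) >= $maxLines) {
            break;
        }
        $address[] = $line;
    }

    return $address;
}

// Sample markup modelled on the example in the question.
$sample = <<<'HTML'
<h3>Office Address:</h3>
<ul>
  <li>X Road,</li>
  <li>X City</li>
  <li>pin xxxxx</li>
  <li>X Country</li>
</ul>
<p>Phone: 000</p>
HTML;

$result = extractAddressLines($sample);
print_r($result);
```

This works only because the sample has a recognisable label and one address part per line. A page that puts the address in a single paragraph, in a table, or in an image would need different handling.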

sushil bharwani
+16  A: 

If you want my 2 cents: Forget it.

Parsing addresses out of arbitrary patterns is a hugely complex task.

A team of top-notch experts in computer algorithms and pattern detection may be able to provide you with sensible results - like the kind of teams that probably work on Google Maps - but there is not going to be a ready-made PHP script around that can do this with any useful kind of success rate.

Forget it and concentrate on building an interface that makes storing addresses manually very, very easy.

Pekka
+1  A: 
  1. You can't get the addresses from any (i.e. all) contact us pages, as every one of them is different; no amount of pattern matching will cover all variants, encodings, data formats, etc.
  2. Ethically, this is somewhat of a (very) grey area. I'm quite relieved there is no surefire way to do this.
nathan
+2  A: 

To me it sounds like you want to collect those addresses in order to spam them. I think this is why many pages "store" their addresses in an image, in which case extracting them is impossible.

Coding a script that would extract the address from every page (which is not really possible) would take more time than copy-pasting thousands of addresses manually.

And if those thousands are not enough, it looks even more like you are going to spam them (which is illegal in most countries).

Kau-Boy
+3  A: 

It's going to be far, far cheaper to buy a list of addresses from a spam broker. Your client's request is unrealistic.

Rimian
+3  A: 

I'd recommend using Amazon Mechanical Turk for this. For just a few cents (I've seen offerings of $0.02 per 10 sites), you can get people to do this for you. Of course, you'll still need to write the MTurk task and some cross-result validation code, but that's a fairly easy task compared to string-parsing tons of sites.

Just my $0.02
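One possible shape for the cross-result validation step, assuming each site is given to several workers and their answers have already been collected as plain strings: accept an address only when more than half of the submissions agree after a simple normalisation. The function names normalizeAddress and agreedAddress, the normalisation rules and the 0.5 threshold are all assumptions for this sketch, and it does not call the MTurk API at all.

```php
<?php
// Hypothetical normalisation: lower-case (ASCII only), drop punctuation,
// collapse runs of whitespace. Real addresses may need gentler rules.
function normalizeAddress(string $address): string
{
    $address = strtolower($address);
    $address = preg_replace('/[^\p{L}\p{N}\s]+/u', '', $address) ?? '';
    return trim(preg_replace('/\s+/u', ' ', $address) ?? '');
}

// Returns the first raw submission of the most common normalised answer,
// or null when no answer is shared by more than $minShare of all submissions.
function agreedAddress(array $submissions, float $minShare = 0.5): ?string
{
    $counts = [];
    $firstSeen = [];

    foreach ($submissions as $raw) {
        $key = normalizeAddress($raw);
        if ($key === '') {
            continue; // empty answers never count as agreement
        }
        $counts[$key] = ($counts[$key] ?? 0) + 1;
        $firstSeen[$key] ??= $raw;
    }

    if ($counts === []) {
        return null;
    }

    arsort($counts); // most frequent normalised answer first
    $bestKey = array_key_first($counts);

    return $counts[$bestKey] / count($submissions) > $minShare
        ? $firstSeen[$bestKey]
        : null;
}

// Two of three workers agree once case and punctuation are ignored.
var_dump(agreedAddress(['X Road, X City', 'x road x city', 'Y Road']));
// No majority, so the site would go back for another round or manual review.
var_dump(agreedAddress(['X Road', 'Y Road']));
```

Sites for which this returns null can be resubmitted to more workers or checked by hand.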

kander