Anyone know of a way to programmatically detect a parked web page? That is, those pages that you type in by accident (or sometimes intentionally) that are hosted by a domain parking service and have nothing but ads on them.

I am working on a linking network and want to catch sites whose domains expire, get snatched up by someone else, and end up as parked pages.

Any thoughts or ideas are appreciated.

G-Man

+4  A: 

I would say that you'll have to examine the WHOIS records for the sites in question and/or the actual content of the pages and develop some heuristics as to what constitutes a "parked page".

Take goooogle.com: its WHOIS record shows that it is owned by "Privacy Protection" and that its DNS servers are ns1/ns2.fastpark.net. If you look at the source for the site, they're even silly enough to have a CSS file named "style_park.css" :)

All in all, I don't think you'll be able to come up with a generic way to do it. You'll probably end up with an ever-evolving rule base or blacklist.
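
A minimal sketch of such a rule base, assuming the caller already has the domain's nameservers (for example, copied from its WHOIS record or from a DNS NS lookup; fetching them is left out here) and the page's HTML. The only rules taken from the answer are the fastpark.net nameservers and the "style_park.css" file from the goooogle.com example; the scoring weights and the names `score_parked` and `Verdict` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rule base. Only "fastpark.net" and "style_park.css" come from the
# goooogle.com example; grow these lists as you find new parking services.
PARKING_NAMESERVER_SUFFIXES = ["fastpark.net"]
PARKING_PAGE_MARKERS = [r"style_park\.css"]


@dataclass
class Verdict:
    score: int = 0
    reasons: list[str] = field(default_factory=list)


def score_parked(nameservers: list[str], html: str) -> Verdict:
    """Score a domain against simple parking rules; a higher score means more likely parked."""
    verdict = Verdict()
    for ns in nameservers:
        host = ns.lower().rstrip(".")  # drop the trailing dot of a fully qualified name
        for suffix in PARKING_NAMESERVER_SUFFIXES:
            # Match the suffix itself or a true subdomain of it, not e.g. "notfastpark.net".
            if host == suffix or host.endswith("." + suffix):
                verdict.score += 2  # nameservers are weighted above page markers (arbitrary)
                verdict.reasons.append(f"nameserver {host} matches {suffix}")
    for pattern in PARKING_PAGE_MARKERS:
        if re.search(pattern, html, re.IGNORECASE):
            verdict.score += 1
            verdict.reasons.append(f"page source matches {pattern}")
    return verdict
```

Keeping the rules as plain data makes it easy to load them from a file or database as the blacklist grows, without touching the scoring code.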

Kevin
Might have a better chance with the idea of blacklists and rules... It's very hard to programmatically figure out if you're looking at data trash, but you could still look for unique patterns in the files (common CSS rules, images, etc.).
David
+3  A: 

Here is a test that I think may catch a decent number of them. It takes advantage of the fact that you don't actually want to put up real web sites for parked domains, so parked domains tend to answer for any subdomain and any path. The test looks for that wildcarding of both subdomain and path. Let's say we have this URL in our system:

http://www.example.com/method-to-detect-parked

First, I would fetch the actual URL and hash the response or save a copy for comparison.
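
A minimal sketch of that first step using only the standard library: download the body with `urllib.request.urlopen` and keep a SHA-256 digest as the fingerprint. The helper names `fetch` and `fingerprint` are hypothetical. An exact hash only helps when the page is byte-for-byte stable; rotating ads will change it on every request.

```python
import hashlib
import urllib.request


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw response body for url (raises on HTTP errors and DNS failures)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fingerprint(body: bytes) -> str:
    """Return a SHA-256 hex digest of a page body, to compare against later fetches."""
    return hashlib.sha256(body).hexdigest()
```

Usage would look like `original = fingerprint(fetch("http://www.example.com/method-to-detect-parked"))`, storing the digest alongside the link.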

My second check would be to fetch a URL with a random subdomain and a random path:

http://random.example.com/random
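
One way to build that probe URL, sketched with `urllib.parse` and `secrets`. It assumes the registrable domain is the last two host labels (example.com), which is wrong for suffixes such as .co.uk; a real implementation would consult the Public Suffix List. The function name `random_probe_url` and its flags are hypothetical; the flags let you swap only the subdomain or only the path for the follow-up checks.

```python
import secrets
from urllib.parse import urlsplit, urlunsplit


def random_probe_url(url: str, swap_host: bool = True, swap_path: bool = True) -> str:
    """Replace the subdomain and/or the path of url with random tokens.

    Assumes the registrable domain is the last two labels of the host.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased and excludes any port
    labels = host.split(".")
    if swap_host and len(labels) >= 2:
        host = ".".join([secrets.token_hex(6)] + labels[-2:])
    if swap_path:
        path, query = "/" + secrets.token_hex(8), ""
    else:
        path, query = parts.path, parts.query
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, path, query, ""))
```

Random tokens make it very unlikely that the probe hits a page that genuinely exists, so a successful response is itself suspicious.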

If it matches the original page, or even just succeeds, you have a pretty good indicator that the page is parked. If it fails, I might check the subdomain and the path individually. If the page randomly changes some elements, you may want to choose a few items to compare instead of the whole page. For example, make a list of the links included in the page and compare those, or compare the title tag.
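
A sketch of the "compare a few items" idea, using the standard library's `html.parser`: summarize each page as its `<title>` text plus the set of `<a href>` targets, then treat the probe as a match when the titles agree and the link sets mostly overlap. The 0.8 Jaccard threshold and the names `summarize` and `looks_wildcarded` are hypothetical choices, not something the answer specifies; tune them against known parked and non-parked pages.

```python
from html.parser import HTMLParser


class _PageSummary(HTMLParser):
    """Collect the <title> text and the set of link targets from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: set[str] = set()
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names lowercased and attrs as (name, value) pairs.
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html: str) -> tuple[str, frozenset[str]]:
    """Return (title, link targets) for an HTML document."""
    parser = _PageSummary()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), frozenset(parser.links)


def looks_wildcarded(original_html: str, probe_html: str, min_link_overlap: float = 0.8) -> bool:
    """Heuristic: the random probe returned essentially the same page as the original."""
    title_a, links_a = summarize(original_html)
    title_b, links_b = summarize(probe_html)
    if title_a != title_b:
        return False
    if not links_a and not links_b:
        return True
    # Jaccard similarity of the two link sets.
    overlap = len(links_a & links_b) / len(links_a | links_b)
    return overlap >= min_link_overlap
```

Comparing the title and link set rather than a hash of the whole body tolerates ad text that changes between requests, as long as the page's overall layout stays the same.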

Philip T.
+2  A: 

You could just rely on your users to "Report this link"... which would put it into a queue to review later?
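
A minimal in-memory sketch of such a report queue, assuming duplicate reports for a link that is already waiting should be ignored. `Report`, `ReviewQueue`, and their methods are hypothetical names; a real linking network would persist the queue in its database rather than in memory.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Report:
    url: str
    reason: str


class ReviewQueue:
    """FIFO of user reports awaiting manual review (in memory only)."""

    def __init__(self) -> None:
        self._pending: deque[Report] = deque()
        self._queued_urls: set[str] = set()

    def report(self, url: str, reason: str = "parked") -> bool:
        """Queue a link for review; return False if it is already waiting."""
        if url in self._queued_urls:
            return False
        self._queued_urls.add(url)
        self._pending.append(Report(url, reason))
        return True

    def next_for_review(self) -> Optional[Report]:
        """Pop the oldest report, or return None when nothing is waiting."""
        if not self._pending:
            return None
        item = self._pending.popleft()
        self._queued_urls.discard(item.url)
        return item
```

This pairs naturally with the automated checks: links that score high on the heuristics could be pushed into the same queue instead of being dropped outright.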

BoltBait