tags:

views:

89

answers:

6

if you go here:

http://whois.domaintools.com/iconplc.com

and view the source

why can't you see the registrant data in the HTML source?

is it at all possible to get this data through the html source?

this stuff is not in the html source:

Registrant:
ICON Clinical Research
   212 Church Road
   North Wales, PA 19454
   US

   Domain Name: ICONPLC.COM

   Administrative Contact, Technical Contact:
      ICON Clinical Research                
      212 Church Road
      North Wales, PA 19454
      US
      215-616-3359 fax: 123 123 1234

   Record expires on 08-Sep-2019.
   Record created on 12-Dec-2007.

   Domain servers in listed order:

   UDNS1.ULTRADNS.NET           
   UDNS2.ULTRADNS.NET

even after i save the webpage as .html, i am still unable to find the email address

+1  A: 

If you look at the source, they have linked to an ajax application. My guess would be that they are pulling it down after the HTML has loaded, and so the information won't be viewable by looking at the source.

Here is a link talking about how to scrape ajax sites:

http://stackoverflow.com/questions/260540/how-do-you-screen-scrape-ajax-pages

Kevin
hank you. is there a way to programmatically get it?
I__
I'm not sure if there's a way. I'm guessing they're trying to stop people from programmatically getting it.
Greg
There's always a way. If you can view it on a page, you can scrape it through a program.
Kevin
+1  A: 

Looks like the page is put together with AJAX. Firebug in Firefox, or Developer tools in IE should help you get to it.

graphicdivine
thank you. is there a way to programmatically get it?
I__
+1  A: 

Because it is generated with JavaScript. Grep the source for whois_data

David Dorward
how do i do that?
I__
You're possibly looking for a whois api then: http://stackoverflow.com/questions/36817/who-provides-a-whois-api
graphicdivine
"grep", in this context, means "Look through the source for…"
David Dorward
+1  A: 

i have chrome browser and it shows the content you want but not in the same format like this:

ajaxUpdate("3","Registrant:
ICON Clinical Research
   212 Church Road
   North Wales, PA 19454
   US

   Domain Name: ICONPLC.COM

   Administrative Contact, Technical Contact:
      ICON Clinical Research                
      212 Church Road
      North Wales, PA 19454
      US
      215-616-3359 fax: 123 123 1234

   Record expires on 08-Sep-2019.
   Record created on 12-Dec-2007.

   Domain servers in listed order:

   UDNS1.ULTRADNS.NET           
   UDNS2.ULTRADNS.NET")
GK
+1  A: 

I just looked at the source and the text you mention is there, with the only mention that it has  s instead of spaces.

<div class=\'whois_record\'>Registrant:<br/>ICON&nbsp;Clinical&nbsp;Research<br/>&nbsp;&nbsp;&nbsp;212&nbsp;Church&nbsp;Road<br/>&nbsp;&nbsp;&nbsp;North&nbsp;Wales,&nbsp;PA&nbsp;19454<br/>&nbsp;&nbsp;&nbsp;US<br/><br/>&nbsp;&nbsp;&nbsp;Domain&nbsp;Name:&nbsp;ICONPLC.COM<br/><br/>&nbsp;&nbsp;&nbsp;Administrative&nbsp;Contact,&nbsp;Technical&nbsp;Contact:<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ICON&nbsp;Clinical&nbsp;Research&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; etc.

Also, as already mentioned, extra text can always be added to a page at a later time by client-side scripts.

Tiberiu Ana
+1  A: 

You can use the Selenium C# Client driver to write code that checks for this css locator css=div.whois_record . You can then write code to scrape every
under that particular div. The email address found on the page is an image so you would have to save it.

retornam