On my web page, when the user types a URL into the text field, I want to get some information about that page, like its title or link information.

Is there a way to do it, either on the client (JavaScript) or on the server (PHP)? And how?

+2  A: 

On the server:

You need simpledom:

 include("simpledom.php");
 $html = file_get_html('http://www.google.com/');
 // find() returns an array of matches unless an index is given, so ask for the first one
 echo $html->find('head', 0)->outertext; // prints <head>...</head>
Byron Whitlock
If you want the same functionality without a 3rd party library, the SimpleXML and DomXML libraries included in PHP both support xpath queries. The syntax is a bit more verbose, but it's bound to perform better, and will keep your set of installed libraries smaller.
Frank Farmer
good luck getting malformed html to work with those ;)
Byron Whitlock
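For reference, the library-free route mentioned in the comments could look roughly like the sketch below. It uses PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than SimpleXML, and it silences libxml's parse warnings so that sloppy markup is tolerated instead of aborting. The function name extract_page_info and the returned array shape are made up for illustration.

```php
<?php
// Sketch only: extract_page_info() is a hypothetical helper, not part of PHP.
// It pulls the <title> text and all href values out of an HTML string.
function extract_page_info(string $html): array
{
    // DOMDocument::loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return ['title' => null, 'links' => []];
    }

    // Collect parse warnings internally instead of emitting them,
    // so malformed markup does not flood the output.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // The first <title> element, if there is one.
    $titleNode = $xpath->query('//title')->item(0);
    $title = $titleNode ? trim($titleNode->textContent) : null;

    // Every href attribute on <a> and <link> elements, in document order.
    $links = [];
    foreach ($xpath->query('//a/@href | //link/@href') as $attr) {
        $links[] = $attr->nodeValue;
    }

    return ['title' => $title, 'links' => $links];
}
```

The HTML string itself still has to be fetched on the server first; this only covers the parsing step.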
+3  A: 

You can't do it through JavaScript unless the page is on your own domain, because the browser's same-origin policy restricts cross-domain requests.

But you can use PHP: fetch the page (check the file_get_contents() function), parse the content of the <head> tag with simpledom, and then return the result in response to an Ajax request.

rogeriopvl
+1 for cross site server limitation.
Byron Whitlock
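A rough sketch of the server side of that flow, for illustration only: the helper below takes HTML that the server has already fetched and returns a JSON string that an Ajax callback could consume. To stay self-contained it uses a regular expression for the title instead of simpledom, which is only adequate for a quick <title> lookup. The names title_as_json and info.php are hypothetical.

```php
<?php
// Sketch only: title_as_json() is a hypothetical helper.
// It returns a JSON object such as {"title":"..."} for the given HTML.
function title_as_json(string $html): string
{
    $title = null;

    // Non-greedy, case-insensitive match that also spans line breaks.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        $title = trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }

    return json_encode(['title' => $title]);
}

// In the endpoint script (say, a hypothetical info.php?url=...), assuming
// allow_url_fopen is enabled and the URL has been validated first:
//
//   $html = file_get_contents($_GET['url']);
//   header('Content-Type: application/json');
//   echo title_as_json($html === false ? '' : $html);
```

Since the URL comes straight from user input, the endpoint should at least restrict it to http and https before fetching anything.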
A: 

In PHP you can open URLs like files, e.g.:

$f = fopen("http://www.site/page.htm", "r");

If you want to actually use a real DOM then use simpledom or another module.

Edit: You can probably ignore the fopen() suggestion above; for some reason I was thinking you were asking only about reading sites that you had full control of.

phoebus
Most web hosts disable allow_url_fopen these days. It will be disabled altogether in PHP6. A better option would be cURL if you need your application to support PHP6 or your web host has disabled allow_url_fopen.
Mark
My initial assumption was that he was reading a site that he controlled, which upon reflection was a poor one. cURL is certainly an option, although I think I'd still probably tend toward a simpledom-style module.
phoebus
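For anyone taking the cURL route from the comments, a minimal sketch might look like this. It assumes the cURL extension is installed; the function name fetch_with_curl, the 10-second timeout and the redirect limit are arbitrary choices for illustration.

```php
<?php
// Sketch only: fetch_with_curl() is a hypothetical helper.
// It returns the response body, or null if the URL is rejected or the request fails.
function fetch_with_curl(string $url, int $timeoutSeconds = 10): ?string
{
    // The URL comes from user input, so accept only http and https.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https'], true)) {
        return null;
    }

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);

    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat transport errors and HTTP error statuses alike as failure.
    if ($body === false || $status >= 400) {
        return null;
    }

    return $body;
}
```

The returned string can then be handed to simpledom or any other parser, so this works whether or not allow_url_fopen is enabled.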
A: 

So, your page has a text input box, and when your user types in a link you want to retrieve information about it?

This might be useful:

http://www.bin-co.com/php/scripts/load/

According to that page, it returns something like this:

Array
(
    [headers] => Array
        (
            [Date] => Mon, 18 Jun 2007 13:56:22 GMT
            [Server] => Apache/2.0.54 (Unix) PHP/4.4.7 mod_ssl/2.0.54 OpenSSL/0.9.7e mod_fastcgi/2.4.2 DAV/2 SVN/1.4.2
            [X-Powered-By] => PHP/5.2.2
            [Expires] => Thu, 19 Nov 1981 08:52:00 GMT
            [Cache-Control] => no-store, no-cache, must-revalidate, post-check=0, pre-check=0
            [Pragma] => no-cache
            [Set-Cookie] => PHPSESSID=85g9n1i320ao08kp5tmmneohm1; path=/
            [Last-Modified] => Tue, 30 Nov 1999 00:00:00 GMT
            [Vary] => Accept-Encoding
            [Transfer-Encoding] => chunked
            [Content-Type] => text/xml
        )
    [body] => ... Contents of the Page ...
    [info] => Array
        (
            [url] => http://www.bin-co.com/rss.xml.php?section=2
            [content_type] => text/xml
            [http_code] => 200
            [header_size] => 501
            [request_size] => 146
            [filetime] => -1
            [ssl_verify_result] => 0
            [redirect_count] => 0
            [total_time] => 1.113792
            [namelookup_time] => 0.180019
            [connect_time] => 0.467973
            [pretransfer_time] => 0.468035
            [size_upload] => 0
            [size_download] => 2274
            [speed_download] => 2041
            [speed_upload] => 0
            [download_content_length] => 0
            [upload_content_length] => 0
            [starttransfer_time] => 0.826031
            [redirect_time] => 0
        )

)

OscarRyz
He wants the <head> section, not the headers.
Byron Whitlock