Hey guys,

How can I, in PHP, get a summary of any URL? By summary, I mean something similar to the URL descriptions in Google web search results.

Is this possible? Is there already some kind of tool I can plug into so I don't have to generate my own summaries?

I don't want to use metadata descriptions if possible.

-Dylan

+2  A: 

What displays in Google is (generally) the META description tag. If you don't want to use that, you could use the page title instead though.

Eric Petroelje
A: 

If you don't want to use metadata descriptions (btw, this is exactly what they are for), you have a lot of research and work to do. Essentially, you have to guess which part of the page is content and which is just navigation/fluff. Google does exactly that; note, however, that extracting valuable information from useless fluff is their #1 competency, and they've been researching and improving it for a decade.

You can, of course, make an educated guess (e.g. look for an element with the ID or class "maincontent" and take the first paragraph from it), and maybe it will be OK. The real question is: how good do you want the results to be? (Facebook has something similar for linking to websites; sometimes its summary insists that an ad is the main content.)
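The educated guess described above could be sketched roughly as follows. This is only a minimal illustration, assuming PHP's DOM extension is available; the function name guess_summary is hypothetical, and "maincontent" is just the example marker from the text, not something real pages are guaranteed to use.

```php
<?php
// Hypothetical helper: guess a summary by returning the first non-empty
// paragraph inside an element whose id or class contains "maincontent".
// Returns null when the heuristic finds nothing.
function guess_summary(string $html): ?string
{
    if (trim($html) === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = '//*[contains(@id, "maincontent") or contains(@class, "maincontent")]//p';
    foreach ($xpath->query($query) as $p) {
        // Collapse runs of whitespace so the summary reads as one line.
        $text = trim(preg_replace('/\s+/', ' ', $p->textContent));
        if ($text !== '') {
            return $text;
        }
    }
    return null;
}
```

On a page without such a container this returns null, which is the point at which falling back to the title or the meta description makes sense.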

Piskvor
Okay. Maybe I'll stick with metadata. Could you give me an efficient way to get the title and description? I'm stuck.
Dylan Taylor
Sure; however, I think it's a different enough subject to warrant its own question - e.g. "Using PHP, how to parse the title and meta tags from an HTML page?" might be a good title. (I'm assuming you know how to download the page through your PHP script :))
Piskvor
Yes, I do. Thanks :)
Dylan Taylor
A: 

The following will let you parse the contents of a page's title tag. Note: PHP must be configured to allow file_get_contents to retrieve URLs (the allow_url_fopen setting). Otherwise you'll have to use curl to retrieve the page HTML.

$title_open = '<title>';
$title_close = '</title>';

$page = file_get_contents( 'http://www.domain.com' );
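// stripos() is case-insensitive; it returns false if the tag is not found.
$n = stripos( $page, $title_open );
$m = stripos( $page, $title_close );

$title = null;
if ( $n !== false && $m !== false ) {
    $n += strlen( $title_open ); // skip past the opening tag
    $title = substr( $page, $n, $m - $n );
}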
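Since the comments above also ask about the description, here is a rough sketch of an alternative that reads both the title and the meta description with PHP's DOM extension instead of substring searches. It assumes the DOM extension is enabled and that the page HTML is already in a string; the function name extract_title_and_description is hypothetical.

```php
<?php
// Hypothetical helper: returns ['title' => ?string, 'description' => ?string].
// Either value is null when the page does not provide it.
function extract_title_and_description(string $html): array
{
    $result = ['title' => null, 'description' => null];
    if (trim($html) === '') {
        return $result; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Suppress warnings about invalid markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The HTML parser lowercases tag names, so <TITLE> is found as well.
    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $result['title'] = trim($titles->item(0)->textContent);
    }

    // Take the first <meta name="description" content="...">, ignoring case.
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'description') {
            $result['description'] = trim($meta->getAttribute('content'));
            break;
        }
    }
    return $result;
}
```

The $html argument can come from file_get_contents or curl, as described above. Unlike the substring search, this also copes with a title tag that carries attributes.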
johnny_bgoode