When you share something on Facebook or Digg, it generates some summary of the page. How would I do this in Perl? What algorithms are there?

For example:

If I go to Facebook and try to share this question as a link: http://stackoverflow.com/questions/1279851/facebook-digg-get-website-summary

It retrieves "Facebook/Digg get website summary? - Stack Overflow" as the title (which is just the title of the page) and [... incomplete question?]

+1  A: 

Basically you want to scrape the URL and find the "most significant paragraph" which might be the first <div> or <p> element after the first <h2> or <h1>, depending on the layout of the page.
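A rough sketch of that heuristic, assuming you already have the page's HTML as a string. The question asks for Perl, but this uses only Python's standard library to show the logic; the class and function names are made up for illustration, and it only looks at <p> elements, not <div>.

```python
from html.parser import HTMLParser


class FirstParagraphAfterHeading(HTMLParser):
    """Collect the text of the first non-empty <p> after the first <h1>/<h2>."""

    def __init__(self):
        super().__init__()
        self.seen_heading = False   # True once an <h1> or <h2> has closed
        self.in_paragraph = False   # True while inside a candidate <p>
        self.done = False           # True once a non-empty paragraph is captured
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if not self.done and tag == "p" and self.seen_heading:
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag in ("h1", "h2"):
            self.seen_heading = True
        elif tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            if "".join(self.chunks).strip():
                self.done = True
            else:
                # Empty paragraph: discard it and keep looking.
                self.chunks = []

    def handle_data(self, data):
        if self.in_paragraph and not self.done:
            self.chunks.append(data)


def first_paragraph(html):
    """Return the candidate summary paragraph, or '' if none is found."""
    parser = FirstParagraphAfterHeading()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(parser.chunks).split())
```

On real pages the first paragraph after the heading is often a byline or navigation text, so expect to tune this per layout.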

xkcd150
+1  A: 

You could check and see if there is a meta description on the page, but that leaves you at the mercy of whoever wrote the meta description.
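For what it's worth, checking for the meta description is cheap. Here is a minimal sketch in Python (standard library only, shown for the logic rather than as a Perl answer); `meta_description` is a hypothetical helper name, and it returns None when the tag is missing or empty so you can fall back to something else.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Remember the content of the first non-empty <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        # Attribute values can be None (e.g. a bare attribute), hence the `or ""`.
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").strip()
        if name == "description" and content:
            self.description = content


def meta_description(html):
    """Return the page's meta description, or None if there isn't one."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    parser.close()
    return parser.description
```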

Bryan Denny
I tried this, but most of the articles do not have a meta description.
Timmy
+2  A: 

Assuming you mean sharing a link...

Usually the summary is written by the user submitting the URL. If you have to write a summary automagically this can be achieved by:

  • Using the first 100 or so characters of the document body (in itself not easy)
  • Using metadata like the description or keywords (often empty or spammed)
  • Context-relevant summaries like recreating Google snippets (sorry it's PHP but simple)
  • Tags/keywords from the document using something like the Yahoo Keyword Extractor API or your own keyword density function
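To make the first and last options concrete, here is a small sketch in Python (standard library only; the logic ports to other languages). It assumes the body text has already been extracted from the HTML, which is the hard part. The function names, the tiny stop-word list, and the minimum word length are all illustrative choices, not anything a particular site is known to use.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "or", "for", "on"}


def excerpt(text, limit=100):
    """Return roughly the first `limit` characters, cut at a word boundary."""
    text = " ".join(text.split())  # normalise whitespace
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Back up to the last space so a word is not split in half.
    if " " in cut:
        cut = cut[:cut.rfind(" ")]
    return cut + "..."


def top_keywords(text, count=5):
    """Return (word, density) pairs, where density = occurrences / total words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return []
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    total = len(words)
    return [(word, n / total) for word, n in counts.most_common(count)]
```

For example, `excerpt(body_text)` gives a short teaser and `top_keywords(body_text)` gives candidate tags, where `body_text` is whatever plain text you managed to pull out of the page.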

Your best bet is to ask the user!

Hope that helps somewhat :)

Al
+3  A: 

CPAN is your friend.

Some promising-looking modules:

pimlottc