What's the best way to get a description of the website, in Python?

+6 Q:

Suppose I have downloaded the HTML and can parse it. How do I get the "best" description of that website if it does not have a meta description tag?

+3 A:

You could take the first few sentences returned by something like Readability.

Safari 5 uses it, so it must be alright :)

alex 2010-07-26 06:03:43

+1 for Readability. Neat tool.

Nick Presta 2010-07-26 06:08:45

+1 for Readability :)

Kit 2010-07-26 06:12:37

+1 A:

It's very hard to come up with a rule that works 100% of the time, obviously, but as a starting point I'd suggest looking for the first <h1> tag (or <h2>, <h3>, etc.; the highest-level one you can find) and using the bit of text after it as the description. As long as the site is semantically marked up, that should give you a good description. (I guess you could also take the contents of the <h1> itself, but that's more like the "title".)
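Here is a minimal sketch of that heuristic using only the standard library's html.parser. The names HeadingTextParser and describe, the max_chars cutoff, and the choice to skip headings that have no text after them are my own assumptions for illustration, not part of any library.

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class HeadingTextParser(HTMLParser):
    """Collect, for each heading, the text between it and the next heading."""

    def __init__(self):
        super().__init__()
        # Each entry is [level, heading_text_parts, following_text_parts].
        self.sections = []
        self._in_heading = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in HEADINGS:
            self.sections.append([int(tag[1]), [], []])
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag in HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        # Ignore script/style contents and any text before the first heading.
        if self._skip_depth or not self.sections:
            return
        text = data.strip()
        if text:
            target = 1 if self._in_heading else 2
            self.sections[-1][target].append(text)


def describe(html, max_chars=200):
    """Return the text after the highest-level heading, or None if there is none."""
    parser = HeadingTextParser()
    parser.feed(html)
    parser.close()
    # Assumption: a heading with no text after it is not a useful candidate.
    candidates = [s for s in parser.sections if s[2]]
    if not candidates:
        return None
    # min() keeps the earliest section among those with the lowest level number.
    _level, _title, body = min(candidates, key=lambda s: s[0])
    text = " ".join(" ".join(body).split())  # collapse runs of whitespace
    return text[:max_chars]
```

Calling describe(html) on a page whose first <h1> is followed by a couple of paragraphs returns those paragraphs, trimmed to max_chars. On pages that aren't semantically marked up it will return None or something unhelpful, so treat it as a fallback rather than a guarantee.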

It's interesting to note that Google (for example) uses a keyword-specific extract of the page contents to display as the description, rather than a static description. Not sure if that'll work for your situation, though.
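If you want to try the keyword-specific idea, a rough sketch might look like the following. To be clear, this is not how Google does it; it just cuts a window of text around the first search term it finds. The function name keyword_snippet and the width parameter are made up for this example.

```python
import re


def keyword_snippet(text, query, width=80):
    """Return a window of text around the first query term that occurs in it.

    Falls back to the start of the text when no term matches.
    """
    text = " ".join(text.split())  # collapse runs of whitespace
    for term in query.split():
        match = re.search(re.escape(term), text, re.IGNORECASE)
        if match:
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            prefix = "..." if start > 0 else ""
            suffix = "..." if end < len(text) else ""
            return prefix + text[start:end].strip() + suffix
    return text[:2 * width]
```

You would feed it the page's visible text (for example, whatever your parser extracted from the body) together with the user's search terms.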

Dean Harding 2010-07-26 06:06:25

+1 A:

To follow up on the "Readability" suggestion above (which itself is inspired by the website Instapaper), they have released the JavaScript: http://code.google.com/p/arc90labs-readability/. What's more, some guy took that and ported it to Python: http://github.com/gfxmonk/python-readability. Rejoice!

loevborg 2010-07-26 15:05:01
