I want to fetch the title of a webpage that I open using urllib2. What is the best way to parse the HTML and find what I need (for now only the <title> tag, but I might need more in the future)?

Is there a good parsing lib for this purpose?

+1  A: 

Here you will find a performance comparison of several libraries for HTML/XML parsing. Which one to choose depends on what you need:

http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/
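
If only the title is needed, the standard library may already be enough. Below is a minimal sketch, assuming Python 3, where the parser lives in `html.parser` (in Python 2 the module is `HTMLParser`). The names `TitleParser` and `extract_title` are hypothetical and chosen only for illustration, and the sample markup is made up:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (illustrative name)."""

    def __init__(self):
        super().__init__()
        self._in_title = False   # True while we are between <title> and </title>
        self._done = False       # True once the first <title> has been closed
        self._parts = []         # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so <TITLE> also matches.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect and join later.
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = "".join(self._parts).strip()
        return text or None


def extract_title(html):
    """Return the text of the first <title> in an HTML string, or None."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


# Hypothetical sample markup; in practice this would be the decoded response body.
sample = "<html><head><title> Example Domain </title></head><body><p>Hi</body></html>"
print(extract_title(sample))
```

This keeps the dependency count at zero, but every extra piece of information needs more handler code, so a dedicated library is likely the better fit once more than the title is required.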

Rafal Ziolkowski
A: 

Use Beautiful Soup.

import urllib2
from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3 (Python 2)

html = urllib2.urlopen("...").read()
soup = BeautifulSoup(html)
print soup.title.string  # text of the <title> tag
orip
+1  A: 

Try Beautiful Soup:

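import urllib2
from BeautifulSoup import BeautifulSoup

url = 'http://www.example.com'
response = urllib2.urlopen(url)
html = response.read()

soup = BeautifulSoup(html)
title = soup.html.head.title
print title.string  # title.contents would give a list of child nodes instead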
Dominic Rodger
+5  A: 

Yes, I would recommend BeautifulSoup.

If you're getting the title it's simply:

soup = BeautifulSoup(html)
myTitle = soup.html.head.title

or

myTitle = soup('title')  # shorthand for soup.findAll('title'); returns a list of matching tags

Taken from the documentation

It's very robust: it is designed to tolerate malformed markup and will usually produce a usable parse tree even from messy HTML.

RobbR