You might consider lxml
which has a powerful HTML processor. There is another complementary module that relies on lxml
called pyquery
that might be just what you're looking for.
PyQuery has jQuery-like syntax, so if you're used to jQuery you'll be able to jump right in.
Here is a simple example to get the first <ul>
item from aol.com:
>>> from pyquery import PyQuery as pq
>>> import urllib
>>> data = urllib.urlopen('http://aol.com').read()
>>> d = pq(data)
>>> first_ul = d('ul:first')
>>> first_ul
[<ul#dhL2>]
>>> print first_ul
<ul id="dhL2"><li class="dhL1"><a accesskey="" href="https://new.aol.com/productsweb/?promocode=827693&amp;ncid=txtlnkuswebr00000074" name="om_dirbtn1" class="_o4-0" id="om_dirbtn1">Get Free Mail</a></li>
</ul>