views:

97

answers:

2

I found HTMLParser for sax and xml.minidom for xml. I have a pretty well formed html so I don't need a too strong parser - any suggestions?

+1  A: 

Take a look at BeautifulSoup. It's popular and excellent at parsing HTML.

Bartosz
it's not built in if I'm not mistaken
Guy
No, it's not built-in. But you can easily install it using easy_install or just download from the website and put into PYTHONPATH. Whole BeautifulSoup is contained in a single file, so it's not much of a burden.
Bartosz
+2  A: 

I would recommend lxml. I like BeautifulSoup, but there are maintenance issues generally and compatibility issues with the later releases. I've been happy using lxml.


Later: the best recommendations are to use lxml, html5lib, or BeautifulSoup 3.0.8. BeautifulSoup 3.1.x is meant for python 3.x and is known to have problems with earlier python versions, as noted on the BeautifulSoup website.

Ian Bicking has a good article on using lxml.

ElementTree is a further recommendation, but I have never used it.

hughdbrown