I found HTMLParser for SAX-style parsing and xml.dom.minidom for XML. My HTML is fairly well formed, so I don't need a very robust parser. Any suggestions?
+1
A:
Take a look at BeautifulSoup. It's popular and excellent at parsing HTML.
Bartosz
2010-05-06 15:10:23
It's not built in, if I'm not mistaken.
Guy
2010-05-06 15:12:14
No, it's not built in. But you can easily install it with easy_install, or just download it from the website and put it on your PYTHONPATH. All of BeautifulSoup is contained in a single file, so it's not much of a burden.
Bartosz
2010-05-06 15:17:43
+2
A:
I would recommend lxml. I like BeautifulSoup, but it has general maintenance issues and compatibility problems in its later releases. I've been happy using lxml.
Later: the best recommendations are to use lxml, html5lib, or BeautifulSoup 3.0.8. BeautifulSoup 3.1.x is meant for Python 3.x and is known to have problems with earlier Python versions, as noted on the BeautifulSoup website.
Ian Bicking has a good article on using lxml.
ElementTree is a further recommendation, but I have never used it.
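For the built-in route, here is a minimal sketch of what ElementTree usage might look like, assuming the markup is strictly well-formed (XHTML-style, every tag closed) and a current Python 3 interpreter. The sample document and the `extract_links` helper are made up for illustration; they are not from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; well-formed XHTML is assumed.
SAMPLE = """<html>
  <body>
    <h1>Title</h1>
    <a href="http://example.com/a">first</a>
    <p>Some <a href="http://example.com/b">second</a> link.</p>
  </body>
</html>"""


def extract_links(markup):
    """Return (href, text) pairs for every <a> element in the markup."""
    # fromstring raises ET.ParseError if the markup is not well-formed XML.
    root = ET.fromstring(markup)
    # iter("a") walks the whole tree in document order.
    return [(a.get("href"), a.text) for a in root.iter("a")]


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE):
        print(href, text)
```

Note that ElementTree is an XML parser, so typical real-world HTML (unclosed `<br>` tags, unquoted attributes) will raise a parse error. That is where lxml, html5lib, or BeautifulSoup are the safer choice.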
hughdbrown
2010-05-06 15:57:37