Python: Is there a built-in package to parse HTML into a DOM? | ansaurus

+2 Q:

Python: Is there a built-in package to parse HTML into a DOM?

I found HTMLParser, which is SAX-style (event-driven), and xml.dom.minidom, which is for XML. My HTML is fairly well formed, so I don't need a very forgiving parser. Any suggestions?

+1 A:

Take a look at BeautifulSoup. It's popular and excellent at parsing HTML.

Bartosz 2010-05-06 15:10:23

It's not built in, if I'm not mistaken.

Guy 2010-05-06 15:12:14

No, it's not built in. But you can easily install it using easy_install, or just download it from the website and put it on your PYTHONPATH. All of BeautifulSoup is contained in a single file, so it's not much of a burden.

Bartosz 2010-05-06 15:17:43

+2 A:

I would recommend lxml. I like BeautifulSoup, but it has general maintenance issues and compatibility issues with the later releases. I've been happy using lxml.

Later: the best recommendations are to use lxml, html5lib, or BeautifulSoup 3.0.8. BeautifulSoup 3.1.x is meant for Python 3.x and is known to have problems with earlier Python versions, as noted on the BeautifulSoup website.

Ian Bicking has a good article on using lxml.

ElementTree is another option, and it ships with the standard library, but I have never used it.
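
For what it's worth, here is a minimal sketch of the ElementTree route using only the standard library. It assumes the markup is clean enough to be valid XML (every tag closed, attributes quoted); the sample document and the helper name `parse_well_formed_html` are made up for illustration and are not from the question. ElementTree gives you an element tree rather than a W3C DOM, which is usually close enough for walking and querying a page.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; it must be well-formed XML for this to work.
SAMPLE = """<html>
  <head><title>Example page</title></head>
  <body>
    <p class="intro">Hello, <a href="/docs">docs</a>.</p>
    <p>Second paragraph.</p>
  </body>
</html>"""


def parse_well_formed_html(markup):
    """Parse XML-clean HTML into an ElementTree element (illustrative helper)."""
    # ET.fromstring raises ET.ParseError if the markup is not well-formed XML.
    return ET.fromstring(markup)


root = parse_well_formed_html(SAMPLE)

# Navigate the tree with simple path expressions.
title = root.findtext("head/title")
links = [a.get("href") for a in root.iter("a")]
paragraph_count = len(root.findall("body/p"))

print(title, links, paragraph_count)
```

If the page contains things like an unclosed `<br>`, `ET.fromstring` will raise `ET.ParseError`, and that is the point where a more forgiving parser such as lxml, html5lib, or BeautifulSoup becomes the better choice.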

hughdbrown 2010-05-06 15:57:37
