ansaurus

Question

Is there a way to parse html with lxml, but manipulate it with minidom?

Answer 1

+2 A:

I think I found a solution:

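from xml.dom.pulldom import SAX2DOM
import lxml.html  # provides document_fromstring; importing lxml.sax does not load it
import lxml.sax

def parse_lxml_dom(html):
    # Parse the HTML with lxml's parser.
    tree = lxml.html.document_fromstring(html)
    # Replay the lxml tree as SAX events into a handler that builds a minidom document.
    handler = SAX2DOM()
    lxml.sax.saxify(tree, handler)
    return handler.document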

However, this is only about 7 times faster than html5lib, because the saxify call takes quite a while.
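
To make the mechanism a bit more concrete: SAX2DOM is just a SAX content handler that assembles a minidom document from whatever events it receives, and lxml.sax.saxify is one possible source of those events. Here is a minimal sketch using only the standard library, with xml.sax.parseString standing in for saxify. The helper name sax_to_minidom is made up for illustration, and the sketch assumes well-formed input, since the stdlib parser is strict and will not repair broken HTML the way lxml does. It shows that what comes out is an ordinary minidom tree you can edit as usual.

```python
import xml.sax
from xml.dom.pulldom import SAX2DOM


def sax_to_minidom(markup):
    """Build a minidom Document from SAX events (hypothetical helper)."""
    handler = SAX2DOM()
    # xml.sax.parseString stands in for lxml.sax.saxify here: both simply
    # fire SAX events at the handler, which assembles the DOM nodes.
    xml.sax.parseString(markup, handler)
    return handler.document


doc = sax_to_minidom(b"<html><body><p>Hello</p></body></html>")

# From here on it is plain minidom manipulation.
body = doc.getElementsByTagName("body")[0]
new_p = doc.createElement("p")
new_p.setAttribute("class", "added")
new_p.appendChild(doc.createTextNode("World"))
body.appendChild(new_p)

print(doc.documentElement.toxml())
```

The document returned by parse_lxml_dom above should be usable the same way (getElementsByTagName, createElement, appendChild and so on), since both go through the same SAX2DOM handler.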

Christian Oudard 2009-11-20 17:36:50
