I'm looking for a Python browser widget (along the lines of PyQt4's QTextBrowser class or wxPython's HTML module) that has events for interaction with the DOM. For example, if I highlight an h1 node, the widget class should have a method that notifies me that something was highlighted and what DOM properties that node had (<h1>, contents of the tag, sibling and parent tags, etc.). Ideally the widget module/class would give access to the DOM tree object itself so I can traverse it, modify it, and re-render the new tree.

Does something like this exist? I've looked but unfortunately haven't been able to find one. Thanks in advance!

+1  A: 

I would also love such a thing. I suspect one with Python bindings does not exist, but would be really happy to be wrong about this.

One option I recently looked at (but never tried) is the WebKit browser engine. It has some Python bindings and is built against different toolkits (I use GTK). However, although there is a C++ API for the entire JavaScript machine, that API has no Python bindings, and I don't see any reason why it couldn't be bound for Python. It's a fairly huge task, I know, but it would be a universally useful project, so maybe worth the investment.

Ali A
+1  A: 

If you don't mind being limited to Windows, you can use the IE browser control. From wxPython, it's in wx.lib.iewin.IEHtmlWindow (there's a demo in the wxPython demo). This gives you full access to the DOM and the ability to sink events, e.g.:

ie.document.body.innerHTML = u"<p>Hello, world</p>"
Ryan Ginstrom
+2  A: 

It may not be ideal for your purposes, but you might want to take a look at the Python bindings to KHTML that are part of PyKDE. One place to start looking is the KHTMLPart class:

http://api.kde.org/pykde-4.2-api/khtml/KHTMLPart.html

Since the API for this class is based on the signals and slots paradigm used in Qt, you will need to connect various signals to slots in your own code to find out when parts of a document have been changed. There's also a DOM API, so it should also be possible to access DOM nodes for selected parts of the document.
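
To illustrate the connect-a-signal-to-a-slot pattern without needing PyKDE installed, here is a minimal pure-Python sketch. The Signal and FakeBrowserPart classes and the selection_changed name are made up for illustration and are not part of the KHTMLPart API; with the real bindings you would connect to whatever signals the class documents.

```python
class Signal:
    """Hypothetical minimal signal: keeps a list of connected slots (callables)."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        # Register a callable to be invoked whenever the signal is emitted.
        self._slots.append(slot)

    def emit(self, *args):
        # Call every connected slot, in connection order, with the same arguments.
        for slot in self._slots:
            slot(*args)


class FakeBrowserPart:
    """Stand-in for a browser widget; this is NOT the KHTMLPart API."""

    def __init__(self):
        self.selection_changed = Signal()

    def select(self, tag, text):
        # A real widget would emit this from its own event handling.
        self.selection_changed.emit(tag, text)


seen = []


def on_selection_changed(tag, text):
    # Slot: record which node was selected and what it contained.
    seen.append((tag, text))


part = FakeBrowserPart()
part.selection_changed.connect(on_selection_changed)
part.select("h1", "Title")
```

After the last line, seen holds one entry describing the selected node. The point is only that your code never polls the widget; it registers a callable once and is told when something changes.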

More information can be found here:

http://api.kde.org/pykde-4.2-api/khtml/index.html

David Boddie
This seems like the best available solution at the moment, short of hacking away at WebKit. Thanks :)
Karan
Someone reminded me that there's also this wrapper around KHTML that might be easier to get started with: http://paul.giannaros.org/pykhtml/
David Boddie