I'm trying to get access to a page's DOM after rendering. I don't need to view the page, and I plan to do this programmatically, without any GUI or user interaction.

The reason I am interested in post-rendering is that I want to know where objects appear. Some location information is coded in the HTML (e.g., via offsetLeft), but much is not. Also, JavaScript can change the final positioning. I want positions that are as close as possible to what the user will see.

I've looked into the Chromium code and think there is a way to do this, but there is not enough documentation to get started.

Put VERY simply, I'd be interested in pseudo-code like this:

DOMRoot *r = new Page("http://stackoverflow.com")->getDom();

Any tips on starting points?

+2  A: 

You should use the Web API wrapper that Chromium exposes; specifically, the WebDocument class contains the functionality that you need. You can call it like this:

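// webView is assumed to be an existing WebView* that has finished loading the page.
WebFrame* mainFrame = webView->mainFrame();
// WebDocument and WebElement are value-type wrappers, so use '.' rather than '->'.
WebDocument document = mainFrame->document();
WebElement docElement = document.documentElement();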

// Manipulate the DOM here using docElement
...
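
To give a rough idea of what "manipulating the DOM" might look like for your use case, here is a minimal, self-contained sketch of a depth-first walk that turns parent-relative offsets into page coordinates. Note that Node, Placed, collectPositions and samplePositions are hypothetical stand-ins invented for illustration; they are not Chromium types or functions. The sketch also assumes every element is positioned relative to its DOM parent, which is a simplification: in a real page, offsetLeft is relative to the element's offsetParent, and scrolling, borders and transforms are ignored here.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a rendered element; NOT a Chromium type.
struct Node {
    std::string tag;
    int offsetLeft;              // x relative to the parent box, in pixels
    int offsetTop;               // y relative to the parent box, in pixels
    std::vector<Node> children;  // child elements in document order
};

// One element together with its accumulated page coordinates.
struct Placed {
    std::string tag;
    int x;
    int y;
};

// Depth-first walk: add each node's offsets to its parent's page position
// and record the result before descending into the children.
void collectPositions(const Node& node, int parentX, int parentY,
                      std::vector<Placed>& out) {
    const int x = parentX + node.offsetLeft;
    const int y = parentY + node.offsetTop;
    out.push_back({node.tag, x, y});
    for (const Node& child : node.children) {
        collectPositions(child, x, y, out);
    }
}

// Builds a tiny made-up tree (html > body > div, p) and returns its positions.
std::vector<Placed> samplePositions() {
    Node body{"body", 8, 8, {Node{"div", 10, 20, {}}, Node{"p", 10, 60, {}}}};
    Node root{"html", 0, 0, {body}};
    std::vector<Placed> placed;
    collectPositions(root, 0, 0, placed);
    return placed;
}
```

With the real API you would replace Node with the element wrapper and read the geometry from the rendered element instead of from hard-coded fields; the traversal itself would stay much the same.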

You can browse the source code for Chromium's Web API wrapper here. There isn't much documentation, but the header files are fairly well commented, and Chrome's own source code shows the API in action.

It's difficult to get started using Chromium. I recommend looking at the test_shell application. Also, if you're running on Windows, a framework like CEF (the Chromium Embedded Framework) simplifies the process of embedding Chromium in your application; I use CEF in my current project and I'm very satisfied with it.

Emerick Rogul
