I am using the WebBrowser control in my project to display complex HTML documents that are generated/manipulated at runtime.

I have noticed that constructing the DOM programmatically from C# by creating HtmlElement objects is about 3x slower than generating an HTML string and passing it to the WebBrowser, which in turn parses it to generate the DOM. Both ways create a noticeable delay when navigating between lengthy documents.

I am looking for the fastest way to switch between multiple documents in the same WebBrowser control, ideally without having to repeatedly generate the DOM tree for each document. Is it possible to cache a tree of HtmlElement objects somewhere in my program, and then re-insert them into the WebBrowser as needed?

Thanks!

A: 

Maybe rather than caching the DOM you could just flip between several WebBrowser controls on the form - with only the active one being visible?
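A minimal sketch of this idea, assuming Windows Forms. `BrowserStack`, the page keys and the demo pages are hypothetical names, not part of the answer. Each page is parsed once into its own WebBrowser control, and switching pages only toggles `Visible`, so the already-built DOM is reused.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical helper: one WebBrowser per cached page, all docked in the
// same host control, with only the active one visible.
class BrowserStack
{
    private readonly Control host;
    private readonly Dictionary<string, WebBrowser> browsers =
        new Dictionary<string, WebBrowser>();

    public BrowserStack(Control host) { this.host = host; }

    // Number of live WebBrowser controls (useful when watching memory use).
    public int Count => browsers.Count;

    // The first call for a key generates and parses the HTML; later calls
    // for the same key only flip visibility, so the existing DOM is reused.
    public void Show(string key, Func<string> generateHtml)
    {
        if (!browsers.TryGetValue(key, out WebBrowser browser))
        {
            browser = new WebBrowser { Dock = DockStyle.Fill, Visible = false };
            host.Controls.Add(browser);
            browser.DocumentText = generateHtml();
            browsers.Add(key, browser);
        }
        foreach (WebBrowser b in browsers.Values)
            b.Visible = ReferenceEquals(b, browser);
    }
}

static class Program
{
    [STAThread]
    static void Main()
    {
        var form = new Form { Width = 800, Height = 600 };
        var stack = new BrowserStack(form);
        stack.Show("page0", () => "<html><body><h1>page0</h1></body></html>");

        // Flip between two hypothetical pages every two seconds.
        var timer = new Timer { Interval = 2000 };
        int n = 0;
        timer.Tick += (s, e) =>
        {
            n = (n + 1) % 2;
            string key = "page" + n;
            stack.Show(key, () => "<html><body><h1>" + key + "</h1></body></html>");
        };
        timer.Start();
        Application.Run(form);
    }
}
```

Every hidden control keeps its whole DOM alive, so memory grows with the number of cached pages; that is the trade-off raised in the comments below.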

Joachim Chapman
Thanks, that seems like a good idea, but I'm concerned about the overhead of keeping 50+ WebBrowser controls around.
Jen
It's really not a terrible idea... the WebBrowser control is fairly lightweight. It depends on your usage scenario and how scalable you need things to be. Older versions of the WebBrowser control tend to leak memory, though. :-/
jeffamaphone
I'll explore that option further and post back...
Jen
+1  A: 

I'd really need to know more about how you are generating these documents. It might be faster to get your data into an XML document and then use an XSL transform to convert the data to HTML and pass that to the WebBrowser control.

The nice thing about the .NET XSLT implementation (XslCompiledTransform) is that it takes the XSL source and compiles it, so repeated transforms run faster.

If you decide to go that route, look up the MVP.XML project, which adds EXSLT functionality to the stock XSLT implementation.
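A minimal sketch of the XSLT route using only the stock `System.Xml.Xsl.XslCompiledTransform` class (no MVP.XML). The stylesheet, the `<items>`/`<item>` data shape and the `XsltDemo` class are hypothetical. The stylesheet is loaded and compiled once, and the same transform instance is reused for every document.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class XsltDemo
{
    // Hypothetical stylesheet: turns <items><item>...</item></items> into an HTML list.
    const string Stylesheet = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html'/>
  <xsl:template match='/items'>
    <html><body><ul>
      <xsl:for-each select='item'>
        <li><xsl:value-of select='.'/></li>
      </xsl:for-each>
    </ul></body></html>
  </xsl:template>
</xsl:stylesheet>";

    // Compile once; loading is the expensive step, so the instance is cached.
    static readonly XslCompiledTransform Transform = CreateTransform();

    static XslCompiledTransform CreateTransform()
    {
        var xslt = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(reader);
        return xslt;
    }

    // Run the cached transform over one XML document and return the HTML text.
    public static string ToHtml(string xmlData)
    {
        using (var input = XmlReader.Create(new StringReader(xmlData)))
        using (var output = new StringWriter())
        {
            Transform.Transform(input, null, output);
            return output.ToString();
        }
    }

    static void Main()
    {
        string html = ToHtml("<items><item>First</item><item>Second</item></items>");
        Console.WriteLine(html);
        // In the WinForms app, the result would go to the control, e.g.
        // webBrowser.DocumentText = html;
    }
}
```

This speeds up producing the HTML string; the WebBrowser still has to parse that string each time it is loaded.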

Navaar
I don't mind the initial parse step using an HTML string. I simply want to avoid the browser control having to re-parse the HTML when the user switches back to a previous page.
Jen
+2  A: 

I will describe the solution in terms of the native Win32 COM APIs; it shouldn't be too difficult to write the interop to do it in C# (or find it at pinvoke.net). Alternatively, you may need to use the properties that the managed objects expose to get the native ones.

You're not likely to be able to build the DOM yourself faster than IE's parser, so create a blank HTMLDocument (which in native code would be CoCreateInstance(CLSID_HTMLDocument)) and QueryInterface() the HTMLDocument for its IMarkupServices implementation. Also create two IMarkupPointers using the IMarkupServices::CreateMarkupPointer() method.

Next, call IMarkupServices::ParseString() to parse your HTML. This will give you a pointer to an IMarkupContainer that contains your DOM, and will position the two IMarkupPointers you created at the beginning and end of your DOM. Now you can use IMarkupServices::Move() to move your data from one IMarkupContainer to another.

So the general scheme you would use is to have a single HTMLDocument which is your "display" document, and its associated IMarkupContainer (which you can just QueryInterface() for). Then you have a vector or list or whatever of all the non-displaying markup containers. Then you just create a markup pointer for your display doc, call IMarkupPointer::MoveToContainer(displayDocumentContainer, true), and then use that to move stuff from your display container to the non-displaying containers and vice versa.

One thing to note: you must only access these objects on the thread you created or acquired them on. All IE objects are STA objects. If you need multi-threaded access, you must marshal.

If you have specific follow up questions, let me know.

jeffamaphone
Great answer! I'm not very familiar with COM; is there a way to attach the COM object from CoCreateInstance to an HtmlDocument so I can manipulate it with more familiar techniques? Thanks!
Jen
Sorry, I'm not very familiar with how the managed/native interop stuff works. IE was built in the before time when there was no C#, so you basically have to do all the goo yourself. Sorry I can't be of more help.
jeffamaphone
A: 

Could you do something like this?

  • Create the contents you want to display inside a DIV
  • Create secondary contents (in the background) inside non-visible DIVs
  • Swap the contents by playing with the visibility
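A minimal sketch of the hidden-DIV approach, assuming Windows Forms and that all pages can live in one document. `DivPages`, the ids and the page map are hypothetical. It toggles the CSS `display` property rather than `visibility`, because `visibility:hidden` still reserves layout space for the hidden content.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Windows.Forms;

static class DivPages
{
    // Wrap every page in its own <div id="...">; only the active one is displayed.
    // Assumes ids are plain identifiers and each page body is already valid HTML.
    public static string BuildDocument(IDictionary<string, string> pages, string activeId)
    {
        var sb = new StringBuilder("<html><body>");
        foreach (var page in pages)
        {
            string display = page.Key == activeId ? "block" : "none";
            sb.Append("<div id=\"").Append(page.Key)
              .Append("\" style=\"display:").Append(display).Append("\">")
              .Append(page.Value).Append("</div>");
        }
        return sb.Append("</body></html>").ToString();
    }

    // Switch pages inside the already-parsed DOM; no HTML is re-parsed here.
    public static void ShowPage(HtmlDocument doc, IEnumerable<string> ids, string activeId)
    {
        foreach (string id in ids)
        {
            HtmlElement div = doc.GetElementById(id);
            if (div != null)
                div.Style = id == activeId ? "display:block" : "display:none";
        }
    }
}
```

Usage: set `webBrowser.DocumentText = DivPages.BuildDocument(pages, "page1")` once, then, after `DocumentCompleted` has fired, call `DivPages.ShowPage(webBrowser.Document, pages.Keys, "page2")` to switch.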
MarkusQ
+2  A: 

This will do it

// Requires a reference to Microsoft.mshtml for the mshtml interop types.
// Note: Navigate() is asynchronous, so in a real app you may need to wait
// for DocumentCompleted before Document is usable.

// On-screen WebBrowser control
webBrowserControl.Navigate("about:blank");
webBrowserControl.Document.Write("<div id=\"div1\">This will change</div>");
var elementToReplace = webBrowserControl.Document.GetElementById("div1");
var nodeToReplace = elementToReplace.DomElement as mshtml.IHTMLDOMNode;

// In-memory WebBrowser control to load the fragment into.
// A second control is needed because the DOM lives inside the MSHTML
// COM control, which must be hosted somewhere.
var webBrowserFragment = new WebBrowser();
webBrowserFragment.Navigate("about:blank");
webBrowserFragment.Document.Write("<div id=\"div1\">Hello World!</div>");
var elementReplacement = webBrowserFragment.Document.GetElementById("div1");
var nodeReplacement = elementReplacement.DomElement as mshtml.IHTMLDOMNode;

// The magic happens here: swap the on-screen node for the buffered one.
nodeToReplace.replaceNode(nodeReplacement);
TFD
Any way to do this without having multiple WebBrowser controls?
Jen
Don't think so. The data structure is part of a COM control, and it needs a COM control to host it. What's the problem with multiple WebBrowser controls? They take about 5 MB each, but memory is cheap!
TFD
Using IMarkupContainer etc. will result in the same problem. So will using Gecko via XPCOM.
TFD
Let's say I want to cache 50 pages. With one 5 MB browser instance per page, that would impose 250 MB of overhead, which is not good.
Jen
No, you only ever need two instances: one for display, and the other to buffer all other content. Use a div with an id to isolate each page.
TFD
I'm concerned about having one single massive DOM tree in this one buffer WebBrowser instance. Won't it introduce scalability issues?
Jen
Any solution requiring the pre-parsing of HTML is going to have scalability issues. You only need to cache what is required for the next button press, so there should only be a handful of pages required at any time. You can cache the next possible pages in the background.
TFD