ansaurus

Question

Caching and re-using tree of HtmlElement objects

Answer 1

A:

Maybe, rather than caching the DOM, you could just flip between several WebBrowser controls on the form, with only the active one visible?

Joachim Chapman 2009-02-25 16:24:08

Thanks, that seems like a good idea, but I'm concerned about the overhead of keeping 50+ WebBrowser controls around.

Jen 2009-03-03 19:07:46

It's really not a terrible idea... the WebBrowser control is fairly lightweight. It depends on your usage scenario and how scalable you need things to be. Older versions of the WebBrowser control tend to leak memory, though. :-/

jeffamaphone 2009-03-10 22:29:54

I'll explore that option further and post back...

Jen 2009-03-11 02:51:33

Answer 2

+1 A:

I'd really need to know more about how you are generating these documents. It might be faster to get your data into an XML document, then use an XSL transform to convert the data to HTML and pass that to the WebBrowser control.

The nice thing about the XSLT implementation of .NET is that it takes the XSL source and compiles it to a temporary assembly to speed up the transforms.

If you decide to go that route, look up the Mvp.Xml project, which adds some nice EXSLT functionality to the stock XSLT implementation.
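As a rough illustration of the XSLT route (a sketch, not the answerer's code), the class below compiles a stylesheet once with `XslCompiledTransform` and renders an XML payload to an HTML string. The XML shape (`<page title="..."><item>...</item></page>`), the stylesheet itself, and the class name `XsltToHtml` are all hypothetical.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltToHtml
{
    // Hypothetical stylesheet: renders <page title="..."> and its <item> children as a list.
    private const string Stylesheet =
@"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""html""/>
  <xsl:template match=""/page"">
    <html><body>
      <h1><xsl:value-of select=""@title""/></h1>
      <ul>
        <xsl:for-each select=""item""><li><xsl:value-of select="".""/></li></xsl:for-each>
      </ul>
    </body></html>
  </xsl:template>
</xsl:stylesheet>";

    // Compile the stylesheet once and reuse the instance for every page.
    private static readonly XslCompiledTransform Compiled = LoadStylesheet();

    private static XslCompiledTransform LoadStylesheet()
    {
        var xslt = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(reader);
        }
        return xslt;
    }

    // Transforms an XML data document into an HTML string.
    public static string Render(string xmlData)
    {
        using (var input = XmlReader.Create(new StringReader(xmlData)))
        using (var output = new StringWriter())
        {
            Compiled.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

The resulting string could then be handed to the control with `webBrowser.DocumentText = html;`. Keeping a single loaded transform around is the point of the design, since loading and compiling the stylesheet is the costly step.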

Navaar 2009-03-10 22:23:21

I don't mind the initial parse step using an HTML string. I simply want to avoid the browser control having to re-parse the HTML when the user switches back to a previous page.

Jen 2009-03-11 02:49:59

Answer 3

+2 A:

I will describe the solution in terms of the native Win32 COM APIs; it shouldn't be too difficult to write the interop to do it in C# (or find it at pinvoke.net). Alternatively, you may need to use the properties that the managed objects expose to get at the native ones.

You're not likely to be able to build the DOM yourself faster than IE's parser, so create a blank HTMLDocument (which in native code would be CoCreateInstance(CLSID_HTMLDocument)) and QueryInterface() the HTMLDocument for its IMarkupServices implementation. Also create two IMarkupPointers using the IMarkupServices::CreateMarkupPointer() method.

Next, call IMarkupServices::ParseString() to parse your HTML. This gives you a pointer to an IMarkupContainer that contains your DOM, and positions your two IMarkupPointers at the beginning and end of your DOM. Now you can use IMarkupServices::Move() to move your data from one IMarkupContainer to another.

So the general scheme is to have a single HTMLDocument as your "display" document, along with its associated IMarkupContainer (which you can just QueryInterface() for). Then keep a vector, list, or similar collection of all the non-displayed markup containers. Create a markup pointer for your display document, call IMarkupPointer::MoveToContainer(displayDocumentContainer, true), and use it to move content from your display container to the non-displayed containers and vice versa.

One thing to note: you must only access these objects on the thread that created or acquired them. All IE objects are STA objects; if you need multi-threaded access, you must marshal.

If you have specific follow up questions, let me know.


jeffamaphone 2009-03-10 22:24:01

Great answer! I'm not very familiar with COM. Is there a way to attach the COM object from CoCreateInstance to an HtmlDocument so I can manipulate it with more familiar techniques? Thanks!

Jen 2009-03-11 02:48:43

Sorry, I'm not very familiar with how the managed/native interop stuff works. IE was built in the before time when there was no C#, so you basically have to do all the goo yourself. Sorry I can't be of more help.

jeffamaphone 2009-03-11 03:28:31

Answer 4

A:

Could you do something like this?

Create the contents you want to display inside a DIV.
Create secondary contents (in the background) inside non-visible DIVs.
Swap the contents by toggling the DIVs' visibility.
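A minimal sketch of the setup step, assuming every page's markup is already available as an HTML string: build one document in which each page sits in its own DIV and only the active one is displayed. The `PagedHtml` class and the `page{i}` id scheme are hypothetical. Switching pages afterwards should only mean changing the `style` of two existing elements (in WinForms, for example, by setting `HtmlElement.Style` on elements returned by `Document.GetElementById`), which does not require re-parsing the page markup.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PagedHtml
{
    // Hypothetical id scheme: page i lives in <div id="page{i}">.
    public static string PageId(int index) => "page" + index;

    // Builds one document containing every page; only the active page is displayed.
    public static string Build(IReadOnlyList<string> pageBodies, int activeIndex)
    {
        if (activeIndex < 0 || activeIndex >= pageBodies.Count)
            throw new ArgumentOutOfRangeException(nameof(activeIndex));

        var sb = new StringBuilder("<html><body>");
        for (int i = 0; i < pageBodies.Count; i++)
        {
            string display = i == activeIndex ? "block" : "none";
            sb.Append("<div id=\"").Append(PageId(i))
              .Append("\" style=\"display:").Append(display).Append("\">")
              .Append(pageBodies[i]) // page markup is inserted as-is (it is already HTML)
              .Append("</div>");
        }
        return sb.Append("</body></html>").ToString();
    }
}
```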

MarkusQ 2009-03-10 22:25:46

Answer 5

+2 A:

This will do it

// On-screen WebBrowser control
webBrowserControl.Navigate("about:blank");
// Note: Document may still be null right after Navigate(); if it is,
// run the following inside the DocumentCompleted event handler instead.
webBrowserControl.Document.Write("<div id=\"div1\">This will change</div>");
var elementToReplace = webBrowserControl.Document.GetElementById("div1");
// mshtml types require a reference to the Microsoft.mshtml interop assembly
var nodeToReplace = elementToReplace.DomElement as mshtml.IHTMLDOMNode;

// In-memory WebBrowser control to load the fragment into.
// The DOM nodes are COM objects, so they need a hosting WebBrowser instance.
var webBrowserFragment = new WebBrowser();
webBrowserFragment.Navigate("about:blank");
webBrowserFragment.Document.Write("<div id=\"div1\">Hello World!</div>");
var elementReplacement = webBrowserFragment.Document.GetElementById("div1");
var nodeReplacement = elementReplacement.DomElement as mshtml.IHTMLDOMNode;

// The magic happens here: swap the on-screen node for the pre-parsed one
nodeToReplace.replaceNode(nodeReplacement);

TFD 2009-03-13 00:52:06

Any way to do this without having multiple WebBrowser controls?

Jen 2009-03-13 01:59:36

Don't think so. The DOM is owned by a COM control, so it needs a COM control to host it. What's the problem with multiple WebBrowser controls? They take about 5 MB each, but memory is cheap!

TFD 2009-03-13 03:03:10

Using IMarkupContainer etc. will result in the same problem. So will using Gecko via XPCOM.

TFD 2009-03-13 03:04:44

Let's say I want to cache 50 pages with one 5 MB browser instance per page; that would impose 250 MB of overhead, which is not good.

Jen 2009-03-14 18:55:21

No, you only ever need two instances: one for display and the other to buffer all other content. Use a div with an id to isolate each page.

TFD 2009-03-14 19:52:19

I'm concerned about having one single massive DOM tree in this one buffer WebBrowser instance. Won't that introduce scalability issues?

Jen 2009-03-15 19:58:23

Any solution requiring the pre-parsing of HTML is going to have scalability issues. You only need to cache what is required for the next button press, so only a handful of pages should be needed at any time. You can cache the next possible pages in the background.
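The "cache only what the next button press needs" advice amounts to a small bounded cache. The sketch below is a generic least-recently-used cache, an assumed way to enforce that bound rather than anything posted in the thread; `PageCache` is a hypothetical name, and `TPage` could be a page's HTML string or the DOM node of its DIV in the buffer browser. Because IE objects are STA objects, any cache holding DOM nodes should only be used from the thread that owns them.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

// Keeps at most `capacity` pages; the least recently used page is evicted first.
public sealed class PageCache<TKey, TPage> where TKey : notnull
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TPage>>> _map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TPage>>>();
    // Front of the list = most recently used.
    private readonly LinkedList<KeyValuePair<TKey, TPage>> _order =
        new LinkedList<KeyValuePair<TKey, TPage>>();

    public PageCache(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public int Count => _map.Count;

    // Returns the cached page and marks it as most recently used.
    public bool TryGet(TKey key, [MaybeNullWhen(false)] out TPage page)
    {
        if (_map.TryGetValue(key, out var node))
        {
            _order.Remove(node);
            _order.AddFirst(node);
            page = node.Value.Value;
            return true;
        }
        page = default;
        return false;
    }

    // Adds or replaces a page, evicting the least recently used one when full.
    public void Put(TKey key, TPage page)
    {
        if (_map.TryGetValue(key, out var existing))
        {
            _order.Remove(existing);
            _map.Remove(key);
        }
        else if (_map.Count == _capacity)
        {
            var lru = _order.Last!;
            _order.RemoveLast();
            _map.Remove(lru.Value.Key);
        }
        _map[key] = _order.AddFirst(new KeyValuePair<TKey, TPage>(key, page));
    }
}
```

For example, with a capacity of 2, after `Put("a", ...)`, `Put("b", ...)`, and `TryGet("a", ...)`, adding a third page evicts `"b"`, since `"a"` was used more recently.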

TFD 2009-03-15 20:35:03
