views:

540

answers:

4

I'm using the MSIE WebBrowser control in a C# desktop application and am looking for a way to build and maintain trees of HtmlElement objects outside of this control. I am trying to quickly switch between multiple complex pages without incurring the overhead of re-parsing the HTML each time (and I don't want to maintain multiple controls that are shown/hidden as needed). I discovered that a) I can only create HtmlElement objects via the control's HtmlDocument and b) once I remove a "trunk" of HtmlElement objects from the control's HtmlDocument, it "dies off," even though I keep maintaining a strong reference to the root element. How can I do this?

P.S. I am willing to consider alternative browser controls (e.g. Gecko) if they allow me to accomplish the above.

+1  A: 

Do you really need to remove them entirely? How about leaving your "branch" in the DOM as the child of a DIV whose style="display:none"? That way they're real, live DOM objects, just not visible.
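A minimal sketch of that idea, assuming each "page" lives in its own top-level DIV with a known id. The class name PageSwitcher and the ids page1/page2 are made up for illustration; HtmlDocument.GetElementById and HtmlElement.Style are the standard System.Windows.Forms members.

```csharp
using System.Collections.Generic;
using System.Windows.Forms;

static class PageSwitcher
{
    // Shows the container whose id equals activeId and hides all the others.
    // Assumption: every id in pageIds names a DIV in the loaded document.
    public static void Show(HtmlDocument document, IEnumerable<string> pageIds, string activeId)
    {
        foreach (string id in pageIds)
        {
            HtmlElement container = document.GetElementById(id);
            if (container == null)
            {
                continue; // id not present in this document; skip it
            }

            // Note: Style replaces the element's entire inline style string.
            container.Style = (id == activeId) ? "display:block" : "display:none";
        }
    }
}
```

Usage would look something like PageSwitcher.Show(webBrowserControl.Document, new[] { "page1", "page2" }, "page2"). The hidden branches stay parsed and attached, so switching is only a style change.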

jlew
Unfortunately, the individual pages are too complex to be maintained within DIVs of one giant master page...
Jen
What about frames in a frameset?
jlew
+2  A: 

You can use the MSHTML library (mshtml.dll) to achieve this. Basically you would use a single about:blank page and then dynamically write and remove content from it.

See this blog post on this subject

You can also write a custom interface wrapper that exposes only the functionality you need from mshtml, rather than referencing the whole library (nearly 8 MB). This is easy to do using F12 (Go To Definition) in Visual Studio.

Fraser
+4  A: 

This will do it:

// On screen webbrowser control
webBrowserControl.Navigate("about:blank");
webBrowserControl.Document.Write("<div id=\"div1\">This will change</div>");
var elementToReplace = webBrowserControl.Document.GetElementById("div1");
var nodeToReplace = elementToReplace.DomElement as mshtml.IHTMLDOMNode;

// In-memory WebBrowser control to load the fragment into
// It needs this base object as it is a COM control
var webBrowserFragement = new WebBrowser();
webBrowserFragement.Navigate("about:blank");
webBrowserFragement.Document.Write("<div id=\"div1\">Hello World!</div>");
var elementReplacement = webBrowserFragement.Document.GetElementById("div1");
var nodeReplacement = elementReplacement.DomElement as mshtml.IHTMLDOMNode;

// The magic happens here!
nodeToReplace.replaceNode(nodeReplacement);

I doubt this will improve performance, as the text renderer is fast, and the memory consumed will be the same whether you have one large page with hidden divs or multiple divs held in memory in other objects.

TFD
A: 

I think you could also use the Html Agility Pack. It allows you to parse once, query the HTML tree using XPath or iterators, and rewrite the tree with a Save method when done. Depending on your structure, you might need to create an adapter around its classes, because it works only on an entire HTML document and you want to work on individual elements, but this should not be too hard.

weismat
Can I insert the generated tree into the WebBrowser control? The project description says it has no dependency on MSHTML, so I'm assuming it uses classes other than HtmlDocument/HtmlElement in System.Windows.Forms.
Jen
Sure - use the Save method on HtmlDocument to save it to a stream, and use the WebBrowser.DocumentStream property to load the content into the WebBrowser control.
weismat
But the DocumentStream is text, so I'm incurring the parsing penalty that I tried to avoid in the first place, no?
Jen
I think you would parse once, work with the tree and then put back the tree at the very end only once. So parsing/adapting would happen only once.
weismat
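The Save/DocumentStream round trip described in these comments could be sketched roughly as below. This assumes the HtmlAgilityPack assembly is referenced; the class name AgilityPackBridge and the div1 id are hypothetical. As Jen points out, the control still parses whatever arrives through DocumentStream, so this saves repeated tree manipulation, not the final parse.

```csharp
using System.IO;
using System.Windows.Forms;

static class AgilityPackBridge
{
    // Serializes an Html Agility Pack tree and hands it to the WebBrowser control.
    public static void Render(WebBrowser browser, HtmlAgilityPack.HtmlDocument tree)
    {
        // The stream is deliberately not disposed here: the control may
        // still be reading from it after this method returns.
        var buffer = new MemoryStream();
        tree.Save(buffer);     // write the whole document as markup
        buffer.Position = 0;   // rewind so the control reads from the start
        browser.DocumentStream = buffer;
    }

    // Parses markup once, edits one element in memory, then renders the result.
    public static void ReplaceAndRender(WebBrowser browser, string html, string newInnerHtml)
    {
        var tree = new HtmlAgilityPack.HtmlDocument();
        tree.LoadHtml(html);

        // "div1" is an illustrative id, matching the sample in the answer above.
        HtmlAgilityPack.HtmlNode target = tree.DocumentNode.SelectSingleNode("//div[@id='div1']");
        if (target != null)
        {
            target.InnerHtml = newInnerHtml;
        }

        Render(browser, tree);
    }
}
```

Keeping one HtmlAgilityPack.HtmlDocument per page and calling Render when switching would avoid re-running the edits, though each switch still costs one parse inside the control.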