I have code like this:

Dim Document As New mshtml.HTMLDocument
Dim iDoc As mshtml.IHTMLDocument2 = CType(Document, mshtml.IHTMLDocument2)
iDoc.write(html)
iDoc.close()

However, when I load HTML like this, it executes all the JavaScript in it and also requests resources referenced by the "html" code.

I want to disable JavaScript and all popups (such as certificate errors).

My aim is to use the DOM from the mshtml document to extract some tags from the HTML in a reliable way (instead of a bunch of regexes).

Or is there another IE/Office DLL into which I can just load HTML without worrying about IE-related popups or active scripts?

+1  A: 

If you have the 'html' as a string already, and you just want access to the DOM view of it, why "render" it to a browser control at all?

I'm not familiar with .Net technology, but there has to be some sort of StringToDOM/StringToJSON type of thing that would better suit your needs.

Likewise, if the 'html' variable you are using above is a URL, then just use wget or similar to retrieve the markup as a string, and parse with an applicable tool.
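For the URL case, a minimal VB.NET sketch of fetching the markup as a string without any browser component, assuming `System.Net.WebClient` is acceptable for the job. The URL and the module name are hypothetical, chosen only for illustration.

```vbnet
Imports System
Imports System.Net

Module FetchExample
    Sub Main()
        ' Hypothetical URL, used only for illustration.
        Dim url As String = "http://example.com/"

        ' WebClient downloads the raw markup; nothing is rendered and no scripts run.
        Using client As New WebClient()
            Dim html As String = client.DownloadString(url)
            Console.WriteLine(html.Length)
        End Using
    End Sub
End Module
```

The resulting string still has to be handed to a parser; this step only replaces the browser as the download mechanism.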

I'd look for a .Net XML/DOM library and use that. (again, I would figure that this would be part of the language, but I'm not sure)

PS after a quick Google I found this (source). Not sure if it would help, if you were to use this in your HTMLDocument instead.

    if(typeof(DOMParser) == 'undefined') {
      DOMParser = function() {};
      DOMParser.prototype.parseFromString = function(str, contentType) {
        if(typeof(ActiveXObject) != 'undefined') {
          var xmldata = new ActiveXObject('MSXML.DomDocument');
          xmldata.async = false;
          xmldata.loadXML(str);
          return xmldata;
        } else if(typeof(XMLHttpRequest) != 'undefined') {
          var xmldata = new XMLHttpRequest;
          if(!contentType) {
            contentType = 'application/xml';
          }
          xmldata.open('GET', 'data:' + contentType + ';charset=utf-8,' + encodeURIComponent(str), false);
          if(xmldata.overrideMimeType) {
            xmldata.overrideMimeType(contentType);
          }
          xmldata.send(null);
          return xmldata.responseXML;
        }
      };
    }
scunliffe
A: 

It sounds like you're screen-scraping some resource, then trying to programmatically do something with the resulting HTML?

If you know it is valid XHTML ahead of time, then load the XHTML string (which is really XML) into an XmlDocument object, and work with it that way.
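A minimal VB.NET sketch of that approach, assuming the input really is well-formed XML. The sample markup and the module name are hypothetical. Setting `XmlResolver` to `Nothing` is an assumption made here to stop the parser from fetching external DTDs; input that relies on DTD-defined entities such as `&nbsp;` would then fail to load.

```vbnet
Imports System
Imports System.Xml

Module XhtmlExample
    Sub Main()
        ' Hypothetical well-formed XHTML input, for illustration only.
        Dim xhtml As String = "<html><body><a href=""http://example.com/"">Example</a></body></html>"

        Dim doc As New XmlDocument()
        ' Assumption: no external DTDs or entities should be fetched while parsing.
        doc.XmlResolver = Nothing
        doc.LoadXml(xhtml)

        ' Walk every <a> element and print its href attribute and inner text.
        For Each node As XmlNode In doc.GetElementsByTagName("a")
            Dim link As XmlElement = CType(node, XmlElement)
            Console.WriteLine(link.GetAttribute("href") & " -> " & link.InnerText)
        Next
    End Sub
End Module
```

`LoadXml` throws an `XmlException` on markup that is not well formed, so this only suits input known to be valid XHTML.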

Otherwise, if the HTML is potentially invalid or not well formed, you'll need a more forgiving parser, something like hpricot (though that is a Ruby library).

defeated
+1  A: 

Dim Document As New mshtml.HTMLDocument
Dim iDoc As mshtml.IHTMLDocument2 = CType(Document, mshtml.IHTMLDocument2)
'add this code
iDoc.designMode = "On"
iDoc.write(html)
iDoc.close()

A: 

If I remember correctly, MSHTML automatically inherits the settings of IE.

So if you disable JavaScript in Internet Explorer for the user account that runs the code, JavaScript shouldn't run in MSHTML either.

Glenn Condron