ansaurus

Question

ASP.NET Deployment Fails - Could not load file or assembly 'Microsoft.mshtml'

Answer 1

+1 A:

If you're trying to parse HTML, try the HTMLAgilityPack instead of MSHTML, or one of the other suggestions mentioned in this question:

http://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c

Jason 2010-09-23 14:56:16

Answer 2

A:

Again, thanks Jason. HTMLAgilityPack did the trick.

In the interest of helping others, I'll post a few code snippets that I found useful (since documentation on the product is sparse).

1) IN YOUR ASP.NET APPLICATION, COPY HtmlAgilityPack.dll AND HtmlAgilityPack.XML INTO YOUR BIN FOLDER.

Check to verify that it is registered by right-clicking the top line in Solution Explorer and viewing 'Property Pages'. If HtmlAgilityPack is not already in your References, click [AddDownArrow], Add Reference, Bin, HtmlAgilityPack, OK.

2) CAPTURE A WEB PAGE AND CONVERT IT TO AN HTML DOC:
Adapted from EggheadCafe's excellent Asynchronous Task example:

' Requires Imports System.Net and Imports System.IO.
' Class-level fields: the request is shared by OnBegin and OnEnd,
' and the parsed document is reused by the later steps.
Private vRequest As WebRequest
Private vPage_Text As String = ""
Private vPage_Doc As New HtmlAgilityPack.HtmlDocument

Public Function OnBegin(...)
    vRequest = WebRequest.Create("http://www.stackoverflow.com")
    Return vRequest.BeginGetResponse(cb, extraData)
End Function

Public Sub OnEnd(...)
    Using response As WebResponse = vRequest.EndGetResponse(ar)
        Using reader As StreamReader = New StreamReader(response.GetResponseStream())
            vPage_Text = reader.ReadToEnd()
            vPage_Doc.LoadHtml(vPage_Text)
        End Using
    End Using
End Sub
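
If the asynchronous page pattern is not required, HtmlAgilityPack's own HtmlWeb class can download and parse a page in one blocking call. A minimal sketch, assuming a console project that references HtmlAgilityPack.dll and has network access; the module name FetchDemo is made up for illustration:

```vbnet
Imports System
Imports HtmlAgilityPack

Module FetchDemo
    Sub Main()
        ' HtmlWeb downloads the page and parses it in one blocking call.
        Dim web As New HtmlWeb()
        Dim doc As HtmlDocument = web.Load("http://www.stackoverflow.com")

        ' SelectSingleNode returns Nothing when nothing matches, so check first.
        Dim titleNode As HtmlNode = doc.DocumentNode.SelectSingleNode("//title")
        If titleNode IsNot Nothing Then
            Console.WriteLine(titleNode.InnerText)
        End If
    End Sub
End Module
```

This trades the non-blocking request for brevity, so it is better suited to a quick test than to a busy ASP.NET page.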

3) EXTRACT THE ENTIRE HTML DOCUMENT:

vText = vPage_Doc.DocumentNode.OuterHtml

4) EXAMINE EVERY LINK IN THE DOC AND COLLECT THE URLs:

For Each vLinkNode As HtmlAgilityPack.HtmlNode In vPage_Doc.DocumentNode.SelectNodes(".//a")
    vLinkList = vLinkList & vLinkNode.GetAttributeValue("href", "") & vbCrLf
Next
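
One caveat with the loop above: by default SelectNodes returns Nothing rather than an empty collection when no node matches, so the For Each fails with a NullReferenceException on a page without links. A guarded version might look like the sketch below. It assumes a console project that references HtmlAgilityPack.dll; the module name, the CollectLinks helper and the sample HTML are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module LinkDemo
    ' Hypothetical helper: returns every non-empty href found in the document.
    Function CollectLinks(ByVal doc As HtmlDocument) As List(Of String)
        Dim links As New List(Of String)
        ' Select only anchors that actually carry an href attribute.
        Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
        ' No match: SelectNodes gives Nothing, so return the empty list.
        If nodes Is Nothing Then Return links
        For Each node As HtmlNode In nodes
            Dim href As String = node.GetAttributeValue("href", "")
            If href <> "" Then links.Add(href)
        Next
        Return links
    End Function

    Sub Main()
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<html><body><a href=""/a"">A</a><a name=""x"">no href</a><a href=""/b"">B</a></body></html>")
        For Each href As String In CollectLinks(doc)
            Console.WriteLine(href)
        Next
    End Sub
End Module
```

Collecting into a List(Of String) instead of concatenating strings also makes the result easier to filter or de-duplicate later.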

5) EXAMINE EVERY DIV WITH CSS class="item_class" AND COLLECT THE TEXT:

For Each vDivNode As HtmlAgilityPack.HtmlNode In vPage_Doc.DocumentNode.SelectNodes(".//div[@class='item_class']")
    vPageText = vPageText & vDivNode.InnerText & vbCrLf
Next
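
Note that the XPath test @class='item_class' compares the whole attribute value, so a div written as class="item_class featured" is skipped. The usual XPath 1.0 workaround is to pad the attribute with spaces and look for the class name as a whole word. A minimal sketch, assuming a console project that references HtmlAgilityPack.dll; the module name and sample HTML are made up for illustration:

```vbnet
Imports System
Imports HtmlAgilityPack

Module ClassDemo
    Sub Main()
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<div class=""item_class"">one</div>" & _
                     "<div class=""item_class featured"">two</div>" & _
                     "<div class=""other"">three</div>")

        ' Matches item_class as a whole word anywhere in the class attribute.
        Dim xpath As String = "//div[contains(concat(' ', normalize-space(@class), ' '), ' item_class ')]"
        Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes(xpath)

        ' SelectNodes returns Nothing when nothing matches.
        If nodes IsNot Nothing Then
            For Each node As HtmlNode In nodes
                Console.WriteLine(node.InnerText)
            Next
        End If
    End Sub
End Module
```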

6) EXTRACT THE DOC'S TITLE AND DESCRIPTION:

Dim vTitleNode As HtmlAgilityPack.HtmlNode = vPage_Doc.DocumentNode.SelectSingleNode(".//title")
vTitleText = vTitleNode.InnerText
Dim vDescriptionNode As HtmlAgilityPack.HtmlNode = vPage_Doc.DocumentNode.SelectSingleNode(".//meta[@name='description']")
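' A meta tag has no inner text; the description is stored in its content attribute.
vDescriptionText = vDescriptionNode.GetAttributeValue("content", "")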

Or the Title in the doc's body:

vBodyTitle = vPage_Doc.DocumentNode.SelectSingleNode(".//h1").InnerText

7) EXTRACT AN ELEMENT BY ITS ID:

Dim vBigImageNode As HtmlAgilityPack.HtmlNode = vPage_Doc.GetElementbyId("BigImage")
vImage_URL = vBigImageNode.GetAttributeValue("src", "")
vImage_Height = vBigImageNode.GetAttributeValue("height", "")
vImage_Width = vBigImageNode.GetAttributeValue("width", "")
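
GetElementbyId returns Nothing when the id is absent, and GetAttributeValue also has an overload that takes an Integer default, which saves a manual conversion for numeric attributes. A minimal sketch of both points, assuming a console project that references HtmlAgilityPack.dll; the module name and the sample img tag are hypothetical:

```vbnet
Imports System
Imports HtmlAgilityPack

Module ImageDemo
    Sub Main()
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<img id=""BigImage"" src=""/img/big.jpg"" height=""300"">")

        Dim img As HtmlNode = doc.GetElementbyId("BigImage")
        If img Is Nothing Then
            Console.WriteLine("No element with that id")
            Return
        End If

        Dim url As String = img.GetAttributeValue("src", "")
        ' Integer overload: the attribute value is converted to a number.
        Dim height As Integer = img.GetAttributeValue("height", 0)
        ' The width attribute is missing in the sample, so the default is returned.
        Dim width As Integer = img.GetAttributeValue("width", 0)

        Console.WriteLine(url & " " & height.ToString() & "x" & width.ToString())
    End Sub
End Module
```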

8) REMOVE A NODE:

vMovieNode.SelectSingleNode(".//div[@class='viewer-reviews']").Remove()

Finally, I needed to extract a subsection of a page when there were no obvious nodes or other 'attachment points'. The trick is to identify anything that you can 'find' (such as a tag or comment) that can be used as a dividing point in an already-selected node of the doc. Then insert ending and beginning tags at that point, thus creating 2 separate subsections within the node. Finally, create a new HTML doc from the edited node and select the newly-defined node. (If you didn't understand all of that, just follow the code.)
So here is the top-secret, never-before-released,

9) EXTRACT ANY PORTION OF A DOCUMENT:

Dim vNewDoc As New HtmlAgilityPack.HtmlDocument
vNewDoc.LoadHtml(vOldDivNode.OuterHtml.Substring(0, vOldDivNode.OuterHtml.IndexOf("<!-- comment") - 1) & _
    "</div><div class=""my_new_node"">" & _
    vOldDivNode.OuterHtml.Substring(vOldDivNode.OuterHtml.IndexOf("<!-- comment") - 1))
Dim vNewDivNode = vNewDoc.DocumentNode.SelectSingleNode(".//div[@class='my_new_node']")
Dim vHaHaICapturedYou As String = vNewDivNode.InnerText
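
For anyone who wants to try the splitting trick in isolation, here is a self-contained sketch of the same idea on a small made-up fragment. It assumes a console project that references HtmlAgilityPack.dll; the module name, the sample HTML and the variable names are hypothetical. It also checks that the marker was actually found, since IndexOf returns -1 otherwise and Substring would then throw.

```vbnet
Imports System
Imports HtmlAgilityPack

Module SplitDemo
    Sub Main()
        ' Stand-in for the OuterHtml of an already-selected div.
        Dim html As String = "<div class=""old""><p>intro</p><!-- comment --><p>wanted</p></div>"
        Dim marker As String = "<!-- comment"

        Dim cut As Integer = html.IndexOf(marker)
        If cut < 0 Then
            Console.WriteLine("Marker not found")
            Return
        End If

        ' Close the original div just before the marker and open a new one;
        ' the original closing tag then closes the new div.
        Dim edited As String = html.Substring(0, cut) & _
            "</div><div class=""my_new_node"">" & _
            html.Substring(cut)

        Dim newDoc As New HtmlDocument()
        newDoc.LoadHtml(edited)

        Dim newDiv As HtmlNode = newDoc.DocumentNode.SelectSingleNode("//div[@class='my_new_node']")
        If newDiv IsNot Nothing Then
            Console.WriteLine(newDiv.InnerText)
        End If
    End Sub
End Module
```

Computing the cut position once, instead of calling IndexOf in each Substring, keeps the two halves guaranteed to meet at the same character.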

Of course, now that I've told you, I'm gonna have to kill you.

Thanks to all of the contributors to Stack Overflow for all of the help you've given me!

Tom 2010-10-11 19:03:07
