I have developed a Web application in VS2008. It works perfectly on my development PC. When I publish and upload to the shared Windows hosting service (which supports ASP.NET 3.5), it fails (even when accessing it from my development PC). The error message is:

Could not load file or assembly 'Microsoft.mshtml, Version=7.0.3300.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified.

I have read many forum posts on the subject, and have tried the recommended solutions:

  1. Set the reference to Copy Local - VS2008 does not allow Copy Local for ASP.NET references, just for WinForms references.
  2. Copy mshtml.dll into the installation directory - I have tried 3 different versions of the file, both in the root directory and /bin/, under both the names "mshtml.dll" and "Microsoft.mshtml.dll". None work.
  3. Install the Interoperability Assemblies from Visual Studio onto the server by running "vs_piaredist.exe" - I don't have admin access to the server and the hosting company won't do it.

I know this issue has been covered before, but the suggested solutions just don't work. Does anyone have any insight?

TIA

+1  A: 

If you're trying to parse HTML, try the HtmlAgilityPack instead of MSHTML, or one of the other suggestions mentioned in this question:

http://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c

Jason
A: 

Again, thanks Jason. HTMLAgilityPack did the trick.

In the interest of helping others, I'll post a few code snippets that I found useful (since documentation on the product is sparse).

1) IN YOUR ASP.NET APPLICATION, COPY HtmlAgilityPack.dll AND HtmlAgilityPack.XML INTO YOUR BIN FOLDER.

Verify that it is referenced by right-clicking the top line in Solution Explorer and viewing 'Property Pages'. If HtmlAgilityPack is not already in your References, click [AddDownArrow], Add Reference, Bin, HtmlAgilityPack, OK.

2) CAPTURE A WEB PAGE AND CONVERT IT TO AN HTML DOC:
Adapted from EggheadCafe's excellent Asynchronous Task example:

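' Requires: Imports System.IO and Imports System.Net
' Class-level fields, shared by OnBegin and OnEnd
' (Private declarations are not allowed inside a Sub)
Private vRequest As WebRequest
Private vPage_Text As String = ""
Private vPage_Doc As New HtmlAgilityPack.HtmlDocument

Public Function OnBegin(...)
    ' Start the asynchronous request for the page
    vRequest = WebRequest.Create("http://www.stackoverflow.com")
    Return vRequest.BeginGetResponse(cb, extraData)
End Function

Public Sub OnEnd(...)
    ' Finish the request, read the whole response, and parse it
    Using response As WebResponse = vRequest.EndGetResponse(ar)
        Using reader As StreamReader = New StreamReader(response.GetResponseStream())
            vPage_Text = reader.ReadToEnd()
            vPage_Doc.LoadHtml(vPage_Text)
        End Using
    End Using
End Sub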

3) EXTRACT THE ENTIRE HTML DOCUMENT:

vText = vPage_Doc.DocumentNode.OuterHtml

4) EXAMINE EVERY LINK IN THE DOC AND COLLECT THE URLs:

For Each vLinkNode As HtmlAgilityPack.HtmlNode In vPage_Doc.DocumentNode.SelectNodes(".//a")
    vLinkList = vLinkList & vLinkNode.GetAttributeValue("href", "") & vbCrLf
Next
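
One caveat on the loop above: as far as I know, SelectNodes returns Nothing (not an empty collection) when the XPath matches nothing, so the For Each throws a NullReferenceException on a page without links. Here is a minimal sketch of a guarded version. LinkCollector and CollectLinks are names made up for illustration, and selecting "//a[@href]" to skip anchors that have no href is my own choice.

```vbnet
Imports System.Text
Imports HtmlAgilityPack

Module LinkCollector
    ' Hypothetical helper: returns every href in the document, one per line.
    Public Function CollectLinks(ByVal doc As HtmlDocument) As String
        Dim links As New StringBuilder()
        Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")

        ' SelectNodes yields Nothing when no node matches, so check before looping.
        If nodes Is Nothing Then Return String.Empty

        For Each node As HtmlNode In nodes
            links.AppendLine(node.GetAttributeValue("href", ""))
        Next
        Return links.ToString()
    End Function
End Module
```

Usage would be something like vLinkList = LinkCollector.CollectLinks(vPage_Doc).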

5) EXAMINE EVERY DIV WITH CSS class="item_class" AND COLLECT THE TEXT:

For Each vDivNode As HtmlAgilityPack.HtmlNode In vPage_Doc.DocumentNode.SelectNodes(".//div[@class='item_class']")
    vPageText = vPageText & vDivNode.InnerText & vbCrLf
Next
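
Note that @class='item_class' only matches when the class attribute is exactly that string, so a div with class="item_class highlighted" is skipped. Below is a sketch of the usual XPath workaround, which pads the attribute value with spaces and looks for the padded class name. ClassMatcher and TextOfClass are made-up names for illustration, and the sketch assumes the class name contains no quotes or spaces.

```vbnet
Imports System.Text
Imports HtmlAgilityPack

Module ClassMatcher
    ' Hypothetical helper: collects the text of every div that carries the
    ' given class, even when the div lists several classes.
    Public Function TextOfClass(ByVal doc As HtmlDocument, ByVal className As String) As String
        ' Pad the class attribute with spaces so "item_class" does not match "item_classy".
        Dim xpath As String = _
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' " & className & " ')]"

        Dim collected As New StringBuilder()
        Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes(xpath)
        If nodes Is Nothing Then Return String.Empty   ' nothing matched

        For Each node As HtmlNode In nodes
            collected.AppendLine(node.InnerText)
        Next
        Return collected.ToString()
    End Function
End Module
```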

6) EXTRACT THE DOC'S TITLE AND DESCRIPTION:

Dim vTitleNode As HtmlAgilityPack.HtmlNode = vPage_Doc.DocumentNode.SelectSingleNode(".//title")
vTitleText = vTitleNode.InnerText
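Dim vDescriptionNode As HtmlAgilityPack.HtmlNode = vPage_Doc.DocumentNode.SelectSingleNode(".//meta[@name='description']")
' A <meta> tag has no inner text; the description is stored in its content attribute
vDescriptionText = vDescriptionNode.GetAttributeValue("content", "")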

Or the title from the doc's body:

vBodyTitle = vPage_Doc.DocumentNode.SelectSingleNode(".//h1").InnerText

7) EXTRACT AN ELEMENT BY ITS ID:

Dim vBigImageNode As HtmlAgilityPack.HtmlNode = vPage_Doc.GetElementbyId("BigImage")
vImage_URL = vBigImageNode.GetAttributeValue("src", "")
vImage_Height = vBigImageNode.GetAttributeValue("height", "")
vImage_Width = vBigImageNode.GetAttributeValue("width", "")

8) REMOVE A NODE:

vMovieNode.SelectSingleNode(".//div[@class='viewer-reviews']").Remove()

Finally, I needed to extract a subsection of a page when there were no obvious nodes or other 'attachment points'. The trick is to identify anything that you can 'find' (such as a tag or comment) that can serve as a dividing point within an already-selected node of the doc. Then insert an ending tag and a beginning tag at that point, thus creating 2 separate subsections within the node. Finally, create a new HTML doc from the edited node and select the newly-defined node. (If you didn't understand all of that, just follow the code.)
So here is the top-secret, never-before-released,

9) EXTRACT ANY PORTION OF A DOCUMENT:

Dim vNewDoc As New HtmlAgilityPack.HtmlDocument
' Split one character before the comment (assumes the comment is present in vOldDivNode)
Dim vSplitAt As Integer = vOldDivNode.OuterHtml.IndexOf("<!-- comment") - 1
vNewDoc.LoadHtml(vOldDivNode.OuterHtml.Substring(0, vSplitAt) & _
    "</div><div class=""my_new_node"">" & _
    vOldDivNode.OuterHtml.Substring(vSplitAt))
' SelectSingleNode lives on DocumentNode, not on the HtmlDocument itself
Dim vNewDivNode As HtmlAgilityPack.HtmlNode = vNewDoc.DocumentNode.SelectSingleNode(".//div[@class='my_new_node']")
Dim vHaHaICapturedYou As String = vNewDivNode.InnerText
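
The same splitting trick can be wrapped in a small function that copes with a missing marker, since IndexOf returns -1 in that case and Substring would then throw. This is only a sketch under a few assumptions: SectionSplitter and TextAfterMarker are made-up names, the node passed in is a div, and the marker sits directly inside that div rather than inside a nested element.

```vbnet
Imports System
Imports HtmlAgilityPack

Module SectionSplitter
    ' Hypothetical helper: returns the text of everything in divNode that
    ' follows the first occurrence of marker, or Nothing if the marker is absent.
    Public Function TextAfterMarker(ByVal divNode As HtmlNode, ByVal marker As String) As String
        Dim html As String = divNode.OuterHtml
        Dim splitAt As Integer = html.IndexOf(marker, StringComparison.Ordinal)
        If splitAt < 0 Then Return Nothing   ' marker not found, nothing to split

        ' Close the original div just before the marker and open a new one there.
        Dim patched As String = html.Substring(0, splitAt) & _
            "</div><div class=""my_new_node"">" & _
            html.Substring(splitAt)

        ' Re-parse the patched markup and pick out the newly created div.
        Dim newDoc As New HtmlDocument()
        newDoc.LoadHtml(patched)
        Dim newNode As HtmlNode = _
            newDoc.DocumentNode.SelectSingleNode("//div[@class='my_new_node']")
        If newNode Is Nothing Then Return Nothing
        Return newNode.InnerText
    End Function
End Module
```

A call would look like SectionSplitter.TextAfterMarker(vOldDivNode, "<!-- comment"). Whether the comment's own text shows up in InnerText may depend on the HtmlAgilityPack version, so check that against the version you have.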

Of course, now that I've told you, I'm gonna have to kill you.

Thanks to all of the contributors to Stack Overflow for all of the help you've given me!

Tom