I am trying to use the Microsoft.MSHTML (version 7.0.3300.0) library to extract the body text from an HTML string. I've wrapped this functionality in a single helper method, GetBody(string).
When GetBody is called in an infinite loop, the process's memory usage keeps growing until it eventually runs out of memory (I confirmed this by watching the Mem Usage column in Task Manager). I suspect the problem is that I am not cleaning up the MSHTML COM objects correctly. What am I doing wrong?
My current definition of GetBody(string) is:
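using System;
using System.Diagnostics;               // Trace
using System.Runtime.InteropServices;   // Marshal
// Also requires a reference to the Microsoft.mshtml interop assembly.

public static string GetBody(string html)
{
    mshtml.IHTMLDocument2 htmlDoc = null;
    mshtml.IHTMLElement bodyElement = null;
    string body;
    try
    {
        // Load the HTML into an in-memory MSHTML document.
        htmlDoc = new mshtml.HTMLDocumentClass();
        htmlDoc.write(html);

        // Read the plain text of the <body> element.
        bodyElement = htmlDoc.body;
        body = bodyElement.innerText;
    }
    catch (Exception ex)
    {
        Trace.TraceError("Failed to use MSHTML to parse HTML body: " + ex.Message);
        // Fall back to returning the unparsed input.
        body = html;
    }
    finally
    {
        // Release the COM wrappers explicitly instead of waiting for finalization.
        if (bodyElement != null)
            Marshal.ReleaseComObject(bodyElement);
        if (htmlDoc != null)
            Marshal.ReleaseComObject(htmlDoc);
    }
    return body;
}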
Edit: the memory leak was traced to the code that produces the value passed in as html. In this case, that code was using Outlook Redemption.
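One way to narrow down a leak like this, rather than eyeballing Task Manager, is to run each stage of the pipeline in isolation many times and compare how much the process's private bytes grow. The sketch below is a hypothetical helper (`LeakCheck.MeasureGrowth` is not part of any library); it assumes that forcing full garbage collections before and after the run is enough to let finalizers release any COM wrappers that were not released explicitly, so the remaining growth reflects memory that is genuinely still held.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: runs one stage repeatedly and reports how much the
// process's private memory grew, after forcing full collections.
public static class LeakCheck
{
    public static long MeasureGrowth(Action step, int iterations)
    {
        // Warm-up call so one-time costs (JIT, caches) are not counted.
        step();
        FullCollect();

        using (Process process = Process.GetCurrentProcess())
        {
            process.Refresh();
            long before = process.PrivateMemorySize64;

            for (int i = 0; i < iterations; i++)
            {
                step();
            }

            // Collect again so only memory that is still referenced remains.
            FullCollect();
            process.Refresh();
            return process.PrivateMemorySize64 - before;
        }
    }

    private static void FullCollect()
    {
        // Second Collect reclaims objects whose finalizers just ran
        // (for example runtime callable wrappers around COM objects).
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }
}
```

For example, measuring `() => GetBody(sample)` with a fixed `sample` string separately from the code that fetches the HTML (here, via Outlook Redemption) shows which of the two stages keeps growing; a stage that is not leaking should level off as the iteration count increases.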