I have an MHT file and want to extract all of its text. I thought about using a regex, but the file contains languages other than English, so the raw text itself contains encoded sequences like A7=A98=D6...

What I need is the equivalent of opening the file in a browser, selecting all the text, and pasting it into Notepad.

Thanks.

+1  A: 

Open the file in Internet Explorer and save it as plain text (UTF-8). :) If you need an automated solution, look for an mht to txt converter for your platform or programming language.
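For a rough idea of what such a converter does: an MHT file is a MIME message, and sequences like =D7=A9 are quoted-printable encoded bytes, so a MIME parser can decode them before the HTML tags are stripped. Below is a minimal sketch in Python (not C#, so treat it as an illustration of the steps rather than a drop-in solution), using only the standard library. The names TextExtractor and mht_to_text are made up for this example, and it assumes the page body is the first text/html part in the file.

```python
import email
from email import policy
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def mht_to_text(raw_bytes):
    """Return the visible text of the first text/html part (assumed to be the page)."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes quoted-printable/base64 and applies the declared charset.
            html = part.get_content()
            extractor = TextExtractor()
            extractor.feed(html)
            extractor.close()
            return "\n".join(extractor.parts)
    return ""
```

To use it, read the file in binary mode and pass the bytes in, for example mht_to_text(open("MyFile.mht", "rb").read()). The result is one line per text fragment, which is close to, but not identical to, what copy and paste from a browser gives you.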

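Actually, you can automate this in PowerShell as well:

$ie = New-Object -ComObject "InternetExplorer.Application"
$ie.Navigate2("file:///C:/MyFile.mht")
# Navigate2 returns immediately, so wait until the page has finished loading
while ($ie.Busy -or $ie.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }
$text = $ie.Document.documentElement.innerText
$ie.Quit()  # close the hidden IE instance when done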
kimsnarf
I'm using C#. Is there an MHT-to-text converter for it? And how do I use PowerShell from a C# app?
In that case you need the appropriate .NET library. You should tag your question with "c#" and ".net" to get more answers.
kimsnarf
You can also call PowerShell from your C# application. This won't be blazing fast, but it should work. See for instance here: http://www.codeproject.com/KB/cs/HowToRunPowerShell.aspx
kimsnarf
You can also access the same IE object from C# directly. See here: http://msdn.microsoft.com/en-us/library/aa752084(VS.85).aspx
kimsnarf
kimsnarf - can you give a more detailed code example for using the IE object in C#?