Internet Explorer has an option to save a web page as a text file, with all the tags removed. I need a way to batch process pages like that for a project at work. Are there any command line utilities or libraries that can do the same thing for me? COM interop with IE (not my first choice!)? It doesn't have to format exactly like IE, just give me plain text.
+1
A:
There are many programs that do this. Several are called html2text. There's this one (which isn't available natively for Windows, but compiles under Cygwin), and another that is for Win32.
Matthew Flaschen
2010-04-27 00:41:07
A:
I once saw a script that used lynx to render HTML to plain text, in order to automatically generate a plain text mail from an HTML one. Not my first choice either, though.
Joey
2010-04-27 00:41:38
A:
You can do this in C# using the HTML Agility Pack:
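using System.IO;
using HtmlAgilityPack;
var doc = new HtmlWeb().Load(url);  // HtmlWeb has to be instantiated before calling Load
File.WriteAllText(path, doc.DocumentNode.InnerText);  // DocumentNode is the root of the parsed page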
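For the batch case, the same "parse the page, keep only the text" idea can be sketched without any third-party library. The following is a rough illustration in Python using only the standard library's html.parser, not the HTML Agility Pack, and it will not match IE's output. The names TextExtractor, html_to_text and convert_directory are made up for this sketch, as is the assumption that the pages are UTF-8 .html files sitting in one folder. Skipping script and style contents and collapsing whitespace are choices of the sketch, not something the answers above prescribe.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in text nodes.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flushes any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


def convert_directory(src_dir, dst_dir):
    """Write one .txt file per .html file in src_dir (hypothetical layout)."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for page in sorted(Path(src_dir).glob("*.html")):
        # Assumption: pages are UTF-8; undecodable bytes are replaced.
        html = page.read_text(encoding="utf-8", errors="replace")
        (dst / (page.stem + ".txt")).write_text(html_to_text(html), encoding="utf-8")
```

Calling convert_directory("pages", "text") would then produce pages such as text/index.txt from pages/index.html. Because all whitespace is collapsed into single spaces, paragraph breaks are lost. If layout matters, a dedicated tool such as html2text or lynx is likely the better fit.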
SLaks
2010-04-27 00:41:55