
Internet Explorer has an option to save a web page as a text file with all the tags removed. I need a way to batch-process pages like that for a project at work. Are there any command-line utilities or libraries that can do the same thing for me? COM interop with IE (not my first choice!)? It doesn't have to format the text exactly like IE does; it just has to give me plain text.

A:

There are many programs that do this. Several of them are named html2text. There's this one (which isn't available natively for Windows, but compiles under Cygwin), and another that is for Win32.

Matthew Flaschen
A: 

I once saw a script that used lynx to render HTML as plain text, so that a plain-text version of an HTML mail could be generated automatically. Not my first choice either, though.

Joey
A: 

You can do this in C# using the HTML Agility Pack:

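using System.IO; using HtmlAgilityPack;
// HtmlWeb must be instantiated before calling Load; it downloads and parses the page.
var doc = new HtmlWeb().Load(url);
File.WriteAllText(path, doc.DocumentNode.InnerText); // text of every node, tags removed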
SLaks
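For comparison, a similar tag-stripping pass can be done with no third-party packages using Python's standard-library `html.parser`. This is a minimal sketch, not a port of the HTML Agility Pack code above: the names `TextExtractor`, `html_to_text` and `convert_directory` are hypothetical, it assumes the saved pages are UTF-8 `.html` files in a single folder, and it makes two choices a plain `InnerText` dump may not make: it skips `<script>`/`<style>` contents and starts a new line at a few block-level tags.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML document (hypothetical helper)."""

    SKIP = {"script", "style"}  # their bodies are code, not readable text
    BLOCK = {"br", "p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._parts.append("\n")  # rough line break for block elements

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Collapse runs of whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()


def convert_directory(src_dir, dest_dir):
    """Write dest_dir/<name>.txt for every src_dir/<name>.html (assumes UTF-8 input)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for page in Path(src_dir).glob("*.html"):
        text = html_to_text(page.read_text(encoding="utf-8", errors="replace"))
        (dest / (page.stem + ".txt")).write_text(text, encoding="utf-8")
```

Calling `convert_directory("pages", "text")` would write `text/foo.txt` for each `pages/foo.html`. As with the other answers, the line breaks will not match IE's "Save as text" output exactly.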