
I've been searching for a command-line tool that turns HTML into just the text that appears on the site, equivalent to selecting everything in a web browser and pasting it into a text editor.

Does anyone know of something in Ubuntu that does this? I'm writing a script to parse some web pages, and I'd rather not deal with the HTML; I'd prefer to parse only the text that appears on the site.

Thanks,

Dan

+6  A: 
lynx -dump http://example.com/
Ignacio Vazquez-Abrams
+2  A: 

If you already have the HTML file:

lynx -dump file.html > file.txt

Otherwise, use @Ignacio's command to fetch and dump the page directly.

John Boker
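If you have a whole directory of saved pages, the file-based command can be wrapped in a loop. This is a minimal sketch, assuming lynx is installed (on Ubuntu, the `lynx` package); the function name `html_to_text` is hypothetical, not part of lynx. It adds lynx's real `-nolist` option, which drops the numbered list of link URLs that `-dump` otherwise appends, since that list is usually noise when you only want the page text.

```shell
#!/bin/sh
# Hypothetical helper: convert each HTML file given as an argument into a
# plain-text file next to it, using lynx's text dump.
# -nolist suppresses the trailing list of link URLs in the dump.
# Note: only a trailing ".html" is stripped, so "page.htm" becomes
# "page.htm.txt".
html_to_text() {
  for f in "$@"; do
    lynx -dump -nolist "$f" > "${f%.html}.txt"
  done
}
```

Usage: `html_to_text *.html` writes `page.txt` for every `page.html` in the current directory, which you can then process with ordinary text tools such as `grep` or `awk`.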
+2  A: 

I think you need lynx:

lynx -dump http://stackoverflow.com > file
shuvalov