Is there a way I can do something like the following using the standard linux toolchain?

Let's say the source at example.com/index.php is:

Hello, &amp; world! &quot;

How can I do something like this...

curl -s http://example.com/index.php | htmlentities

...that would print the following:

Hello, & world! "

Using only the standard linux toolchain?

+5  A: 

Use recode.

$ echo 'Hello, &amp; world! &quot;' | recode HTML_4.0
Hello, & world! "

EDIT: By the way, recode offers several different conversions corresponding to different versions of HTML and XML, so you can use e.g. HTML_3.2 instead of HTML_4.0 if you have a really old HTML document. Running recode -l will list all the charsets supported by the program.
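
If recode is not installed, a rough fallback using only sed is possible. The following is a minimal sketch, not a replacement for recode: the function name htmldecode is made up for illustration, and it only handles five common entities (&amp;lt;, &amp;gt;, &amp;quot;, &amp;#39; and &amp;amp;), so anything else passes through unchanged.

```shell
# Hypothetical helper (not a standard tool): decodes only five common entities.
# &amp; is replaced last so that "&amp;lt;" becomes "&lt;" and is not decoded twice.
htmldecode() {
    sed -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e "s/&#39;/'/g" \
        -e 's/&amp;/\&/g'
}

printf '%s\n' 'Hello, &amp; world! &quot;' | htmldecode
```

It reads standard input, so it should also work at the end of a pipeline such as the curl command in the question.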

David Zaslavsky
`$ man recode` gives "No manual entry for recode" and `$ type recode` gives "bash: type: recode: not found". (Not to say it isn't excellent, but is it part of the standard toolchain?)
Stephen P
@Stephen: You have to install it first.
Cam
@Stephen P: Evidently it's not installed on your computer. It's debatable (AFAIK) whether or not `recode` is part of the standard toolchain, but it's very common, and if it isn't considered part of the toolchain, I doubt that anything that is could do the job.
David Zaslavsky
This indeed doesn't seem to be part of the standard toolchain as I requested, but it's in the spirit of such a tool (i.e. it works exactly how I wanted), so I've marked it as the correct answer :)
Cam
No argument that it's an excellent choice.
Stephen P
+2  A: 
alias decode="php -r 'echo html_entity_decode(stream_get_contents(STDIN));'"

$ echo 'Hello, &amp; world! &quot;' | decode
Hello, & world! "
Maryam
This is cool, so +1. It doesn't really answer my question though - I was looking for something along the lines of what David provided.
Cam