Hello,

We have HTML source files that contain special characters encoded as numeric character references of the form &#nnnn;, as in the word:

au&#223;ergew&#246;hnlich

We would like to convert them into plain UTF-8:

außergewöhnlich

Is there any small tool to do that?

+2  A: 

I suppose the ascii2uni tool will perform the required conversion.

The tool is only a few hundred kilobytes in size, so it is smaller than lynx, mentioned above.

uthark
+1  A: 

You can do this with Perl and the HTML::Entities module, if you wish.

echo 'au&#223;ergew&#246;hnlich' |
perl -MHTML::Entities -pe'binmode STDOUT, ":utf8"; HTML::Entities::decode_entities($_)'
Evan Carroll
Again, shooting a mosquito… Perl occupies 45796 KB here.
Marcel Korpel
But this works, and `lynx -dump` fails.
Stephen P
Perl occupies 45 MB? That sounds like a mighty ridiculous claim. The binary is 1.2 MB, and running it here takes 2.3 MB resident.
Evan Carroll