Hello again! I need a way to replace HTML ASCII codes like &#33; with their correct character in bash. Is there a utility I could run my output through to do this, or something along those lines?

Thanks!

+1  A: 

I don't know of an easy way, but here is what I suppose I would do...

You might be able to script a browser into reading the file in and then saving it as text. If lynx supports HTML character entities, it might be worth looking into. If that doesn't work out...

The general solution to something like this is sed. You need a "higher-order" edit for this: start with an entity table, then turn that table into an edit script itself through a multi-step procedure. Something like:

. . .
s/&amp;Dagger;/&Dagger;/g<br />
s/&amp;#8221;/&#8221;/g<br />
. . .

Then, wrap this in HTML, read it into a browser, and save it as text in the character set you are targeting. If you get it to produce lines like:

s/&lt;/</g

then you win. A bash script that calls sed or ex can be driven by the substitute commands in the file.
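As a rough sketch of that last step, assuming the generated substitute commands have already been saved to a file: the five entities, the temp file, and the `decode_entities` name below are made up for illustration, and a real table would be far longer.

```shell
#!/bin/bash
# Hypothetical, hand-written stand-in for the generated edit script.
entities_sed=$(mktemp)
trap 'rm -f "$entities_sed"' EXIT

cat > "$entities_sed" <<'EOF'
s/&#33;/!/g
s/&lt;/</g
s/&gt;/>/g
s/&quot;/"/g
s/&amp;/\&/g
EOF
# Note: a bare & in a sed replacement means "the matched text",
# so a literal ampersand has to be written as \&.

# Filter stdin through every substitute command in the file.
decode_entities() {
    sed -f "$entities_sed"
}

printf '%s\n' 'Hello&#33; 1 &lt; 2 &amp;amp; done' | decode_entities
# prints: Hello! 1 < 2 &amp; done
```

Anything that writes to stdout can be piped through it the same way, e.g. `some_command | decode_entities`.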

DigitalRoss
Alright, that's pretty much what I'm already doing, just manually adding each one to the script. I didn't know I could run sed with a scripting file, though, that's a useful bit of info! Thanks!
SphereCat1
If you use this solution, make sure to put `s/&amp;/\&/g` last; otherwise, if it's before another entry (say `s/&#33;/!/g`), then `&amp;#33;` would get improperly translated to `!` instead of `&#33;`.
ephemient
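To see why the order matters, here is a small sketch with just two rules. The function names `wrong_order` and `right_order` are only labels for this example.

```shell
#!/bin/bash
# &amp; decoded first: the result exposes a new entity to the next rule,
# so the text gets unescaped twice.
wrong_order() {
    sed -e 's/&amp;/\&/g' -e 's/&#33;/!/g'
}

# &amp; decoded last: each piece of text is unescaped exactly once.
right_order() {
    sed -e 's/&#33;/!/g' -e 's/&amp;/\&/g'
}

input='&amp;#33;'   # the literal text "&#33;", escaped once

printf '%s\n' "$input" | wrong_order   # prints: !
printf '%s\n' "$input" | right_order   # prints: &#33;
```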
+3  A: 
ephemient
ephemient, this is awesome! The only problem is that it isn't included with OS X, so I'll have to find a way to distribute it.
SphereCat1
An alternative is to pipe through a web browser -- such as `echo '&#33;' | w3m -dump -T text/html`
grawity
@SphereCat1 http://recode.darwinports.com/ http://pdb.finkproject.org/pdb/package.php/recode Don't forget to distribute GNU recode consistent with its license, GPL. @grawity Clever, but I don't think OS X comes with w3m or lynx either ;-)
ephemient
nicerobot