Hello again! I need a way to replace HTML ASCII codes like &#33; with their correct character in bash. Is there a utility I could run my output through to do this, or something along those lines?
Thanks!
I don't know of an easy way, but here is what I would do...
You might be able to script a browser into reading the file and then saving it as text. If lynx supports HTML character entities, it might be worth looking into. If that doesn't work out...
The general solution to something like this uses sed. You need a "higher order" edit for this: start with an entity table, then edit that table, in several steps, into an edit script of its own. Something like:
. . .
s/&Dagger;/‡/g
s/&#8221;/”/g
. . .
Then, encapsulate this as HTML, read it into a browser, and save it as text in the character set you are targeting. If you get it to produce lines like:
s/&lt;/</g
then you win. A bash script that calls sed or ex can be driven by the substitute commands in the file.
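For what it's worth, here is a minimal sketch of that last step, assuming you already have the substitute commands in a file. The temp file, the handful of entities in it, and the decode_entities function name are all just made up for illustration; a real table would have many more lines. The &amp; rule goes last so that something like &amp;lt; ends up as the literal text &lt; and not as <.

```shell
#!/usr/bin/env bash
# Sketch only: a tiny hand-written edit script standing in for the full
# entity table. The file and the entries in it are illustrative assumptions.
script_file="$(mktemp)"

# One sed substitute command per line. The quoted 'EOF' stops the shell
# from expanding anything inside the here-document.
cat > "$script_file" <<'EOF'
s/&lt;/</g
s/&gt;/>/g
s/&quot;/"/g
s/&#33;/!/g
s/&amp;/\&/g
EOF

# decode_entities is a hypothetical helper name: it filters stdin through
# sed, which reads its commands from the script file via -f.
decode_entities() {
    sed -f "$script_file"
}

# Example: pipe some output through the filter.
printf '%s\n' 'Hello&#33; 1 &lt; 2 &amp;&amp; 3 &gt; 2' | decode_entities
```

In the replacement half of a sed substitute, a bare & means "the matched text", which is why the ampersand rule writes \& to get a literal ampersand. To use it on your own output you would pipe it the same way, e.g. some_command | decode_entities.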