I have strings like:

Avery® Laser &amp; Inkjet Self-Adhesive

I need to convert them to

Avery Laser & Inkjet Self-Adhesive.

That is, remove special characters (such as ®) and convert HTML entities to regular characters.

+3  A: 

You can use the StringEscapeUtils class from the Apache Commons Lang project.
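A rough sketch of how that could fit the question: in Commons Lang 3 the call is StringEscapeUtils.unescapeHtml4(text) (Lang 2.x names it unescapeHtml). That only decodes entities; dropping symbols such as ® is a separate step. The helper below is hypothetical, not part of Commons Lang, and it assumes "special characters" means anything outside ASCII. To stay dependency-free it takes text that has already been unescaped.

```java
final class SpecialCharStripper {

    // Hypothetical helper: removes every non-ASCII character, then collapses
    // any whitespace runs left behind and trims the ends.
    static String stripNonAscii(String text) {
        return text.replaceAll("[^\\p{ASCII}]", "")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    public static void main(String[] args) {
        // With Commons Lang 3 on the classpath, the raw input would first go through
        // org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(...).
        String unescaped = "Avery\u00AE Laser & Inkjet Self-Adhesive"; // \u00AE is the registered sign
        System.out.println(stripNonAscii(unescaped)); // prints: Avery Laser & Inkjet Self-Adhesive
    }
}
```

Note that this also drops accented letters such as é, so it is only suitable if plain ASCII output is really what you want.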

romaintaz
A: 

Maybe you can use something like:

yourTxt = yourTxt.replace("&amp;", "&");

In one project I did something like:

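// Replaces the HTML entities for Spanish accented letters with the letters themselves.
// String.replace is used because the entities are literal text, not regular expressions.
public String replaceAcutesHTML(String str) {
    str = str.replace("&aacute;", "á");
    str = str.replace("&eacute;", "é");
    str = str.replace("&iacute;", "í");
    str = str.replace("&oacute;", "ó");
    str = str.replace("&uacute;", "ú");
    str = str.replace("&Aacute;", "Á");
    str = str.replace("&Eacute;", "É");
    str = str.replace("&Iacute;", "Í");
    str = str.replace("&Oacute;", "Ó");
    str = str.replace("&Uacute;", "Ú");
    str = str.replace("&ntilde;", "ñ");
    str = str.replace("&Ntilde;", "Ñ");
    return str;
}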
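If you do go the manual route, a table-driven single pass is a little easier to maintain than a chain of replacements. The following is only a sketch: the class name EntityDecoder and its tiny entity table are made up for illustration, and real HTML defines far more named entities than are listed here. It also handles numeric references such as &#233; and &#xE9;, and leaves anything it does not recognise untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityDecoder {

    // Deliberately small, illustrative table (assumption: extend it as needed).
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"",
            "aacute", "\u00E1", "eacute", "\u00E9", "ntilde", "\u00F1");

    // Matches &name; as well as numeric forms like &#233; and &#xE9;
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX]?[0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = resolve(m.group(1), m.group());
            // quoteReplacement keeps '$' and '\' in the replacement literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String resolve(String body, String original) {
        if (body.charAt(0) == '#') {
            boolean hex = body.length() > 1
                    && (body.charAt(1) == 'x' || body.charAt(1) == 'X');
            try {
                int codePoint = hex
                        ? Integer.parseInt(body.substring(2), 16)
                        : Integer.parseInt(body.substring(1), 10);
                return new String(Character.toChars(codePoint));
            } catch (IllegalArgumentException e) {
                return original; // malformed or out-of-range reference: keep as written
            }
        }
        // Unknown names are left exactly as they appeared in the input.
        return NAMED.getOrDefault(body, original);
    }

    public static void main(String[] args) {
        System.out.println(decode("Laser &amp; Inkjet")); // prints: Laser & Inkjet
    }
}
```

Because each entity is decoded exactly once, input like &amp;lt; comes out as &lt; rather than being decoded a second time into <, which can happen with chained replacements depending on their order.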

oropher
That means that you need to unescape every occurrence of every placeholder in HTML, which is a pain, especially when someone has already written it for you.
Chinmay Kanchi
That would work, but it's not an ideal approach. To do that you'd have to build (and maintain) a set of all the special characters to replace. Where possible, it's better to use an existing library or encoder than to do manual replacements. It also happens to be easier and less tedious to implement!
Freiheit
+4  A: 
BalusC
Yep, the trailing dot is my typo :) You're right that strings like this are the result of a text-based parser reading HTML.
Vladimir