An extension to my previous question:
Text cleaning and replacement: delete \n from a text in Java
I am cleaning this incoming text, which comes from a database with irregular text. That means there's no standard and no rules. Some records contain HTML entities like &reg;, &trade;, &lt;, and others come in this form: ”, –, etc. Other times I just get the HTML tags with < and >.
I am using String.replace() to replace the characters with their meaning (this should be fine since I'm using UTF-8, right?), and replaceAll() to remove the HTML tags with a regular expression.
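A minimal sketch of this approach might look like the following. The class name `TextCleaner`, the tag regex, and the three entity mappings are illustrative assumptions, not the actual code or the full replacement list.

```java
import java.util.regex.Pattern;

public class TextCleaner {
    // Compiled once and reused. This is a simplistic tag matcher (assumption),
    // not a full HTML parser.
    private static final Pattern HTML_TAG = Pattern.compile("<[^>]+>");

    // Illustrative entity/replacement pairs; a real list would be longer.
    private static final String[][] REPLACEMENTS = {
        {"&reg;", "\u00AE"},
        {"&trade;", "\u2122"},
        {"&lt;", "<"},
    };

    public static String clean(String input) {
        // Strip tags first so that a "<" decoded from &lt; is not mistaken for a tag.
        String result = HTML_TAG.matcher(input).replaceAll("");
        // One literal (non-regex) replace() call per mapping.
        for (String[] pair : REPLACEMENTS) {
            result = result.replace(pair[0], pair[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(clean("<b>Acme&trade;</b> 1 &lt; 2"));
    }
}
```

Each replace() call scans the whole string once, so the cost grows with the number of mappings times the length of the text.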
Other than making one replace() call for each replacement and precompiling the regular expression for the HTML tags, is there any recommendation for making this replacement efficient?