Unless you are absolutely sure that the HTML will be valid and well formed, I'd strongly recommend using an HTML parser such as TagSoup, Jericho, NekoHTML, or HTML Parser. The first two are especially good at parsing any kind of crap :)
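To show why plain string replacement is risky here, below is a minimal sketch in plain Java. The class name `NaiveReplaceDemo` and the sample markup are made up for illustration. It assumes the goal is to change only the visible text, and shows that `String.replace` also rewrites a matching attribute value.

```java
public class NaiveReplaceDemo {

    // Plain string replacement: it has no idea which parts of the
    // input are tags, attributes, or visible text.
    static String naiveReplace(String html, String from, String to) {
        return html.replace(from, to);
    }

    public static void main(String[] args) {
        String html = "<p title=\"text\">text</p>";
        // The title attribute is changed as well as the text node.
        System.out.println(naiveReplace(html, "text", "new text"));
        // prints: <p title="new text">new text</p>
    }
}
```

A parser hands you the text nodes separately from the tags and attributes, so this kind of accidental match does not happen.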
For example, with HTML Parser (chosen because the implementation is very easy), you can use a visitor. Provide your own NodeVisitor:
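    import org.htmlparser.Text;
    import org.htmlparser.visitors.NodeVisitor;

    public class MyNodeVisitor extends NodeVisitor {
        // Called once for every text node in the parsed document.
        @Override
        public void visitStringNode(Text string) {
            // Replace the node's content only on an exact match.
            if (string.getText().equals("**text**")) {
                string.setText("**new text**");
            }
        }
    }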
Then create a Parser, parse the HTML string, and visit the returned node list:
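    // Parser is in org.htmlparser, NodeList in org.htmlparser.util.
    // These calls can throw ParserException, so declare or catch it.
    Parser parser = new Parser(htmlString);
    NodeList nl = parser.parse(null);  // null filter: keep every node
    nl.visitAllNodesWith(new MyNodeVisitor());
    System.out.println(nl.toHtml());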
This is just one way to implement this, and it's pretty straightforward.
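If the visitor idea is unfamiliar, here is a toy model of it in plain Java. This is not HTML Parser's code: `VisitorSketch`, `Node`, `TextNode`, `TagNode`, `Visitor` and `replaceText` are all hypothetical stand-ins, and the tree is built by hand instead of being parsed. It only sketches the mechanism: the tree walks itself, calls back into the visitor for each text node, and is then serialized again.

```java
import java.util.ArrayList;
import java.util.List;

public class VisitorSketch {

    // Hypothetical stand-ins for a parser's node types.
    interface Node {
        void accept(Visitor v);
        String toHtml();
    }

    interface Visitor {
        void visitText(TextNode t);
    }

    static class TextNode implements Node {
        String text;
        TextNode(String text) { this.text = text; }
        public void accept(Visitor v) { v.visitText(this); }
        public String toHtml() { return text; }
    }

    static class TagNode implements Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        TagNode(String name, Node... kids) {
            this.name = name;
            for (Node k : kids) children.add(k);
        }
        // A tag has no text of its own; it forwards the visitor to its children.
        public void accept(Visitor v) {
            for (Node c : children) c.accept(v);
        }
        public String toHtml() {
            StringBuilder sb = new StringBuilder("<" + name + ">");
            for (Node c : children) sb.append(c.toHtml());
            return sb.append("</").append(name).append(">").toString();
        }
    }

    // Replace text nodes whose content equals 'from', then serialize the tree.
    static String replaceText(Node root, String from, String to) {
        root.accept(t -> { if (t.text.equals(from)) t.text = to; });
        return root.toHtml();
    }

    public static void main(String[] args) {
        Node root = new TagNode("p",
                new TextNode("text"),
                new TagNode("b", new TextNode("other")));
        System.out.println(replaceText(root, "text", "new text"));
        // prints: <p>new text<b>other</b></p>
    }
}
```

Only text nodes ever reach the visitor, so tag names are never touched by the replacement.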