I have a text like this:

...<span>my name is bob and I live in </p><p>America</span>...

I would like to replace it with:

...<span>my name is bob and I live in </span></p><p><span>America</span>...

I know the replace() function, but I don't know regular expressions well. How can I do this?

Keep in mind that there may be other span tags correctly closed before the </p>, for example:

...<span>my name is bob</span> and <span>I live in </p><p>America</span>... 
A: 

In general, you can't parse HTML with regexes, because it's not a regular language.

If you generate the string yourself in one particular place, and you know it contains only the value itself, then this may be possible. Even then it's unlikely to be clean, because you don't want to embed tags in something that's supposed to be plain CDATA. Once you start parsing documents that include tags, it's impossible in general to write a regex that reliably captures your case. If your document uses a very limited syntax a regex may work, but I'd be wary: I doubt anyone will remember to enforce those limits through future refactoring.

A better solution is to use something like DOM to iterate over the actual generated HTML itself and modify the node tree. Alternatively, on the off-chance you're actually outputting pure XHTML, you could use XSLT to make this translation.

Andrzej Doyle
Could you give an example of how I can use DOM to modify the node tree?
Erick
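A rough sketch of what modifying the node tree with DOM could look like, using the XML parser that ships with the JDK. The class and method names (ParagraphSpanWrapper, wrapParagraphContent) are made up for illustration. It assumes the input is well-formed XML, which the snippet in the question is not (a span opens in one paragraph and closes in the next), so real input would first have to be repaired or read with a lenient HTML parser. The sketch simply wraps the content of every p element in a span:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
final class ParagraphSpanWrapper {

    // Assumption: xhtml is a well-formed XML fragment with a single root element.
    static String wrapParagraphContent(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            NodeList paragraphs = doc.getElementsByTagName("p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element p = (Element) paragraphs.item(i);
                Element span = doc.createElement("span");
                // Appending a node that already has a parent moves it,
                // so this loop empties the paragraph into the new span.
                while (p.hasChildNodes()) {
                    span.appendChild(p.getFirstChild());
                }
                p.appendChild(span);
            }

            // Serialize the modified tree back to a string.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or serialize the input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(wrapParagraphContent(
                "<div><p>my name is bob and I live in </p><p>America</p></div>"));
    }
}
```

The point is only that the edit happens on elements rather than on characters, so every tag that is opened is also closed by construction.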
A: 

This is a horrible non-solution, but you can use String.replace(CharSequence, CharSequence) to perform plain string replacement. It pays no attention to whether the HTML is well formed; it just blindly substitutes one string for another.

This may or may not work for you. Like any regex approach to HTML, though, it most likely only works some of the time.

System.out.println(
    "bleh </p><p> blah </p><p> blih </p></p> bloh"
    .replace("</p><p>", "</span></p><p><span>")
);
// "bleh </span></p><p><span> blah </span></p><p><span> blih </p></p> bloh"
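
To cope with the case from the question where other spans are already closed before the </p>, one possible refinement is to track whether a span is currently open and only insert the extra tags then. This is still a token-matching sketch, not an HTML parser: it assumes the tags appear exactly as "<span>", "</span>" and "</p><p>" (no attributes, no whitespace between tags) and that spans are not nested. The names SpanSplitter and splitSpans are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class SpanSplitter {

    // The only three tokens this sketch looks at.
    private static final Pattern TOKEN = Pattern.compile("<span>|</span>|</p><p>");

    static String splitSpans(String html) {
        Matcher m = TOKEN.matcher(html);
        StringBuffer out = new StringBuffer();
        int openSpans = 0; // number of spans opened and not yet closed
        while (m.find()) {
            String token = m.group();
            String replacement = token;
            if (token.equals("<span>")) {
                openSpans++;
            } else if (token.equals("</span>")) {
                if (openSpans > 0) {
                    openSpans--;
                }
            } else if (openSpans > 0) {
                // Paragraph break inside an open span: close the span
                // before the break and reopen it afterwards.
                replacement = "</span></p><p><span>";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(splitSpans(
                "<span>my name is bob</span> and <span>I live in </p><p>America</span>"));
        // <span>my name is bob</span> and <span>I live in </span></p><p><span>America</span>
    }
}
```

A paragraph break that is not inside an open span is left untouched, which is the difference from the blind replace() above.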
polygenelubricants