I have a string that contains the following text:

String my_string = "hello world. it's cold out brrrrrr! br br";

I'd like to replace each isolated br with <br />

The issue is that I'd like to avoid converting the string to

"hello world. it's cold out <br />rrrrr! <br /> <br />";

What I'd like to do is convert the string (using replaceAll) to

"hello world. it's cold out brrrrrr! <br /> <br />";

I'm sure this is very simple, but my regex isn't correct.

my_string.replaceAll("\\sbr\\s|\\sbr$", "<br />");

My regex is supposed to find 'whitespace' 'b' 'r' 'whitespace' OR 'whitespace' 'b' 'r' 'end of line',

but it misses the final "br" in my string:

"hello world. it's cold out brrrrrr!<br />br"

What am I doing wrong? Thanks!

+7  A: 

Use

my_string.replaceAll("\\bbr\\b", "<br />");

Your regex doesn't work because in

␣br␣br
^

the pattern \sbr\s consumes the whole ␣br␣, leaving

<br />br
      ^

Now there is no preceding space left for this br to match \sbr$, so it is missed.

On the other hand, the \b, meaning a word-boundary, is a zero-width assertion, i.e. it won't consume any characters. Therefore the spaces will be kept and all isolated br's will be matched.
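
To make the difference concrete, here is a minimal sketch that runs both patterns on the string from the question. The class and method names (`WordBoundaryDemo`, `replaceConsuming`, `replaceWithBoundaries`) are made up for illustration; only `String.replaceAll` and the two regexes come from the discussion above.

```java
// Hypothetical demo class: compares the question's pattern with the \b version.
class WordBoundaryDemo {

    // The question's pattern: each match also consumes the surrounding whitespace.
    static String replaceConsuming(String text) {
        return text.replaceAll("\\sbr\\s|\\sbr$", "<br />");
    }

    // Word boundaries are zero-width, so the surrounding spaces are left alone.
    static String replaceWithBoundaries(String text) {
        return text.replaceAll("\\bbr\\b", "<br />");
    }

    public static void main(String[] args) {
        String input = "hello world. it's cold out brrrrrr! br br";

        // prints: hello world. it's cold out brrrrrr!<br />br
        System.out.println(replaceConsuming(input));

        // prints: hello world. it's cold out brrrrrr! <br /> <br />
        System.out.println(replaceWithBoundaries(input));
    }
}
```

Note that `replaceAll` returns a new string, so the result has to be assigned or printed; the original `String` is not modified.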

KennyTM
OK, so let me see if I get this straight... \b is a word boundary, right? And apparently it works at "end of line" too?
The end of a line is a word boundary if the last character in the line is a word character. More info: http://www.regular-expressions.info/wordboundaries.html
Alan Moore
@user141146: Yes.
KennyTM
@Tony: [not reproducible](http://www.ideone.com/rcxNC). What kind of Java are you using?
KennyTM
Comment deleted. I didn't notice the OP's string didn't meet his own criteria about leading whitespace.
Tony Ennis
@KennyTM, what happens if you use "hello world. it's cold out brrrrrr!<br />br" as an input string? I still get a bad result... `hello world. it's cold out brrrrrr!<<br /> /><br />`
Tony Ennis
Latest JVM, yadda. Code is: `String myString = "hello world. it's cold out brrrrrr!<br />br"; System.out.println(myString.replaceAll("\\bbr\\b", "<br />"));`
Tony Ennis
@Tony: This is because `<` and `b` form a word boundary. Since OP doesn't specify what to do in this case, I don't think it is a "bad" result.
KennyTM
@KennyTM - You are correct - it is undefined. But do you think nesting a break tag inside of angle brackets is a desirable result? You can just about guess what the OP's _next_ question will be, heh. This shows how difficult it can be to do any manipulation of HTML using regexp.
Tony Ennis
@KennyTM. Thank you. I didn't realize there was a concept of "consumption"
And maybe you should add `.replaceAll("\\bout\\b", "outside")`.
Roland Illig
A: 

"hello world. it's cold out brrrrrr!<br />br" Your final 'br' isn't preceded by whitespace. What's supposed to happen?

Tony Ennis
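
If "isolated" is taken to mean "delimited by whitespace or by the start/end of the string" (an assumption, since the question doesn't say), one possible sketch uses zero-width lookarounds instead of `\b`. Like `\b` they consume nothing, but they treat `<` and `>` as part of the token, so a `br` glued to an existing tag is left untouched. The class and method names here are hypothetical.

```java
// Hypothetical demo class: whitespace-delimited matching via lookarounds.
class IsolatedBrDemo {

    // (?<!\S) : the previous character is not non-whitespace (whitespace or start of input)
    // (?!\S)  : the next character is not non-whitespace (whitespace or end of input)
    static String replaceIsolatedBr(String text) {
        return text.replaceAll("(?<!\\S)br(?!\\S)", "<br />");
    }

    public static void main(String[] args) {
        // prints: hello world. it's cold out brrrrrr! <br /> <br />
        System.out.println(replaceIsolatedBr("hello world. it's cold out brrrrrr! br br"));

        // prints the input unchanged: neither "br" is delimited by whitespace
        System.out.println(replaceIsolatedBr("hello world. it's cold out brrrrrr!<br />br"));
    }
}
```

Whether leaving the second string unchanged is the right behaviour depends on what the OP actually wants for a `br` that follows a tag.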