I am looking for a Java regex that replaces runs of multiple spaces with non-breaking spaces. Two or more consecutive spaces should be replaced with the same number of non-breaking spaces, but a single space should NOT be replaced. This needs to work for any number of spaces, and the string may begin with one or more spaces.

So if my String starts off like this:

TESTING THIS  OUT   WITH    DIFFERENT     CASES

I need the new String to look like this:

TESTING THIS&nbsp;&nbsp;OUT&nbsp;&nbsp;&nbsp;WITH&nbsp;&nbsp;&nbsp;&nbsp;DIFFERENT&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;CASES
+2  A: 

Edit: This does not handle punctuation, and reworking it to handle punctuation would require the same approach as Serge's answer, but in two steps instead of one. Therefore, this is an inadequate answer and has been withdrawn.


Original answer below:

The most straightforward way that I can think of is a two-step method.

First, replace all spaces with "&nbsp;". This is pretty fast because it doesn't have to be a regex.

String testStr = "TESTING THIS  OUT   WITH    DIFFERENT     CASES";
String replaced = testStr.replace(" ", "&nbsp;");

Next, replace any single instances of "&nbsp;" with spaces.

String replaced2 = replaced.replaceAll("\\b&nbsp;\\b", " ");
System.out.println(replaced2);

Result:

TESTING THIS&nbsp;&nbsp;OUT&nbsp;&nbsp;&nbsp;WITH&nbsp;&nbsp;&nbsp;&nbsp;DIFFERENT&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;CASES
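
To make the punctuation problem from the edit note concrete, here is a minimal sketch of the two-step method wrapped in a helper. The class name TwoStepNbsp and the method convert are made up for illustration. It shows one input where the word-boundary step works and one where a comma defeats it.

```java
final class TwoStepNbsp {
    // Hypothetical helper wrapping the two steps described above.
    static String convert(String input) {
        // Step 1: plain (non-regex) replacement of every space.
        String allReplaced = input.replace(" ", "&nbsp;");
        // Step 2: turn an isolated &nbsp; back into a space, but only when
        // it sits between two word characters (that is what \b requires here).
        return allReplaced.replaceAll("\\b&nbsp;\\b", " ");
    }

    public static void main(String[] args) {
        // Letters on both sides: the single space is restored.
        System.out.println(convert("ONE TWO  THREE")); // ONE TWO&nbsp;&nbsp;THREE
        // Comma before the space: there is no word boundary between ',' and '&',
        // so the single space wrongly stays as &nbsp;.
        System.out.println(convert("ONE, TWO"));       // ONE,&nbsp;TWO
    }
}
```

The second call is the failure case: a single space after punctuation is never converted back.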
Michael Myers
That works pretty well. I am just wondering what constitutes a word boundary. If I start throwing some non-alpha characters into the String, will it still handle it correctly?
Shane
That's a good point; punctuation will break it completely. And it can't handle punctuation because the `&nbsp;`s themselves end with a semicolon, so it would take more backtracking. And if you're going to backtrack, you might as well do it in one step like Serge did. Leaving the answer up as a reference, but converting to wiki.
Michael Myers
+2  A: 

Let's use some regex (black?) magic.

String testStr = "TESTING THIS  OUT   WITH    DIFFERENT     CASES";
Pattern p = Pattern.compile(" (?= )|(?<= ) ");
Matcher m = p.matcher(testStr);
String res = m.replaceAll("&nbsp;");

The pattern matches either a space that is followed by another space, or a space that is preceded by one. This way it catches every space in a run of two or more and leaves single spaces alone. On my machine, with Java 1.6, I get the expected result:

TESTING THIS&nbsp;&nbsp;OUT&nbsp;&nbsp;&nbsp;WITH&nbsp;&nbsp;&nbsp;&nbsp;DIFFERENT&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;CASES
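
For anyone who wants to try the edge cases from the question (leading spaces, punctuation), here is a self-contained sketch of the same pattern. The class name LookaroundNbsp and the method convert are just illustrative names, not part of any library.

```java
import java.util.regex.Pattern;

final class LookaroundNbsp {
    // Same pattern as above: a space followed by a space,
    // or a space preceded by a space.
    private static final Pattern MULTI_SPACE = Pattern.compile(" (?= )|(?<= ) ");

    // Hypothetical helper: replaces every space that belongs to a run of 2+.
    static String convert(String input) {
        return MULTI_SPACE.matcher(input).replaceAll("&nbsp;");
    }

    public static void main(String[] args) {
        // Leading run of two spaces, a single space after a comma, then a run of two.
        System.out.println(convert("  A, B  C")); // &nbsp;&nbsp;A, B&nbsp;&nbsp;C
        // A single leading space is left alone.
        System.out.println(convert(" A"));        // " A" unchanged
    }
}
```

Because the lookarounds test for neighbouring spaces instead of word boundaries, punctuation next to a space makes no difference.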
Serge
+1  A: 

You could also skip the regex altogether.

String testStr = "TESTING THIS  OUT   WITH    DIFFERENT     CASES";
String _replaced = testStr.replace("  ", "&nbsp;&nbsp;");
String replaced = _replaced.replace("&nbsp; ", "&nbsp;&nbsp;");

I haven't tested this, but the first call finds every pair of spaces and replaces it with two non-breaking spaces. The second handles runs with an odd number of spaces: after the first pass one plain space is left behind an &nbsp;, and that "&nbsp; " is replaced with two &nbsp;s.
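
A small sketch for checking the two calls against runs of one to five spaces. The class name PlainReplaceNbsp and the method convert are hypothetical names used only for this example.

```java
final class PlainReplaceNbsp {
    // Hypothetical helper wrapping the two plain replace() calls above.
    static String convert(String input) {
        // Pass 1: every pair of spaces becomes two &nbsp; entities.
        String pairsReplaced = input.replace("  ", "&nbsp;&nbsp;");
        // Pass 2: an odd run leaves one space right after an &nbsp;; fix it up.
        return pairsReplaced.replace("&nbsp; ", "&nbsp;&nbsp;");
    }

    public static void main(String[] args) {
        // Runs of 1, 2 and 3 spaces.
        System.out.println(convert("A B  C   D")); // A B&nbsp;&nbsp;C&nbsp;&nbsp;&nbsp;D
        // Caveat: input that already contains "&nbsp; " is changed as well.
        System.out.println(convert("a&nbsp; b"));  // a&nbsp;&nbsp;b
    }
}
```

One thing to watch for: if the input can already contain a literal "&nbsp;" followed by a single space, the second pass will convert that space too.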

thetaiko