I'm pretty new to regular expressions. I have a requirement to replace spaces in a piece of multi-line text. The replacement rules are these:

  • Replace all spaces at start-of-line with a non-breaking space (&nbsp;)
  • Replace any run of repeated spaces (more than one space together) with the same number of non-breaking spaces
  • Single spaces which are not at start-of-line remain untouched

I used the Regex Coach to build the matching pattern:

/( ){2,}|^( )/

Let's assume I have this input text:

asdasd asdasd  asdas1
 asda234 4545    54
  34545 345  34534
34 345

Using a PHP regex replace function (like preg_replace()), I want to get this output:

asdasd asdasd&nbsp;&nbsp;asdas1
&nbsp;asda234 4545&nbsp;&nbsp;&nbsp;&nbsp;54
&nbsp;&nbsp;34545 345&nbsp;&nbsp;34534
34 345

I'm happy doing simple text substitutions using regular expressions, but I'm having trouble working out how to make several replacements inside a single match in order to get the output I want.

+7  A: 

I'd guess that it would be easier to find each space and replace it. To do that, use "look-ahead" and "look-behind" groups.

That is, find a space (\x20) that is either preceded by or followed by any single whitespace character (\s), but only replace the space itself.

$str = "asdasd asdasd  asdas1\n asda234 4545    54\n  34545 345  34534\n34 345\n";

print preg_replace("/(?<=\s)\x20|\x20(?=\s)/", "&#160;", $str);

(I opted for &#160; since Markdown parses &nbsp;.)

Results in:

asdasd asdasd&#160;&#160;asdas1
&#160;asda234 4545&#160;&#160;&#160;&#160;54
&#160;&#160;34545 345&#160;&#160;34534
34 345

For more info, check out PCRE and perlre.


reply to comments

@Sprogz: At first, I thought the same. But the example shows a "\n " => "\n&nbsp;" between the 1st and 2nd lines.

Jonathan Lonowski
Good answer, but is it worth changing your suggestion so that it doesn't use \s? The questioner only wanted sequences of spaces replaced, and \s also matches tabs, newlines, carriage returns and, with certain settings enabled, a few other characters. I'd go with /(?<=\x20)\x20|\x20(?=\x20)|^\x20/
Sprogz
Thank you! /(?<=\x20)\x20|\x20(?=\x20)|^\x20/ with an "m" modifier at the end works perfectly!
knight_killer
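
For reference, the space-only pattern from the comments above, with the "m" modifier added, might be used roughly like this. It is only a sketch: the function name nbsp_spaces is made up for illustration, and the second call is an invented input to show how a space next to a tab is treated.

```php
<?php
// Hypothetical helper (name is an assumption, not from the thread).
// Uses \x20 in the lookarounds instead of \s, plus ^\x20 with the m modifier
// so that a single leading space on any line is also matched.
function nbsp_spaces(string $text): string
{
    return preg_replace('/(?<=\x20)\x20|\x20(?=\x20)|^\x20/m', '&nbsp;', $text);
}

$str = "asdasd asdasd  asdas1\n asda234 4545    54\n  34545 345  34534\n34 345\n";
print nbsp_spaces($str);

// A lone space next to a tab is left alone by this pattern,
// whereas the \s-based lookarounds would replace it.
print nbsp_spaces("a \tb\n");
```

Each space is matched and replaced on its own, so the replacement string stays a plain "&nbsp;" and no counting is needed.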
+2  A: 

You can use PHP's /e modifier to execute some code in the replacement, like this:

$str = preg_replace('/( {2,}|^ )/em', 'str_repeat("&nbsp;", strlen("\1"))', $str);

I've changed the regular expression to capture the spaces. The /m modifier puts it into multi-line mode, so ^ matches at the start of every line.
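
Note that the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so on current PHP versions the same idea would probably be written with preg_replace_callback() instead. A minimal sketch, where the function name spaces_to_nbsp is made up for illustration:

```php
<?php
// Hypothetical helper (name is an assumption, not from the answer).
// Same pattern as the /e version, but the replacement is computed by a
// callback, which also works on PHP 7 and later.
function spaces_to_nbsp(string $text): string
{
    return preg_replace_callback(
        '/ {2,}|^ /m',
        // $m[0] is the matched run of spaces; emit one &nbsp; per space.
        function (array $m): string {
            return str_repeat('&nbsp;', strlen($m[0]));
        },
        $text
    );
}

$str = "asdasd asdasd  asdas1\n asda234 4545    54\n  34545 345  34534\n34 345\n";
print spaces_to_nbsp($str);
```

The capture group is no longer needed here, because the callback receives the whole match as $m[0].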

Greg
Thank you for the hint with the multi-line mode!
knight_killer