I'm using PHP's preg_split to split a string on semicolons, but I need it to split only on non-escaped semicolons.

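<?php
// In double quotes, "\\" is one literal backslash, so $str is abc;def\;abc;def
$str = "abc;def\\;abc;def";
$arr = preg_split("/;/", $str); // splits on every semicolon, escaped or not
print_r($arr);
?>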

Produces:

Array
(
    [0] => abc
    [1] => def\
    [2] => abc
    [3] => def
)

When I want it to produce:

Array
(
    [0] => abc
    [1] => def\;abc
    [2] => def
)

I've tried "/(^\\)?;/" and "/[^\\]?;/", but both produce errors. Any ideas?

+2  A: 

I am not really proficient with PHP regexes, but try this one:

/(?<!\\);/
Andrey Shchekin
It needs to be a triple '\'. Using only 2 produced errors here. Not sure why this is so.
Nils Riedemann
Your answer works with triple '\', but Nils went the extra step in explaining why. Get a +1 for effort though!
Corey Hart
+5  A: 

This works.

<?php
  $str = "abc;def\;abc;def";
  // In single quotes, \\\) reaches PCRE as \\) : an escaped backslash, then ')'
  $arr = preg_split('/(?<!\\\);/', $str);
  print_r($arr);
?>

It outputs:

Array
(
    [0] => abc
    [1] => def\;abc
    [2] => def
) 

You need a negative lookbehind (read about lookarounds). Think of it as "match every ';' unless it is preceded by a '\'".
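
To make the backslash counting concrete, here is a minimal sketch of how PHP's single-quoted string rules turn the literal above into the pattern PCRE compiles. The variable names are only for illustration. With just two backslashes, PCRE receives `(?<!\)`, where `\)` is an escaped parenthesis, so the group never closes and compilation fails. Three or four backslashes both yield the same pattern:

```php
<?php
// The text abc;def\;abc;def (in single quotes, '\\' is one backslash).
$str = 'abc;def\\;abc;def';

// Both literals produce the pattern /(?<!\\);/ once PHP has parsed them:
$three = '/(?<!\\\);/';   // '\\' -> '\', then '\)' is kept as-is
$four  = '/(?<!\\\\);/';  // '\\' -> '\', twice; the unambiguous spelling

var_dump($three === $four);

// Split on every ';' that is not directly preceded by a backslash.
$parts = preg_split($four, $str);
print_r($parts);
```

Note that this lookbehind treats any ';' after a backslash as escaped, so it does not distinguish `\;` from `\\;` (an escaped backslash followed by a real separator).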

Nils Riedemann
Thanks for the link!
Corey Hart
A: 

Since Bart asks: Of course you can also use regex to split on unescaped ; and take escaped escape characters into account. It just gets a bit messy:

<?php
  // $str holds abc;def\;abc\\;def (an escaped ';' and an escaped '\')
  $str = "abc;def\;abc\\\\;def";
  preg_match_all('/((?:[^\\\\;]|\\\.)*)(?:;|$)/', $str, $arr);
  print_r($arr);
?>

Array
(
  [0] => Array
      (
          [0] => abc;
          [1] => def\;abc\\;
          [2] => def
      )

  [1] => Array
      (
          [0] => abc
          [1] => def\;abc\\
          [2] => def
      )
)

The regular expression matches “(any character except \ and ;) or (\ followed by any character)” any number of times, followed by a ; or the end of the string.
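
If only the fields are wanted, the same pattern can be wrapped in a small helper. This is a sketch, not part of the original answer: the function name `split_on_unescaped_semicolon` is made up, and the second capture group is an addition that records whether a ';' ended the field. That matters because preg_match_all() can also report an empty match at the very end of the subject, which should not count as an extra field unless the string really ended with ';'.

```php
<?php
// Hypothetical helper: returns the fields separated by unescaped ';'.
function split_on_unescaped_semicolon(string $input): array
{
    // Group 1: the field, built from (not '\' or ';') or ('\' + any char).
    // Group 2: ';' if a separator ended the field, '' at end of string.
    $pattern = '/((?:[^\\\\;]|\\\\.)*)(;|$)/';
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

    $fields = [];
    $expectField = true; // another field follows only after a ';'
    foreach ($matches as $m) {
        if (!$expectField) {
            break; // empty match at end of string after the last field
        }
        $fields[] = $m[1];
        $expectField = ($m[2] === ';');
    }
    return $fields;
}

print_r(split_on_unescaped_semicolon('abc;def\\;abc\\\\;def'));
```

The escape characters are left in the returned fields. Malformed input, such as a lone trailing backslash, is not handled specially in this sketch.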

I'm not sure how PHP handles $ and end-of-line characters within a string. You may need to set some regex options to get exactly what you want for those.
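
For reference, PCRE's defaults are that '.' does not match a newline and '$' matches at the very end of the subject or just before a final newline. The 's' modifier lets '.' match newlines, and 'D' restricts '$' to the very end. The sketch below, with a made-up sample string, compares the pattern with and without those modifiers when a backslash is followed by a newline:

```php
<?php
// Sample text: one, backslash, newline, two;three
$str = "one\\\ntwo;three";

$plain  = '/((?:[^\\\\;]|\\\\.)*)(?:;|$)/';   // default behaviour
$dotall = '/((?:[^\\\\;]|\\\\.)*)(?:;|$)/sD'; // '.' matches newline, '$' only at end

preg_match_all($plain, $str, $a);
preg_match_all($dotall, $str, $b);

// Without 's', backslash + newline is not a valid escape pair, so the
// first field silently loses everything up to and including the backslash.
var_dump($a[1][0]);

// With 's', the escaped newline stays inside the first field.
var_dump($b[1][0]);
```

Whether an escaped newline should be allowed at all depends on the data, so treat the modifiers as an option to consider rather than a required fix.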

Christopher Creutzig