Hey everyone,
our customer supplied us with XML data that needs to be processed using PHP. They chose to abuse attributes by using them for big chunks of text (containing line breaks). The XML parser replaces the line breaks inside attribute values with spaces, as the W3C XML specification requires (attribute-value normalization).
To make sure we do not lose our line breaks, I want to read in the file as a string, then translate all line breaks that are between double quotes into a character reference such as &#13;. I think I need a regular expression for that, but I am having trouble coming up with one.
This is my test code (PHP 5) so far, using a look-ahead and look-behind, but it does not work:
$xml = '<tag attribute="Header\r\rFirst paragraph.">\r</tag>';
$pattern = '/(?<=")([^"]+?)\r([^"]+?)(?=")/';
print_r( preg_replace($pattern, "$1&#13;$2", $xml) );
Can anyone help me get this right? Should be easy for a seasoned regexp master :)
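For what it's worth, here is a minimal sketch of one way this could be approached. It is not a definitive solution. Two things seem to get in the way of the test code above: PHP does not interpret \r inside single-quoted strings, so the test string contains a literal backslash and "r" rather than a carriage return, and a single look-behind/look-ahead match can only rewrite one line break at a time and cannot tell text inside an attribute from text between two attributes. The sketch uses preg_replace_callback twice instead: once to find each start tag, once to rewrite each double-quoted value inside it. The function name encodeAttributeLineBreaks is hypothetical, and the code assumes PHP 5.3 or later (for closures), attributes quoted with double quotes only, and no tag-like text inside comments or CDATA sections.

```php
<?php
// Hypothetical helper, not from the original post: encodes line breaks inside
// double-quoted attribute values as character references so that an XML
// parser does not normalize them to spaces.
function encodeAttributeLineBreaks($xml)
{
    // A start tag: "<", one name character, then any mix of unquoted text
    // and double-quoted values (which may themselves contain ">").
    // Assumption: attribute values are always quoted with double quotes.
    $tagPattern = '/<[^<>!?\/\s][^<>"]*(?:"[^"]*"[^<>"]*)*>/';

    return preg_replace_callback($tagPattern, function ($tag) {
        // Within this one tag, rewrite every double-quoted value.
        return preg_replace_callback('/"[^"]*"/', function ($value) {
            return strtr($value[0], array("\r" => '&#13;', "\n" => '&#10;'));
        }, $tag[0]);
    }, $xml);
}

// Double-quoted PHP string, so \r really is a carriage return here.
$xml = "<tag attribute=\"Header\r\rFirst paragraph.\">\r</tag>";
echo encodeAttributeLineBreaks($xml);
```

With that input, the attribute value becomes Header&#13;&#13;First paragraph. while the carriage return between the opening and closing tag is left untouched. After parsing, the attribute should then contain the original line breaks again, since character references are not subject to the whitespace replacement.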