I am running into a problem where an element in my XML file contains a single quote. This causes xml_parse to break the text up into multiple chunks. For example, "Get Wired, You're Hired!" is interpreted as 'Get Wired, You' being one object, the single quote being a second, and 're Hired!' being a third.

What I want to do is:

while($data = fread($fp, 4096)){
    if(!xml_parse($xml_parser, htmlentities($data,ENT_QUOTES), feof($fp))) {
        break;
    }
}

But that keeps breaking. If I use str_replace in place of htmlentities it runs without issue, but it fails with htmlentities.

Any ideas?

Update: As per JimmyJ's response below, I have attempted the following solution with no luck (FYI there is a response or two above the linked post that update the code that is linked directly):

function XMLEntities($string)
{
    $string = preg_replace('/[^\x09\x0A\x0D\x20-\x7F]/e', '_privateXMLEntities("$0")', $string);
    return $string;
}

function _privateXMLEntities($num)
{
    $chars = array(
        39  => '&#39;',
        128 => '&#8364;',
        130 => '&#8218;',
        131 => '&#402;',
        132 => '&#8222;',
        133 => '&#8230;',
        134 => '&#8224;',
        135 => '&#8225;',
        136 => '&#710;',
        137 => '&#8240;',
        138 => '&#352;',
        139 => '&#8249;',
        140 => '&#338;',
        142 => '&#381;',
        145 => '&#8216;',
        146 => '&#8217;',
        147 => '&#8220;',
        148 => '&#8221;',
        149 => '&#8226;',
        150 => '&#8211;',
        151 => '&#8212;',
        152 => '&#732;',
        153 => '&#8482;',
        154 => '&#353;',
        155 => '&#8250;',
        156 => '&#339;',
        158 => '&#382;',
        159 => '&#376;');
    $num = ord($num);
    return (($num > 127 && $num < 160) ? $chars[$num] : "&#".$num.";" );
}

if(!xml_parse($xml_parser, XMLEntities($data), feof($fp))) {
    break;
}

Update: As per tom's question below, magic quotes is/was indeed turned off.

Solution: What I have ended up doing to solve the problem is the following:

After collecting the data for each individual item/post/etc, I store that data to an array that I use later for output, then clear the local variables used during collection. I added in a step that checks if data is already present, and if it is, I concatenate it to the end, rather than overwriting it.

So, if I end up with three chunks (as above, let's stick with 'Get Wired, You're Hired!'), I go from doing: $x = 'Get Wired, You'; $x = "'"; $x = 're Hired!';

To doing: $x = 'Get Wired, You' . "'" . 're Hired!';

This isn't the optimal solution, but appears to be working.
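
For anyone landing here later, the concatenation approach described above can be sketched roughly as follows. This is only an illustration, not the code from my project: the inline XML, the `title` element name, and the `$titles` / `$current` variables are made up for the example. It relies on the fact that the character data handler may be called more than once for the text of a single element, so the handler appends to a buffer instead of assigning to it.

```php
<?php
// Hypothetical sample document; a real script would read the file in chunks.
$xml = "<items><title>Get Wired, You're Hired!</title><title>Second</title></items>";

$titles  = [];   // completed titles, used later for output
$current = null; // text buffer for the open <title>, or null when outside one

$parser = xml_parser_create();

xml_set_element_handler(
    $parser,
    // Start tag: open a fresh buffer. Names arrive upper-cased because
    // case folding is enabled by default.
    function ($p, $name, $attrs) use (&$current) {
        if ($name === 'TITLE') {
            $current = '';
        }
    },
    // End tag: the buffer is complete, so store it and clear it.
    function ($p, $name) use (&$titles, &$current) {
        if ($name === 'TITLE') {
            $titles[] = $current;
            $current  = null;
        }
    }
);

xml_set_character_data_handler(
    $parser,
    // May fire several times per element: append, never overwrite.
    function ($p, $data) use (&$current) {
        if ($current !== null) {
            $current .= $data;
        }
    }
);

if (!xml_parse($parser, $xml, true)) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}

print_r($titles);
```

However the parser happens to split the text, each entry in `$titles` ends up holding the whole string, because the pieces are joined before the end tag stores them.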

A: 

I remember reading something similar to this on php.net.

Take a look at this - hope it helps ;)

JimmyJ
A: 

Anyone else?

Cory Dee
+1  A: 

I think having magic quotes enabled can mess up XML parsing sometimes. Is this enabled? You can disable it at runtime using

set_magic_quotes_runtime(0);

Edit: this may not be relevant if the source is not POST or GET, but I read in the PHP manual that it could cause odd behaviour anyway.

Tom Haigh
+1  A: 

Why don't you use something like simplexml_load_file to parse your file easily?
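
A minimal sketch of what that could look like, assuming a feed shaped like the one below. The `items` / `item` / `title` element names are guesses, since the actual file isn't shown, and simplexml_load_string on an inline string stands in for simplexml_load_file so the snippet runs on its own.

```php
<?php
// Hypothetical stand-in for the real XML file.
$xml = <<<XML
<items>
  <item><title>Get Wired, You're Hired!</title></item>
</items>
XML;

// With a file on disk this would be simplexml_load_file($path) instead.
$doc = simplexml_load_string($xml);
if ($doc === false) {
    throw new RuntimeException('Could not parse XML');
}

$titles = [];
foreach ($doc->item as $item) {
    // Casting to string returns the element's full text in one piece.
    $titles[] = (string) $item->title;
}

print_r($titles);
```

SimpleXML loads the whole document into memory, so this fits small to medium files better than very large feeds.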