I'm trying to mimic the json_encode bitmask flags implemented in PHP 5.3.0, here is the string I have:

$s = addslashes('O\'Rei"lly'); // O\'Rei\"lly

Doing json_encode($s, JSON_HEX_APOS | JSON_HEX_QUOT) outputs the following:

"O\\\u0027Rei\\\u0022lly"

And I'm currently doing this in PHP versions older than 5.3.0:

str_replace(array('\\"', "\\'"), array('\\u0022', '\\\u0027'), json_encode($s))
or
str_replace(array('\\"', '\\\''), array('\\u0022', '\\\u0027'), json_encode($s))

Which correctly outputs the same result:

"O\\\u0027Rei\\\u0022lly"

I'm having trouble understanding why I need to replace single quotes ('\\\'' or even "\\'" [surrounding quotes excluded]) with '\\\u0027' and not just '\\u0027'.


Here is the code that I'm having trouble porting to PHP < 5.3:

if (get_magic_quotes_gpc() && version_compare(PHP_VERSION, '6.0.0', '<'))
{
    /* JSON_HEX_APOS and JSON_HEX_QUOT are available */
    if (version_compare(PHP_VERSION, '5.3.0', '>=') === true)
    {
        $_GET = json_encode($_GET, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_POST = json_encode($_POST, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_COOKIE = json_encode($_COOKIE, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_REQUEST = json_encode($_REQUEST, JSON_HEX_APOS | JSON_HEX_QUOT);
    }

    /* mimic the behaviour of JSON_HEX_APOS and JSON_HEX_QUOT */
    else if (extension_loaded('json') === true)
    {
        $_GET = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_GET));
        $_POST = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_POST));
        $_COOKIE = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_COOKIE));
        $_REQUEST = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_REQUEST));
    }

    $_GET = json_decode(stripslashes($_GET));
    $_POST = json_decode(stripslashes($_POST));
    $_COOKIE = json_decode(stripslashes($_COOKIE));
    $_REQUEST = json_decode(stripslashes($_REQUEST));
}
+2  A: 

It's escaping the backslash as well as the quote. It's difficult dealing with escaped escapes, as you're doing here, as it quickly turns into backslash counting games. :-/

staticsan
But I believe I'm doing the math correctly. The replacement for single quotes should only need two backslashes, no?
Alix Axel
It's not that simple because some of your string constants are in double quotes, which don't need single quotes escaped, and some are in single quotes, which do. Plus the complication of trying to *generate* a valid string constant. For another language. Which happens to have almost identical quoting rules. (Headache yet? :-)
staticsan
Plus, too, your sample string *starts* with explicit escaping *in the string*. So you're asking `json_encode()` to escape the escaping, too. And on top of all that, I think the need for `json_encode()` to produce UTF entities is questionable (I've found bugs in that function before).
staticsan
@staticsan: Yes! Still, I can't understand why `str_replace("\\'", '\\\u0027', json_encode(addslashes("O'Reilly")))` and `str_replace("\\'", '\\\\u0027', json_encode(addslashes("O'Reilly")))` produce the exact same output. Can you?
Alix Axel
@staticsan: Shouldn't the `\\\u0027` version output only two backslashes and escape the "u"?
Alix Axel
I think `'\\\u0027'` is doing the same thing as `'\\\\u0027'` because `'\u'` (note the *single* quotes) doesn't mean anything to PHP so it's semantically the same.
staticsan
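The point about single-quoted literals can be checked directly. This is a minimal sketch (the variable names are only for illustration): in a single-quoted PHP string only `\\` and `\'` are escape sequences, so a backslash in front of `u` is kept as written.

```php
<?php
// In single-quoted PHP literals only \\ and \' are escape sequences;
// a backslash before any other character is kept as-is.
$three = '\\\u0027';  // \\ becomes \, then \u stays \u
$four  = '\\\\u0027'; // \\ becomes \, \\ becomes \, then u0027

var_dump($three === $four); // bool(true)
var_dump(strlen($three));   // int(7)
```

Both literals hold the same 7 characters, which would explain why the two `str_replace` calls produce identical output.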
I ran into a similar issue involving quoting and json on PHP and it had to do with magic_quotes_gpc, did you check if magic quotes are turned off?
dabito
@dabito: No, `magic_quotes` are `On`. The `str_replace` is there to fix `magic_quotes`, see http://www.php.net/manual/en/function.get-magic-quotes-gpc.php#95697.
Alix Axel
Ouch. The correct thing to do is to turn `magic_quotes` off. If you can't do that, then the *first* thing you do with any input is put it through `stripslashes()`. Any later is too late and you will run into the sorts of confusion you're experiencing.
staticsan
@staticsan: Why do you say ouch? This is way faster than any other method I've seen, and more complete also.
Alix Axel
"Ouch" is because magic_quotes tries to solve a real and common problem (SQL injection) in the wrong place (when data is submitted to the page instead of upon DB insertion) and in the wrong way (addslashes instead of the DB's quoting function). *Turn them off!*
staticsan
+1  A: 

Since you are going to json_encode the string \', you first have to encode the \ and then the '. So you will have \\ and \u0027. Concatenating these gives \\\u0027.
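As a small sketch of that concatenation, assuming PHP 5.3 or later so that `JSON_HEX_APOS` is defined (the variable name is only for illustration):

```php
<?php
// The two-character string \' (a backslash followed by an apostrophe).
$slashed = "\\'";

// The backslash becomes \\ and, with JSON_HEX_APOS, the apostrophe becomes \u0027.
echo json_encode($slashed, JSON_HEX_APOS), "\n"; // "\\\u0027"

// A bare apostrophe has no backslash in front of it, so only \u0027 appears.
echo json_encode("'", JSON_HEX_APOS), "\n";      // "\u0027"
```

The extra pair of backslashes comes from the backslash that was already in the input, not from the apostrophe.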

Zsolti
I still don't follow. Why should `\u0027` be escaped? `json_encode('"', JSON_HEX_QUOT); // "\u0022"` and `json_encode("'", JSON_HEX_APOS) // "\u0027"` return similar output, yet the first doesn't need any additional slashes.
Alix Axel
Your original string is 'O\'Rei"lly' (all in single quotes). In single quotes the \ is not an escape character. So in this case it will be encoded too. If you write "O'Rei\"lly" you will get the needed result.
Zsolti
@Zsolti: In single quotes "\" **is** an escape character. Still, I tried your suggestion and it still doesn't work with `\\u0027`.
Alix Axel
A: 

The \ generated by addslashes() gets re-escaped by json_encode(). You probably meant to write "Doing json_encode($s, JSON_HEX_APOS | JSON_HEX_QUOT) outputs the following", but you used $str instead of $s, which confused everyone.

If you evaluate the string "O\\\u0027Rei\\\u0022lly" in JavaScript, you will get "O\'Rei\"lly" and I am pretty sure that's not what you want. When you evaluate it, you probably need all the control codes removed. Go ahead, poke this in a file: alert("O\\\u0027Rei\\\u0022lly").

Conclusion: You are escaping the quotes twice, which is most likely not what you need. json_encode already escapes everything that is needed so that any JavaScript parser would return the original data structure. In your case, that is the string you have obtained after the call to addslashes.


Proof:

<?php $out = json_encode(array(10, "h'ello", addslashes("h'ello re-escaped"))); ?>
<script type="text/javascript">
  var out = <?php echo $out; ?>;
  alert(out[0]);
  alert(out[1]);
  alert(out[2]);
</script>
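The same point can be checked without a browser. A minimal sketch, using `json_decode` in place of the JavaScript parser:

```php
<?php
// Value as it would look after addslashes(): O\'Rei\"lly
$s = addslashes('O\'Rei"lly');

$json = json_encode($s);

// Decoding returns the slashed string unchanged, backslashes included.
var_dump(json_decode($json) === $s);        // bool(true)

// The slashes only go away when stripslashes() is applied to the decoded value.
var_dump(stripslashes(json_decode($json))); // string(9) "O'Rei"lly"
```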
Tom
I "need" to escape the string twice, as in I already get the string with added slashes in case `magic_quotes` is `On`. Still your answer doesn't address my issue with replacing escaped chars.
Alix Axel
You didn't read carefully enough. You don't need to do that. In fact, you _shouldn't_ do that. It is just what happens because of addslashes(). `\` are escaped by `json_encode` because they are considered to be part of the string you want as output. What you should do is disable `magic quotes` (or force `array_walk_recursive($_REQUEST, 'stripslashes')`) and everything will be clear.
Tom
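A rough sketch of the recursive approach mentioned above. One caveat: `array_walk_recursive()` ignores the callback's return value, so passing `'stripslashes'` directly would leave the array unchanged. The wrapper `strip_slashes_in_place` and the sample `$input` array are hypothetical names used only for this illustration.

```php
<?php
// Hypothetical helper: array_walk_recursive() only changes the array when the
// callback takes the value by reference, so stripslashes() is wrapped here.
function strip_slashes_in_place(&$value, $key)
{
    if (is_string($value)) {
        $value = stripslashes($value);
    }
}

// Sample data standing in for $_GET / $_POST with magic quotes applied.
$input = array('name' => 'O\\\'Reilly', 'tags' => array('say \\"hi\\"'));

array_walk_recursive($input, 'strip_slashes_in_place');

print_r($input); // name is now O'Reilly, tags[0] is now say "hi"
```

This only touches values. Array keys are left as they are, which may or may not matter for a given application.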
@Tom: And the world would be a lot safer without nuclear weapons, but that doesn't mean there aren't any. My original string has slashes and the very reason of this question is to avoid recursive calls in `PHP >= 5.2 < 5.3` to fix `magic_quotes` (see my PHP 5.3 solution at http://www.php.net/manual/en/function.get-magic-quotes-gpc.php#95697).
Alix Axel
+2  A: 

If I understand correctly, you just want to know why you need to use

'\\\u0027' and not just '\\u0027'

You're escaping the slash and the character unicode value. With this you are telling json that it should put an apostrophe there, but it needs the backslash and the u to know that a unicode hexadecimal character code is next.

Since you are escaping this string:

$s = addslashes('O\'Rei"lly'); // O\'Rei\"lly

the first backslash is actually escaping the backslash before the apostrophe. The next slash is used to escape the backslash used by json to identify the character as a unicode character.

If you were applying the algorithm to O'Reilly instead of O\'Rei\"lly then the latter would suffice.

I hope you find this useful. I only leave you this link so you can read more on how json is constructed, since it's obvious you already understand PHP:

http://www.json.org/fatfree.html

dabito
+6  A: 

The PHP string

'O\'Rei"lly'

is just PHP's way of getting the literal value

O'Rei"lly

into a string which can be used. Calling addslashes on that string changes it to be literally the following 11 characters

O\'Rei\"lly

i.e. strlen(addslashes('O\'Rei"lly')) == 11

This is the value which is being sent to json_encode.

In JSON backslash is an escape character, so that needs to be escaped, i.e.

\ to be \\

Also single and double quotes can cause problems. So converting them to their unicode equivalent is one way to avoid problems. So later versions of PHP's json_encode change

' to be \u0027

and

" to be \u0022

So applying these three rules to

O\'Rei\"lly

gives us

O\\\u0027Rei\\\u0022lly

This string is then wrapped in double quotes to make it a JSON string. Your replace expressions include the leading backslashes. Either by accident or on purpose this means that the leading and trailing double quote returned by json_encode is not subject to the escaping, which it shouldn't be.
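The three rules can be applied by hand as a quick check. This is only a sketch covering those three characters (it ignores control characters, `/`, and non-ASCII text), and the comparison on the last line assumes PHP 5.3 or later:

```php
<?php
// The 11-character value produced by addslashes(): O\'Rei\"lly
$value = addslashes('O\'Rei"lly');

// strtr() with an array makes a single pass, so backslashes introduced by
// one rule are not escaped again by another.
$body = strtr($value, array(
    '\\' => '\\\\',   // \ becomes \\
    "'"  => '\u0027', // ' becomes \u0027
    '"'  => '\u0022', // " becomes \u0022
));

echo '"' . $body . '"', "\n"; // "O\\\u0027Rei\\\u0022lly"
var_dump('"' . $body . '"' === json_encode($value, JSON_HEX_APOS | JSON_HEX_QUOT)); // bool(true)
```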

So in earlier versions of PHP

$s = addslashes('O\'Rei"lly');
print json_encode($s);

would print

"O\\'Rei\\\"lly"

and we want to change ' to be \u0027 and we want to change \" to be \u0022 because the \ in \" is just to get the " into the string because it begins and ends with double-quotes.

So that's why we get

"O\\\u0027Rei\\\u0022lly"
awatts
Makes sense. So doing `str_replace(array('\"', '\''), array('\\u0022', '\\u0027'), json_encode(addslashes('O\'Rei"lly')))` will always yield the exact same output as `json_encode(addslashes('O\'Rei"lly'), JSON_HEX_APOS | JSON_HEX_QUOT)`, right?
Alix Axel
Doesn't seem to work when a string ends with a slash.
Alix Axel
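One likely explanation for the trailing-slash failure, sketched with a made-up sample value: the encoded trailing backslash sits right before the closing quote of the JSON string, so a search for `\"` matches there as well.

```php
<?php
// A value that ends with a backslash: the three characters C:\
$value = 'C:\\';

$json = json_encode($value); // "C:\\"  (the final " closes the JSON string)

// Searching for \" also matches the last backslash plus the closing quote.
$patched = str_replace('\\"', '\\u0022', $json);

echo $json, "\n";    // "C:\\"
echo $patched, "\n"; // "C:\\u0022
var_dump(json_decode($patched)); // NULL, the result is no longer valid JSON
```

A plain `str_replace` cannot tell an escaped quote inside the string from an escaped backslash followed by the closing delimiter, so this approach would need extra handling for that case.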
A: 

When you encode a string for json, some things have to be escaped regardless of the options. As others have pointed out, that includes '\' so any backslash run through json_encode will be doubled. Since you are first running your string through addslashes, which also adds backslashes to quotes, you are adding a lot of extra backslashes. The following function will emulate how json_encode would encode a string. If the string has already had backslashes added, they will be doubled.

function json_encode_string( $encode , $options ) {
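    // Double quotes so that \0 and \37 are real bytes: escape the backslash and the control characters 0x00..0x1F.
    $escape = "\\\0..\37";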
    $needle = array();
    $replace = array();

    if ( $options & JSON_HEX_APOS ) {
        $needle[] = "'";
        $replace[] = '\u0027';
    }
    // No else branch: json_encode() leaves a plain apostrophe unescaped, and \' is not valid JSON.

    if ( $options & JSON_HEX_QUOT ) {
        $needle[] = '"';
        $replace[] = '\u0022';
    } else {
        $escape .= '"';
    }

    if ( $options & JSON_HEX_AMP ) {
        $needle[] = '&';
        $replace[] = '\u0026';
    }

    if ( $options & JSON_HEX_TAG ) {
        $needle[] = '<';
        $needle[] = '>';
        $replace[] = '\u003C';
        $replace[] = '\u003E';
    }

    $encode = addcslashes( $encode , $escape );
    $encode = str_replace( $needle , $replace , $encode );

    return $encode;
}
drawnonward