The Interwebs are no help on this one. We're encoding data in ColdFusion using serializeJSON and trying to decode it in PHP using json_decode. Most of the time, this is working fine, but in some cases, json_decode returns NULL. We've looked for the obvious culprits, but serializeJSON seems to be formatting things as expected. What else could be the problem?

UPDATE: A couple of people (wisely) asked me to post the output that is causing the problem. I would, except we just discovered that the result set is all of our data (listing information for 2300+ rental properties for a total of 565,135 ASCII characters)! That could be a problem, though I didn't see anything in the PHP docs about a max size for the string. What would be the limiting factor there? RAM?

UPDATE II: It looks like the problem was that a couple of our users had copied and pasted Microsoft Word text with "smart" quotes. Those pesky users...

+1  A: 

Can you replicate this issue reliably? If so, can you post sample data that returns null? I'm sure you know this, but for the sake of others stumbling on this who may not: RFC 4627 describes JSON, and it's a common mistake to assume that valid JavaScript is valid JSON. It's better to think of JSON as a subset of JavaScript.

In response to the edit:

I'd suggest checking that your data actually arrives intact in your PHP script (before it is passed to json_decode), and also validating it as JSON, especially if you can reliably reproduce the error. You can try an online validator for convenience. Based on the very limited information, it sounds like the request may be timing out and not grabbing all the data. Is there a need for such a large dataset?
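To make the "check what actually arrived" step concrete, here is a minimal sketch that reports why json_decode failed instead of silently returning NULL. The helper name decode_or_explain and the sample payloads are made up for illustration. json_last_error() and json_last_error_msg() are standard PHP functions, but the latter needs PHP 5.5 or newer.

```php
<?php
// Hypothetical helper: decode JSON and, on failure, report the parser's reason.
function decode_or_explain(string $json): array
{
    $data = json_decode($json, true);

    // A NULL result is only a failure if the parser also recorded an error;
    // the JSON literal "null" legitimately decodes to NULL.
    if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
        return [
            'ok'    => false,
            'error' => json_last_error_msg(),
            'bytes' => strlen($json), // compare with the size the sender reports
        ];
    }

    return ['ok' => true, 'data' => $data, 'bytes' => strlen($json)];
}

// Illustrative payloads (not taken from the original data set).
$good = decode_or_explain('{"id": 1, "city": "Austin"}');
$bad  = decode_or_explain('{"id": 1, "city": "Austin"'); // truncated, e.g. by a timeout

var_dump($good['ok'], $bad['ok'], $bad['error']);
```

Logging the byte count next to the error message makes it easier to tell a truncated transfer apart from a genuinely malformed document.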

Owen
A: 

You could try parsing it with another parser, and looking for an error -- I know Python's JSON parsers are very high quality. If you have Python installed it's easy enough to run the text through demjson's syntax checker. If it's a very large dataset you can use my library jsonlib -- memory use will be higher than with demjson, but it will run faster because it's written in C.

John Millikin
A: 

Good to know that Word special characters are not encoded/decoded properly by some JSON encoders/decoders.

Any idea if PHP is able to encode them properly?

Darryl Hein
The problem may not be improper encoding/decoding of Unicode characters (and curly quotes are not specific to Word). The JSON specification doesn't mandate escaping Unicode characters. The problem could be that the two endpoints are not using a common *character encoding* for the JSON string.
Ates Goral
+2  A: 

You could try operating in UTF-8 and also letting PHP know that fact.

I had an issue with PHP's json_decode not being able to decode a UTF-8 JSON string (with some "weird" characters other than the curly quotes that you have). My solution was to hint PHP that I was working in UTF-8 mode by inserting a Content-Type meta tag in the HTML page that was doing the submit to the PHP. That way the content type of the submitted data, which is the JSON string, would also be UTF-8:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>

After that, PHP's json_decode was able to properly decode the string.
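If the sending side cannot be changed, the mismatch can also be detected and worked around on the receiving end. The sketch below assumes the sender used Windows-1252 (a guess based on the Word "smart" quotes mentioned in the question, not something the thread confirms). decode_with_fallback is a hypothetical helper; json_last_error(), JSON_ERROR_UTF8 and iconv() are standard PHP.

```php
<?php
// Hypothetical helper: decode JSON, retrying once if the bytes are not valid UTF-8.
function decode_with_fallback(string $json, string $assumedCharset = 'Windows-1252')
{
    $data = json_decode($json, true);

    if ($data === null && json_last_error() === JSON_ERROR_UTF8) {
        // Assumption: the sender really used $assumedCharset.
        // A wrong guess produces mojibake rather than an error.
        $utf8 = iconv($assumedCharset, 'UTF-8', $json);
        $data = json_decode($utf8, true);
    }

    return $data;
}

// 0x93 and 0x94 are the curly double quotes in Windows-1252.
// As raw bytes they are not valid UTF-8, so a plain json_decode fails.
$payload = "{\"title\":\"\x93Cozy\x94 cabin\"}";

$plain   = json_decode($payload, true);
$reason  = json_last_error();
$decoded = decode_with_fallback($payload);

var_dump($plain, $reason === JSON_ERROR_UTF8, $decoded);
```

Converting at the boundary keeps the curly quotes in the data. Fixing the encoding at the source, as described above, is still the cleaner option when it is available.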

Ates Goral
+1  A: 

I had this exact problem, and it turned out to be caused by ColdFusion putting non-printable characters into the JSON packets. These characters really did exist in our data, but they can't appear unescaped in JSON.

Two questions on this site fixed this problem for me, although I went for the PHP solution rather than the ColdFusion solution as I felt it was the more elegant of the two.

PHP solution

Fix the string before you pass it to json_decode()

$string = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);
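Note that the pattern above also removes every byte from 0x80 to 0xFF, which strips all non-ASCII characters, including valid UTF-8 text such as accented letters. If the data is known to be UTF-8, a narrower variant that removes only the raw control bytes may be preferable. The following is a sketch, not the answer's original method, and strip_control_bytes is a hypothetical name.

```php
<?php
// Hypothetical helper: remove raw ASCII control bytes (0x00-0x1F) and keep everything else.
function strip_control_bytes(string $json): string
{
    // No /u modifier, so the pattern works on single bytes. UTF-8 multi-byte
    // sequences only use bytes >= 0x80, so they are never touched.
    return preg_replace('/[\x00-\x1F]/', '', $json);
}

// Illustrative payload: "café" followed by a raw BEL byte (0x07) inside the string.
$raw = "{\"note\":\"caf\u{00E9}\x07 ok\"}";

$before = json_decode($raw, true);                      // fails on the control byte
$after  = json_decode(strip_control_bytes($raw), true); // decodes, accent intact

var_dump($before, $after);
```

Like the original one-liner, this also drops raw tabs and newlines, which is harmless between JSON tokens.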

ColdFusion solution

Use the cleanXmlString() function in that SO question after using serializeJSON()

Stewart Robinson