I am fetching a .js file from a remote site that contains data I want to process as JSON using the simplejson library on my Google App Engine site. The .js file looks like this:

var txns = [
    { apples: '100', oranges: '20', type: 'SELL'}, 
    { apples: '200', oranges: '10', type: 'BUY'}]

I have no control over the format of this file. What I did at first just to hack through it was to chop the "var txns = " bit off of the string and then do a series of .replace(old, new, [count]) on the string until it looked like standard JSON:

cleanJSON = malformedJSON.replace("'", '"').replace('apples:', '"apples":').replace('oranges:', '"oranges":').replace('type:', '"type":').replace('{', '{"transaction":{').replace('}', '}}')

So that it now looks like:

[{ "transaction" : { "apples": "100", "oranges": "20", "type": "SELL"} }, 
 { "transaction" : { "apples": "200", "oranges": "10", "type": "BUY"} }]

How would you tackle this formatting issue? Is there a known way (library, script) to format a JavaScript array into JSON notation?

A: 

http://www.devpro.it/JSON/files/JSON-js.html

Calvin L
I gave this a quick glance and it seems to be JavaScript. I am in a python script (using urlfetch) when I bring in the .js file to be processed and I want to render a python list or dict out to my Django template ... I'm not sure how this library can help me?
Greg
A: 

If you know that's what it's always going to look like, you could do a regex to find unquoted space-delimited text that ends with a colon and surround it with quotes.

I'm always worried about unexpected input with a regex like that, though. How do you know the remote source won't change what you get?
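One way to make the regex idea a little less fragile is to match quoted strings as whole tokens, so that colons, commas and quotes inside a value are never mistaken for keys. The following is only a sketch under some assumptions: the names `TOKEN`, `UNESCAPE` and `js_array_to_json` are made up for illustration, the sample data (including the `note` field) is invented, and only the `\'`, `\"` and `\\` escapes are handled.

```python
import json
import re

# A single-quoted string (with backslash escapes) is tried first, so a
# colon inside a value is consumed as part of the string, not as a key.
TOKEN = re.compile(r"""
    '(?P<string>(?:[^'\\]|\\.)*)'          # single-quoted value
  | (?P<key>[A-Za-z_$][\w$]*)(?=\s*:)      # bare key followed by a colon
""", re.VERBOSE)

# Assumption: only \' \" and \\ need unescaping in this data.
UNESCAPE = re.compile(r"""\\(['"\\])""")


def _requote(match):
    """Return the matched key or string re-encoded as a JSON string."""
    if match.group("key") is not None:
        return json.dumps(match.group("key"))
    return json.dumps(UNESCAPE.sub(r"\1", match.group("string")))


def js_array_to_json(source):
    """Drop the 'var name =' prefix and requote keys and strings."""
    body = source.split("=", 1)[1].strip().rstrip(";")
    return TOKEN.sub(_requote, body)


# Invented sample in the same shape as the remote file.
sample = r"""var txns = [
    { apples: '100', note: 'oranges:with:1,ripe:\"yes\"', type: 'SELL'},
    { apples: '200', note: 'it\'s fine', type: 'BUY'}]"""

txns = json.loads(js_array_to_json(sample))
print(txns[0]["note"])  # prints: oranges:with:1,ripe:"yes"
```

This still breaks if the remote file starts using comments, double-quoted strings or nested expressions, so it is worth wrapping the `json.loads` call in error handling.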

Nosredna
I don't know if/when it changes ... it's a fragile solution to be sure and I'll have error handling to report when it changes. But this is just for a hobby app. :)
Greg
+3  A: 

I would use the YAML parser, as it's better at most things like this. It also comes with GAE, since it is used for the config files. JSON is essentially a subset of YAML.

All you have to do is get rid of "var txns =" then yaml should do the rest.

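import yaml

# Everything after "var txns = " in the fetched file
source = """[{ apples: '100', oranges: '20', type: 'SELL'}, 
             { apples: '200', oranges: '10', type: 'BUY'}]"""

# safe_load only builds plain Python objects, which is all this data needs
txns = yaml.safe_load(source)

print(txns)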

This gives you:

[{'type': 'SELL', 'apples': '100', 'oranges': '20'},
 {'type': 'BUY', 'apples': '200', 'oranges': '10'}]

Once loaded, you can always dump it back out as JSON.
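As a rough sketch of that round trip, assuming PyYAML is importable as `yaml` and that the prefix can be dropped by splitting on the first `=`, the list can be wrapped in the "transaction" layout from the question and serialized with the standard `json` module. The variable names here are illustrative only.

```python
import json

import yaml  # PyYAML, a third-party package

raw = """var txns = [
    { apples: '100', oranges: '20', type: 'SELL'},
    { apples: '200', oranges: '10', type: 'BUY'}]"""

# Keep only what follows the first "=", i.e. the array literal itself.
body = raw.split("=", 1)[1]

# Parse the array literal as a YAML flow sequence of flow mappings.
txns = yaml.safe_load(body)

# Wrap each record the way the question's target format does.
wrapped = [{"transaction": txn} for txn in txns]

clean_json = json.dumps(wrapped)
print(clean_json)
```

The values stay strings because they are quoted in the source; unquoted numbers would come back as integers.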

David Raznick
Cool, I wasn't aware of the yaml lib. I am having some difficulty with one of the fields now ... it has some spurious characters that I have to suppress. I might need to go with the pyparsing solution given that issue.
Greg
Greg, what issue are you having?
Nosredna
Basically I have some values with slashes, escapes, colons and apostrophes like 'oranges:with:1,ripe:\"yes\"' that just make it hard to sweep the text to do the original parsing.
Greg
+3  A: 

It's not too difficult to write your own little parser for that using PyParsing.

import json
from pyparsing import (Group, Literal, QuotedString, StringEnd, Word,
                       alphas, delimitedList, nestedExpr)

data = """var txns = [
   { apples: '100', oranges: '20', type: 'SELL'}, 
   { apples: '200', oranges: '10', type: 'BUY'}]"""


def js_grammar():
    key = Word(alphas).setResultsName("key")
    value = QuotedString("'").setResultsName("value")
    pair = Group(key + Literal(":").suppress() + value)
    object_ = nestedExpr("{", "}", delimitedList(pair, ","))
    array = nestedExpr("[", "]", delimitedList(object_, ","))
    return array + StringEnd()

JS_GRAMMAR = js_grammar()

def parse(js):
    return JS_GRAMMAR.parseString(js[len("var txns = "):])[0]

def to_dict(object_):
    return dict((p.key, p.value) for p in object_)

result = [
    {"transaction": to_dict(object_)}
    for object_ in parse(data)]
print(json.dumps(result))

This is going to print

[{"transaction": {"type": "SELL", "apples": "100", "oranges": "20"}},
 {"transaction": {"type": "BUY", "apples": "200", "oranges": "10"}}]

You can also add the assignment to the grammar itself. That said, given that there are already off-the-shelf parsers for this, you would be better off using those.

Torsten Marek
Thanks for the reference to pyparsing...this will come in handy in the future. Not sure which answer to accept yet.
Greg
One of the details I left out was that one of the fields in the array is a beast of characters that make yaml choke (:, ', "). I'll need to suppress them and I think this solution will let me do that.
Greg
This is pretty nice, although it doesn't handle the true, false, and null keywords, or Unicode escapes (not sure if they will ever pop up).
Kiv
It's not too difficult to extend the range of allowable values and add a little dictionary with builtins like those. I mostly wrote up that answer to advertise PyParsing:)
Torsten Marek
A: 

You could create an intermediate page containing a JavaScript script that just loads the remote file and dumps it to JSON. Then Python can make requests to your intermediate page and get nice JSON out.

Kiv
I'd prefer to keep it to one hop because I'll be doing this with several files ... so multiply each extra hop by at least six for now.
Greg
You can bundle all your requests into one request to the intermediate page, so this only actually adds one hop total.
Kiv
Good point - duh. :)
Greg