views: 1978
answers: 6

I'm trying to parse JSON returned from a curl request, like so:

curl 'http://twitter.com/users/username.json' | sed -e 's/[{}]/''/g' | awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}'

I have it working so that it splits the JSON into fields; i.e., the above returns

% ...
"geo_enabled":false
"friends_count":245
"profile_text_color":"000000"
"status":"in_reply_to_screen_name":null
"source":"web"
"truncated":false
"text":"My status"
"favorited":false
% ...

But what I'd like to do is grab a specific field (denoted by `-v k=text`) and print only that.

Any ideas?

+7  A: 

I've never used it, but you could try out jsawk. It would be something like this (haven't tested this, so I may be wrong):

curl 'http://twitter.com/users/username.json' | jsawk -a 'return this.name'
Brian Campbell
I didn't want to have to add dependencies to the project, which is why I want to use sed/awk/curl, but jsawk seems like it's the most "robust" solution.
Yeah, I understand about not wanting to add extra dependencies. But JSON is a bit much for parsing with regular `awk`, so I thought I'd point out something that looked like it was built for what you're trying to do.
Brian Campbell
+1  A: 

please don't do it!

do not use line-oriented tools to parse hierarchical data serialized into text. it works only for special cases and will haunt you and other people. if you really can't use a ready-made json parser, write a simple one using recursive descent. it's easy and will endure changes the emitting side justly considers cosmetic (added or removed whitespace including newlines).
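To make the suggestion concrete, here is a hand-rolled sketch (the sample payload and the `key` variable are made up for illustration): a tiny recursive-descent walk written in awk that pulls a named string field out of a JSON object. Unlike the comma-splitting pipelines, it survives commas and escaped quotes inside values; escape sequences are passed through undecoded.

```sh
text=$(printf '%s' '{"truncated":false,"text":"My \"quoted\" status","favorited":false}' |
awk -v key="text" '
function skipws()  { while (substr(s, p, 1) ~ /[ \t\n]/) p++ }
function str(   out, c) {               # parse the string starting at s[p]
    p++                                  # consume opening quote
    while ((c = substr(s, p, 1)) != "\"") {
        if (c == "\\") { out = out substr(s, p, 2); p += 2 }  # keep escape pair
        else           { out = out c; p++ }
    }
    p++                                  # consume closing quote
    return out
}
function value() {                       # skip over any JSON value
    skipws()
    c = substr(s, p, 1)
    if (c == "\"") str()
    else if (c == "{" || c == "[") { p++; walk() }
    else while (substr(s, p, 1) !~ /[,}\]]/ && p <= length(s)) p++
}
function walk(   k) {                    # walk members of an object or array
    while (1) {
        skipws(); c = substr(s, p, 1)
        if (c == "}" || c == "]" || c == "") { p++; return }
        if (c == ",") { p++; continue }
        if (c == "\"") {
            k = str(); skipws()
            if (substr(s, p, 1) == ":") {          # object member
                p++; skipws()
                if (k == key && substr(s, p, 1) == "\"") print str()
                else value()
            }
        } else value()
    }
}
{ s = $0; p = 1; skipws(); if (substr(s, p, 1) == "{") { p++; walk() } }')
echo "$text"
```

This is still not a full JSON parser (no escape decoding, no multi-line input), but it shows how little code a structure-aware approach takes compared to patching up field-splitting edge cases.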

just somebody
+2  A: 

Use Python's JSON support instead of using awk!

See http://docs.python.org/library/json.html
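A minimal sketch of that suggestion, assuming `python3` is available: the `echo` of a sample payload stands in for the curl call from the question.

```sh
# Parse the JSON on stdin with Python's json module and print one field
text=$(echo '{"truncated":false,"text":"My status","favorited":false}' |
  python3 -c 'import json, sys; print(json.load(sys.stdin)["text"])')
echo "$text"
```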

EDIT

Added explicit plea to use Python JSON support in place of awk.

martinr
or JavaScript. or Perl. or PHP. or C++. heck, I'd bet a can of beer there's a JSON parser for Forth. -1 for partisanship.
just somebody
+1 for recommending something other than plain awk. I don't see how this is worth a -1.
Nick Presta
@Nick Presta: martinr doesn't recommend "something other than plain awk". he urges the OP to use Python without saying how it's better than any of the countless alternatives.
just somebody
Pardon me for trying to come up with a good response... I shall try harder. It takes more than an awk script to shake off partisanship!
martinr
+1 because I'm also partisan to python.
TokenMacGuy
+1  A: 

You've asked how to shoot yourself in the foot and I'm here to provide the ammo:

curl -s 'http://twitter.com/users/username.json' | sed -e 's/[{}]/''/g' | awk -v RS=',"' -F: '/^text/ {print $2}'

You could use `tr -d '{}'` instead of sed, though skipping the brace removal entirely seems to have the desired effect as well.

If you want to strip off the outer quotes, pipe the result of the above through `sed 's/\(^"\|"$\)//g'` (the `\|` alternation is a GNU sed extension).
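The full pipeline, run against a sample payload instead of the live curl call (note that a multi-character `RS` is a gawk extension, and `\|` in the final sed requires GNU sed):

```sh
status=$(echo '{"truncated":false,"text":"My status","favorited":false}' |
  sed -e 's/[{}]//g' |                       # drop the braces
  awk -v RS=',"' -F: '/^text/ {print $2}' |  # split records on ,"  and pick the text field
  sed 's/\(^"\|"$\)//g')                     # strip the outer quotes
echo "$status"
```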

I think others have sounded sufficient alarm. I'll be standing by with a cell phone to call an ambulance. Fire when ready.

Dennis Williamson
This way madness lies, read this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
Dennis Williamson
A: 

Here's one way you can do it with awk:

curl -sL 'http://twitter.com/users/username.json' | awk -F"," -v k="text" '{
    gsub(/{|}/,"")
    for(i=1;i<=NF;i++){
        if ( $i ~ k ){
            print $i
        }
    }
}'
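The same awk program fed a sample payload instead of the curl output. Note that `$i ~ k` is a substring match, so `k="text"` would also match a field like `"profile_text_color"`.

```sh
field=$(echo '{"truncated":false,"text":"My status","favorited":false}' |
  awk -F"," -v k="text" '{
    gsub(/{|}/,"")                # strip the braces, which re-splits the fields
    for (i=1; i<=NF; i++)
        if ($i ~ k) print $i      # print every field whose text matches k
  }')
echo "$field"
```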
ghostdog74
A: 

How about using Rhino? It's a command-line JavaScript tool. Unfortunately, it's a bit rough for this type of application. It doesn't read from stdin very well.

User1