ansaurus

Question

Transposing JSON list-of-dictionaries for analysis in R

Answer 1

+2 A:

This is interesting. The easiest way would be to fix the Python code so that the dict can be transformed more easily.

But, how about this?

k1 <- unlist(lapply(data,FUN=function(x){return(x[[1]])}))
k2 <- unlist(lapply(data,FUN=function(x){return(x[[2]])}))
data.frame(k1,k2)

You will still need to cast k1 and k2 to the correct data types, but this should accomplish what you are looking for.

Ryan Rosario 2010-02-14 05:05:38
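
The question's actual input is not shown here, so the following is only a minimal sketch of the two-column approach above, including the casting step. The sample records and their values are assumptions made for illustration; `vapply` is swapped in for `unlist(lapply(...))` so that a missing or non-string field raises an error instead of silently shortening the column.

```r
# Assumed sample input: a list of records, as a JSON parser might return it.
data <- list(list(k1 = "1", k2 = "a"),
             list(k1 = "2", k2 = "b"))

# Pull one field out of every record; vapply checks that each value
# is a single string.
k1 <- vapply(data, function(x) x[["k1"]], character(1))
k2 <- vapply(data, function(x) x[["k2"]], character(1))

# Cast each column to the type it should have.
df <- data.frame(k1 = as.integer(k1), k2 = k2, stringsAsFactors = FALSE)
str(df)
```

This only scales to a handful of named columns; with many columns a loop over the field names is less tedious.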

A cleaner generalization if you have a lot of columns would be:

newdata <- lapply(1:length(data[[1]]), function(x) unlist(lapply(data, "[[", x)))
newdata <- as.data.frame(newdata)
names(newdata) <- names(data[[1]])

brentonk 2010-02-14 06:14:50

I clearly can preprocess the JSON to transpose it before loading, but the problem is that I don't view this as "fixing" it at all: a list of dicts _is_ the most natural way to think about this data. A dict of lists is just the more convenient way for row-oriented software to load it densely, not the best way to think about it. And manually unpacking every entry is untenable. brentonk's method, however, works. (I clearly need to better grok the meaning of `[[` as opposed to plain subset (`[`), among other things.)

jrk 2010-02-14 06:35:36
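
On the `[[` versus `[` point raised in the comment above, a small sketch may help. The record list is a hypothetical stand-in for the parsed JSON; only base R is used.

```r
# Hypothetical record list, standing in for the parsed JSON.
data <- list(list(k1 = "v1", k2 = "v2"),
             list(k1 = "v3", k2 = "v4"))

rec <- data[[1]]   # `[[` on the outer list: the first record itself

rec["k1"]          # `[` keeps the container: a list of length 1
rec[["k1"]]        # `[[` extracts the element: the string "v1"

# lapply(data, "[[", 1) calls `[[`(record, 1) for each record,
# i.e. record[[1]], so it collects the first field of every record.
first_fields <- unlist(lapply(data, "[[", 1))
```

This is why the generalization in the earlier comment needs only one pass per column: each call to `lapply(data, "[[", x)` gathers field `x` across all records.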

My solution works for two columns, which is clearly what your question implied. If you have several columns, then of course you need to use a generalization, such as brentonk's method.

Ryan Rosario 2010-02-14 07:27:09

You're right—I misread that (without running it) as being a *row-wise* operation, requiring an invocation on every data element, not just on every column. More explanation would have made that clearer. Still, for large numbers of columns, the further generalization is useful. Thanks, both. If you want to add an explanation of the generalization over many columns I'd gladly mark it "accepted". I think that would be useful for future viewers, rather than keeping it buried in the comments.

jrk 2010-02-14 07:59:16

Answer 2

+4 A:

The l*ply functions can be your best friend when dealing with list processing. Try this:

> library(plyr)
> ldply(data, data.frame)
  k1 k2
1 v1 v2
2 v3 v4

plyr does some very nice processing behind the scenes to deal with things like irregular lists (e.g. when each list doesn't contain the same number of elements). This is very common with JSON and XML, and is tricky to handle with the base functions.

Or alternatively using base functions:

> do.call("rbind", lapply(data, data.frame))

You can use rbind.fill (from plyr) instead of rbind if you have irregular lists, but I'd advise just using plyr from the beginning to make your life easier.

Edit:

Regarding your more complicated example, using Hadley's suggestion deals with this easily:

> x<-list(list(k1=2,k2=3),list(k2=100,k1=200),list(k1=5, k3=9))
> ldply(x, data.frame)
   k1  k2 k3
1   2   3 NA
2 200 100 NA
3   5  NA  9

Shane 2010-02-14 12:11:28
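
For readers who cannot install plyr, the irregular case from the answer's edit can be approximated in base R. This is a rough sketch and not how plyr works internally; it assumes every field holds a single numeric value, and the names `keys` and `cols` are introduced here for illustration.

```r
# Same irregular records as in the answer's edit.
x <- list(list(k1 = 2, k2 = 3),
          list(k2 = 100, k1 = 200),
          list(k1 = 5, k3 = 9))

# Union of all field names, in order of first appearance.
keys <- unique(unlist(lapply(x, names)))

# For each key, take the value from each record, or NA when it is absent.
cols <- lapply(keys, function(k) {
  vapply(x,
         function(rec) if (is.null(rec[[k]])) NA_real_ else rec[[k]],
         numeric(1))
})
names(cols) <- keys

df <- as.data.frame(cols)
df
```

Fields are matched by name, so records whose keys appear in a different order still land in the right columns. Mixed column types would need a more careful fill value than `NA_real_`.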

I like the plyr solution, since it can deal with the variables appearing in a different order for each observation. Call me paranoid, but I was worried about some observations not having some variables. Here is a variation that does not break even for very bad cases:

x <- list(list(k1=2,k2=3), list(k2=100,k1=200), list(k1=5))
ldply(x, function(z) as.data.frame(t(unlist(z))))

Jyotirmoy Bhattacharya 2010-02-14 13:54:37

I think a better solution is `ldply(x, data.frame)`

hadley 2010-02-14 14:42:10

I'd always choose the plyr solution :)

Ryan Rosario 2010-02-14 18:48:50

Brilliant. This is exactly what I want. Thanks, all.

jrk 2010-02-14 21:27:31
