Hi,

it's me and the lists again.

I have a nice list, which looks like this:

tmp = NULL
t = NULL
tmp$resultitem$count = "1057230"
tmp$resultitem$status = "Ok"
tmp$resultitem$menu = "PubMed"
tmp$resultitem$dbname = "pubmed"
t$resultitem$count = "305215"
t$resultitem$status = "Ok"
t$resultitem$menu = "PMC"
t$resultitem$dbname = "pmc"
tmp = c(tmp, t)
t = NULL
t$resultitem$count = "1"
t$resultitem$status = "Ok"
t$resultitem$menu = "Journals"
t$resultitem$dbname = "journals"
tmp = c(tmp, t)

Which produces:

> str(tmp)
List of 3
 $ resultitem:List of 4
  ..$ count : chr "1057230"
  ..$ status: chr "Ok"
  ..$ menu  : chr "PubMed"
  ..$ dbname: chr "pubmed"
 $ resultitem:List of 4
  ..$ count : chr "305215"
  ..$ status: chr "Ok"
  ..$ menu  : chr "PMC"
  ..$ dbname: chr "pmc"
 $ resultitem:List of 4
  ..$ count : chr "1"
  ..$ status: chr "Ok"
  ..$ menu  : chr "Journals"
  ..$ dbname: chr "journals"

Now I want to search through the elements of each "resultitem". For example, I want to know the "dbname" of every database whose "count" is less than 10. In this case it is very easy, as this list only has 3 elements, but the real list is a good deal longer.

This could simply be done with a for loop. But is there a way to do it with some other R function (like rapply)? My problem with the apply functions is that they only look at one element at a time.

If I do a grep to get all "dbname" elements, I cannot get the count that belongs to each element.

rapply(tmp, function(x) paste("Content: ", x))[grep("dbname", names(rapply(tmp, c)))]

Does anyone have a better idea than a for loop?

Thanks,
Martin

+5  A: 

R generally wants to handle these things as data.frames, so I think your best bet is to turn your list into one (or even make a data.frame instead of a list to begin with, unless you need it to be in list form).

x <- do.call(rbind,tmp)
dat <- data.frame(x)
dat$count <- as.numeric(dat$count)

> dat
    count status     menu   dbname
1 1057230     Ok   PubMed   pubmed
2  305215     Ok      PMC      pmc
3       1     Ok Journals journals

and then to get your answer(s) you can use normal data.frame subsetting operations:

> dat$dbname[dat$count<10]
$resultitem
[1] "journals"
Fojtasek
This data.frame isn't a proper data.frame: each column is a list. It will be OK if you do `x<-do.call(rbind, lapply(tmp,unlist))` and then `dat<-data.frame(x,stringsAsFactors=FALSE,row.names=NULL)`.
Marek
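For reference, the fix from the comment above can be sketched end to end as follows. This is only a minimal sketch: the helper `item()` is a hypothetical convenience for rebuilding the question's example list and is not part of the original post.

```r
# Hypothetical helper: builds one "resultitem" entry like those in the question.
item <- function(count, menu, dbname) {
  list(resultitem = list(count = count, status = "Ok",
                         menu = menu, dbname = dbname))
}
tmp <- c(item("1057230", "PubMed", "pubmed"),
         item("305215", "PMC", "pmc"),
         item("1", "Journals", "journals"))

# unlist() turns each entry into a named character vector, so rbind()
# yields a plain character matrix rather than a matrix of lists.
x <- do.call(rbind, lapply(tmp, unlist))
dat <- data.frame(x, stringsAsFactors = FALSE, row.names = NULL)
dat$count <- as.numeric(dat$count)

# Ordinary subsetting now returns a plain character vector.
small <- dat$dbname[dat$count < 10]
print(small)
```

With atomic columns, the subsetting result is a character vector instead of a one-element list.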
I noticed the issue with the row names and the columns being lists, but wasn't immediately sure what to do about them. Nice fix.
Fojtasek
This works perfectly for my example, thanks. But the problem with data frames is that they don't support columns of different lengths, and I have some other lists where this will be the case. So I'm bound to lists.
Martin
But if they are very similar to what you've shown and you want to do similar things with them, then you're still much better off organizing them as data frames, with NAs filling in the missing data to equalize the columns. If you truly have ragged lists that can't be a data frame, then the kind of question you asked isn't really sensible: you can't ask whether the count is less than 10 when there is no count field. Therefore, for all the data you need to ask this kind of question of, you can use data frames and make your life much easier.
John
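One possible way to do the NA padding described above is sketched here. The `ragged` list is made-up illustration data (its second entry deliberately lacks a "count" field), and the approach assumes every field holds a single value.

```r
# Hypothetical ragged input: the second entry has no "count" field.
ragged <- list(
  list(count = "1057230", status = "Ok", dbname = "pubmed"),
  list(status = "Ok", dbname = "pmc"),
  list(count = "1", status = "Ok", dbname = "journals")
)

# Collect every field name that occurs anywhere in the list.
fields <- unique(unlist(lapply(ragged, names)))

# Look up each field by name; fields an entry lacks come back as NA.
rows <- lapply(ragged, function(item) {
  row <- unlist(item)[fields]
  names(row) <- fields
  row
})
dat <- data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
dat$count <- as.numeric(dat$count)

# which() drops the NA comparisons, so only real matches remain.
small <- dat$dbname[which(dat$count < 10)]
print(small)
```

Wrapping the comparison in `which()` keeps rows with a missing count from showing up as NA in the result.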
+1  A: 

If you're absolutely insistent that you must do this in a list, the following will work for the present case.

# count is stored as a string, so convert it before comparing
x <- tmp[sapply(tmp, function(x) as.numeric(x$count) < 10)]
str(x)
(the list items you wanted)

More generally, if you would like to actually use ragged lists in this way, you could use the same code but check for the presence of the item first:

# FALSE when the item has no count field; otherwise compare numerically
testForCount <- function(x) {if ('count' %in% names(x)) as.numeric(x$count) < 10 else FALSE}
tmp[sapply(tmp, testForCount)]

This will work for your cases where the lists are not the same length as well as the present case. (I still think you should be using data frames for both speed and sensible representation of the data).
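
To go one step further and pull out just the "dbname" values the question asked for, something along these lines might work. It is a sketch only: `hasSmallCount` and its `limit` argument are hypothetical names, and `Filter()` and `vapply()` are swapped in here as base R alternatives to the `sapply()` indexing above.

```r
# The example list from the question, written out directly.
tmp <- list(
  resultitem = list(count = "1057230", status = "Ok",
                    menu = "PubMed", dbname = "pubmed"),
  resultitem = list(count = "305215", status = "Ok",
                    menu = "PMC", dbname = "pmc"),
  resultitem = list(count = "1", status = "Ok",
                    menu = "Journals", dbname = "journals")
)

# Hypothetical predicate: TRUE only when a count field exists and its
# numeric value is below the limit. [[ ]] avoids the partial name
# matching that $ performs on lists.
hasSmallCount <- function(item, limit = 10) {
  !is.null(item[["count"]]) && as.numeric(item[["count"]]) < limit
}

# Keep the matching entries, then extract one dbname string from each.
hits <- Filter(hasSmallCount, tmp)
dbnames <- vapply(hits, function(item) item[["dbname"]],
                  character(1), USE.NAMES = FALSE)
print(dbnames)
```

Entries without a "count" field are simply skipped, so the same code should also cope with ragged lists.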

John
The problem with my data is that it comes from a webservice. And it is not certain that a column exists. If the webservice changes the R package won't work anymore. Even if the query changes the columns might not be the same as before. So I decided to use lists as representation of the results. And now I'm looking for some ways to handle these lists. You helped me a lot, thank you.
Martin
I think you're saying that you can't be sure the cell exists in the particular query. That's fine, just NA that cell. If the column doesn't exist at all then that's just a different data frame and you'd have to adjust your code anyway. I'm not trying to make your life difficult. We're all on here trying to make it easier for you. Nothing you've said precludes a data frame. Aside from all of that, given that you're keen on sticking to lists, you should mark mine as the correct answer. :)
John