ansaurus

Question

How to avoid a loop in R: selecting items from a list

Answer 1

+3 A:

You almost had it. It really is just a matter of

- using one of the *apply functions to loop over your existing list; I often start with lapply and sometimes switch to sapply
- adding an anonymous function that operates on one list element at a time
- reusing strsplit(string, splitterm), which you already knew, along with the odd [[1]][1] that picks off the first term of the answer
- putting it all together, starting with a preferred variable name (we stay clear of t, c and friends, since those names are taken by base R functions)

which gives

> tlist <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan") 
> fnames <- sapply(tlist, function(x) strsplit(x, "_")[[1]][1]) 
> fnames 
  bob_smith    mary_jane   jose_chung michael_marx charlie_ivan   
      "bob"       "mary"       "jose"    "michael"    "charlie" 
>
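The difference between lapply and sapply mentioned above can be seen directly. A minimal sketch reusing the same tlist; the helper first_term and the result names first_list, first_vec and first_strict are illustrative only:

```r
tlist <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")

# illustrative helper: pick off the first "_"-separated term of one string
first_term <- function(x) strsplit(x, "_")[[1]][1]

# lapply always returns a list, one element per input string
first_list <- lapply(tlist, first_term)

# sapply simplifies that list to a character vector; by default it also
# names the result after the inputs, which USE.NAMES = FALSE switches off
first_vec <- sapply(tlist, first_term, USE.NAMES = FALSE)

# vapply additionally checks that every call returns exactly one string
first_strict <- vapply(tlist, first_term, character(1), USE.NAMES = FALSE)

print(first_vec)
```

The names in the transcript above come from sapply's default USE.NAMES = TRUE, not from strsplit.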

Dirk Eddelbuettel 2009-08-31 01:09:33

I really have struggled with getting my mind around properly using the apply functions in R. Some days it feels like learning to drive on the opposite side of the road: it's really not hard, but the simple roundabouts result in a mental logjam.

JD Long 2009-09-02 14:51:35

I do it in a leg-alike fashion. You knew strsplit. You knew you needed an 'anon function' of one parameter for the apply family. Just stick 'em together. Lastly, and not to nitpick, I posted this before the essentially identical but less verbose answer you accepted as 'the' answer.

Dirk Eddelbuettel 2009-09-02 15:53:05

Typo: 'lego-alike', not 'leg-alike'

Dirk Eddelbuettel 2009-09-02 15:53:44

Dirk, one of the things I have noticed about being a novice at R is that it is very hard to see that two given problems are similar. I think with expertise comes the ability to choose meaningful analogies quickly. I'm slowly getting to where I can see patterns. I appreciate your comment above about figuring out what the lego bricks are. I'm still growing in my ability to look at a problem and see that I need an anon function, for example.

JD Long 2009-09-09 15:57:53

Answer 2

+2 A:

You could use unlist():

> t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
> tsplit <- unlist(strsplit(t,"_"))
> tsplit
 [1] "bob"     "smith"   "mary"    "jane"    "jose"    "chung"   "michael"
 [8] "marx"    "charlie" "ivan"   
> t_out <- tsplit[seq(1, length(tsplit), by = 2)]
> t_out
[1] "bob"     "mary"    "jose"    "michael" "charlie"

There might be a better way to pull out only the odd-indexed entries, but in any case you won't have a loop.
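One alternative for the odd-indexed entries, sketched here under the assumption that every name splits into exactly two parts, relies on R recycling a logical index along the vector:

```r
t <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")
tsplit <- unlist(strsplit(t, "_"))

# c(TRUE, FALSE) is recycled to TRUE, FALSE, TRUE, FALSE, ...
# so it keeps positions 1, 3, 5, ... just like seq(1, length(tsplit), by = 2)
t_out <- tsplit[c(TRUE, FALSE)]
print(t_out)
```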

brentonk 2009-08-31 01:10:16

Not ideal, as you need to impose the 'by = 2' to pick the matching elements.

Dirk Eddelbuettel 2009-08-31 01:32:24
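Dirk's objection can be made concrete. In this sketch the vector t2 is hypothetical, with one three-part name; picking every second element of the flattened vector then drifts out of step, while extracting per list element does not:

```r
t2 <- c("bob_smith", "mary_jane_doe", "jose_chung")  # hypothetical input

flat <- unlist(strsplit(t2, "_"))
# flat is "bob" "smith" "mary" "jane" "doe" "jose" "chung"
by_position <- flat[seq(1, length(flat), by = 2)]
print(by_position)  # "bob" "mary" "doe" "chung": wrong from the third entry on

# taking the first piece of each split keeps one result per input name
per_element <- sapply(strsplit(t2, "_"), function(p) p[1])
print(per_element)
```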

Answer 3

+3 A:

I doubt this is the most elegant solution, but it beats looping:

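t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
tsplit <- strsplit(t, "_")
t.df <- data.frame(tsplit)
t.df[1, ]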

Converting lists to data frames is about the only way I can get them to do what I want. I'm looking forward to reading answers by people who actually understand how to handle lists.
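A related sketch, assuming every name splits into the same number of parts, binds the pieces into a character matrix with do.call and rbind instead of a data frame. Each row is then one name, so the first column holds the first names, and the data-frame conversion (which Matt later reports as the slow step) is avoided:

```r
t <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")
tsplit <- strsplit(t, "_")

# rbind the character vectors row-wise: a 5 x 2 character matrix
parts <- do.call(rbind, tsplit)

first_names <- parts[, 1]
last_names  <- parts[, 2]
print(first_names)
```

If some names had more parts than others, rbind would recycle the shorter vectors, so the equal-length assumption matters here just as it does for the data frame.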

Matt Parker 2009-08-31 01:12:14

I like this. I 'get' the data.frame structure. And since my real data has the same number of items in each "name", this should not be less memory efficient. Why didn't I think of this!

JD Long 2009-08-31 01:37:14

Note that this approach takes a hell of a long time with larger data - see my comment on William Doane's answer.

Matt Parker 2009-08-31 03:24:31

Answer 4

+5 A:

You can use sapply (or lapply, if you want a list back):

t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")

f <- function(s) strsplit(s, "_")[[1]][1]

sapply(t, f)

   bob_smith    mary_jane   jose_chung michael_marx charlie_ivan
       "bob"       "mary"       "jose"    "michael"    "charlie"

David

liebke 2009-08-31 01:16:25

That is exactly what I was trying to do. Thank you. And welcome to Stack Overflow. I've enjoyed reading your blog.

JD Long 2009-08-31 01:40:42

Thanks, I enjoy your blog (and tweets) too.

liebke 2009-08-31 01:53:14

Answer 5

+7 A:

How about:

tlist <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
fnames <- gsub("(_.*)$", "", tlist)
# _.* matches the underscore followed by a string of characters
# the $ anchors the search at the end of the input string
# so, underscore followed by a string of characters followed by the end of the input string

for the RegEx approach?
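Because the leftmost match begins at the first underscore and `.*` then runs to the end of the string, names with more than one underscore are still cut back to the first term. A small check, with a hypothetical three-part name, plus an equivalent capture-group form that keeps the first term instead of deleting the rest:

```r
tlist <- c("bob_smith", "mary_jane_doe", "jose_chung")  # "mary_jane_doe" is hypothetical

# delete everything from the first underscore onwards
fnames <- gsub("(_.*)$", "", tlist)

# keep only the leading run of non-underscore characters via a capture group
fnames_capture <- sub("^([^_]*)_.*$", "\\1", tlist)

print(fnames)
```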

William Doane 2009-08-31 02:33:51

+1 for being the fastest. With rep(t, 1e4), my approach took 83.23s (81.41s of which were spent converting to a data frame!), David's took 4.39s, and yours took 0.81s. I think it has the best output, too.

Matt Parker 2009-08-31 03:23:31
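A timing comparison along the lines of Matt's can be sketched with base R's system.time. This sketch covers only the sapply and regex approaches; absolute times depend on the machine and R version, so no figures are shown:

```r
t <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")
big <- rep(t, 1e4)  # 50,000 strings, as in Matt's test

# time the sapply/strsplit approach
time_sapply <- system.time(
  r_sapply <- sapply(big, function(s) strsplit(s, "_")[[1]][1], USE.NAMES = FALSE)
)

# time the single vectorised regex call
time_regex <- system.time(
  r_regex <- gsub("(_.*)$", "", big)
)

# elapsed wall-clock seconds for each approach
print(c(sapply = time_sapply[["elapsed"]], regex = time_regex[["elapsed"]]))

# both approaches should agree on the result
stopifnot(identical(r_sapply, r_regex))
```

The regex version makes one vectorised call over all strings, whereas sapply calls strsplit once per string, which is a plausible reason for the gap Matt measured.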

Thanks, Matt... I was wondering about the efficiency of each of these solutions!

William Doane 2009-08-31 03:31:00

That's really informative. I had just assumed the strsplit bit was a given. Wow. Good to see another way of doing it.

JD Long 2009-08-31 03:49:35

Answer 6

+2 A:

And one other approach, based on brentonk's unlist example...

tlist <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
tsplit <- unlist(strsplit(tlist,"_"))
fnames <- tsplit[seq_along(tsplit) %% 2 == 1]

William Doane 2009-08-31 02:56:25

Answer 7

+9 A:

And one more approach:

t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
pieces <- strsplit(t,"_")
sapply(pieces, "[", 1)

In words, the last line extracts the first element of each component of the list and then simplifies it into a vector.

How does this work? Well, you need to realise that an alternative way of writing x[1] is "["(x, 1), i.e. there is a function called [ that does subsetting. The sapply call calls this function once for each element of the original list, passing in two arguments: the list element and 1.

The advantage of this approach over the others is that you can extract multiple elements from the list without having to recompute the splits. For example, the last name would be sapply(pieces, "[", 2). Once you get used to this idiom, it's pretty easy to read.
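The idea can be checked step by step. A minimal sketch using the same pieces list; first_explicit is an illustrative name for the spelled-out anonymous-function version:

```r
t <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")
pieces <- strsplit(t, "_")

x <- pieces[[1]]                        # c("bob", "smith")
stopifnot(identical(x[1], "["(x, 1)))   # x[1] and "["(x, 1) are the same call

# sapply passes the extra argument (1 or 2) on to "[" for every list element
first <- sapply(pieces, "[", 1)
last  <- sapply(pieces, "[", 2)

# the same thing spelled out with an anonymous function
first_explicit <- sapply(pieces, function(p) p[1])

print(first)
print(last)
```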

hadley 2009-08-31 03:20:05

Hadley, I see this works, but I haven't the slightest idea why it works. Is there an implied "]" somehow? Can you elaborate a bit? My R-foo is clearly weak.

JD Long 2009-08-31 05:01:58

I was a little shocked by this, too, JD... so after a little playing, I see that `"["(pieces,1)` yields `[[1]] [1] "bob" "smith"`: an interesting notation, to be sure, and very useful!

William Doane 2009-08-31 15:34:25
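What William saw is list subsetting: applied to the list itself, "[" returns a shorter list, while "[[" returns the element inside it. A short sketch of the difference:

```r
t <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")
pieces <- strsplit(t, "_")

one_as_list <- "["(pieces, 1)   # same as pieces[1]: a list holding c("bob", "smith")
one_element <- "[["(pieces, 1)  # same as pieces[[1]]: the character vector itself

str(one_as_list)
str(one_element)
```

Inside sapply(pieces, "[", 1), by contrast, "[" is applied to each character vector, so it returns a single string each time.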

Just as a side note, if you are going to split on fixed strings instead of regexps, you might want to consider passing `fixed=TRUE` to `strsplit`. I've found that this can have a large impact on the speed of `strsplit`.

Jonathan Chang 2009-08-31 19:46:28
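With fixed = TRUE, strsplit treats "_" as a literal string rather than a regular expression. For a plain underscore the pieces are identical; any speed gain is best measured on your own data. A minimal check:

```r
t <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")

regex_split <- strsplit(t, "_")
fixed_split <- strsplit(t, "_", fixed = TRUE)

# the split pieces are identical for a plain underscore separator
stopifnot(identical(regex_split, fixed_split))

first_names <- sapply(fixed_split, "[", 1)
print(first_names)
```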

All operators in R are functions: infix operators can be written in prefix notation. TRUE || FALSE can be written as `||`(TRUE,FALSE), a[b] can be written as `[`(a,b), and even the assignment a[b] <- TRUE is evaluated as a <- `[<-`(a,b,value=TRUE). R is magic.

Stephen 2009-09-01 05:09:22

Not sure if it came out correctly there but there should be quotes (I used backtick but regular quotes should also work) around the prefix functions.

Stephen 2009-09-01 05:10:14
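Stephen's examples can be run directly with backtick-quoted names. The one subtlety is the replacement function `[<-`: its result has to be assigned back, which is what a[b] <- TRUE does behind the scenes. A minimal sketch:

```r
# || as an ordinary function call
stopifnot(identical(`||`(TRUE, FALSE), TRUE || FALSE))

a <- c(FALSE, FALSE, FALSE)
b <- 2

# a[b] as a call to `[`
stopifnot(identical(`[`(a, b), a[b]))

a1 <- c(FALSE, FALSE, FALSE)
a1[b] <- TRUE                       # infix replacement

a2 <- c(FALSE, FALSE, FALSE)
a2 <- `[<-`(a2, b, value = TRUE)    # prefix form, assigned back explicitly

print(a2)
stopifnot(identical(a1, a2))
```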

Thanks for posting an explanation. That makes sense to me now. The [ function was totally new to me.

JD Long 2009-09-02 14:49:18

I love that this works, and I love Stephen's comment "R is magic". It's so true!

PaulHurleyuk 2010-04-08 12:00:39

Answer 8

+3 A:

what about:

t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")

sub("_.*", "", t)
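sub() deletes everything from the first underscore on. The complementary approach is to extract the leading run of non-underscore characters with regexpr() and regmatches(), both in base R. A minimal sketch; note that regmatches silently drops strings with no match (for example one starting with "_"), so the result can be shorter than the input:

```r
t <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")

# position and length of the first match of "^[^_]+" in each string
m <- regexpr("^[^_]+", t)

# pull the matched text out of each string
fnames <- regmatches(t, m)
print(fnames)

# same answer as deleting from the first underscore on
stopifnot(identical(fnames, sub("_.*", "", t)))
```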

Karsten 2010-01-22 18:29:22

that totally works! Thanks.

JD Long 2010-01-26 21:22:20
