Dear all, I'm trying to split a string on "." and create additional columns holding the two substrings before and after the ".".

tes<-c("1.abc","2.di","3.lik")
dat<-c(5,3,2)
h<-data.frame(tes,dat)
h$num<-substr(h$tes,1,1)

h$prim<-unlist(strsplit(as.character(h$tes),"\\."))[2]
h$prim<-sapply(h$tes,unlist(strsplit(as.character(h$tes),"\\."))[2])

I'd like h$prim to contain "abc", "di", "lik". However, I'm not able to figure it out. I guess strsplit is not vectorised, but then I thought the sapply version should have worked. I assume it should be easy, though :-)

Regards, //M

+5  A: 

This should do the trick

R> sapply(strsplit(as.character(h$tes), "\\."), "[[", 2)
[1] "abc" "di"  "lik"
rcs
All right. However, not as easy as I thought. What is this "[[" thing?
Misha
@Misha: the tricky thing about strsplit is that it returns a list. That "[[" thing is the function that extracts an element. 2 is the argument passed to that function, meaning that it takes the second element of each vector in the list returned by strsplit. See also ?"[[" in R. And thanks @rcs, that's clever!
Joris Meys
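To illustrate the comment above, here is a minimal sketch (the variable names `parts`, `first`, `a` and `b` are made up for this example) showing that strsplit returns a list and that "[[" can be called like any other function:

```r
tes <- c("1.abc", "2.di", "3.lik")

# strsplit returns a list with one character vector per input string
parts <- strsplit(tes, "\\.")
is.list(parts)

# Take the pieces of the first string: "1" and "abc"
first <- parts[[1]]

# The usual indexing syntax and an explicit call to the `[[` function
# do the same thing
a <- first[[2]]
b <- `[[`(first, 2)
identical(a, b)
```

This is why `sapply(parts, "[[", 2)` works: sapply calls the function "[[" on each list element with 2 as the extra argument.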
It's an indexing operator. "[[" selects a single element and drops names; see `?Extract`. You could also use "[".
rcs
Misha
Because the second argument is not a function, it's just the string "abc"
rcs
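A small sketch of what goes wrong in the sapply attempt from the question, assuming no function called `abc` exists in the session (the names `flat` and `outcome` are only for illustration):

```r
tes <- c("1.abc", "2.di", "3.lik")

# unlist flattens all pieces into one vector, so [2] is a single string
flat <- unlist(strsplit(tes, "\\."))
flat[2]

# Used as the second argument of sapply, that string is treated as the
# name of a function to look up, which fails here
outcome <- tryCatch(
  sapply(tes, flat[2]),
  error = function(e) "error"
)
outcome
```

The first attempt in the question does not fail, but it assigns that same single string to every row of h$prim.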
@Misha See also my comments to my answer [in a very similar question](http://stackoverflow.com/questions/3003527/how-do-i-specify-a-dynamic-position-for-the-start-of-substring/3004225#3004225)
Marek
It could also be called with the `fixed` parameter, like `strsplit(as.character(h$tes), ".", fixed=TRUE)`. For long vectors this should be significantly faster.
Marek
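As a quick check of the `fixed` suggestion, the sketch below compares the escaped regular expression with the literal match on the example data (the names `by_regex`, `by_fixed` and `unescaped` are made up here). It does not measure speed; it only shows that both calls give the same pieces, and what happens if the dot is neither escaped nor fixed:

```r
tes <- c("1.abc", "2.di", "3.lik")

# Escaped regular expression versus literal (fixed) pattern
by_regex <- strsplit(tes, "\\.")
by_fixed <- strsplit(tes, ".", fixed = TRUE)
identical(by_regex, by_fixed)

# An unescaped "." is a regex that matches any character, so every
# character is treated as a separator and only empty strings remain
unescaped <- strsplit("1.abc", ".")[[1]]
unescaped
```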
+2  A: 

This is the same as rcs' answer, but may be easier to understand:

> sapply(strsplit(as.character(h$tes), "\\."), function(x) x[[2]])
[1] "abc" "di"  "lik"
Joshua Ulrich
@all of you... Now I get it. //M
Misha
+2  A: 

With the stringr package it's even easier:

library(stringr)
str_split_fixed(h$tes, fixed("."), 2)[, 2]
hadley
+1  A: 

This question appears several times on StackOverflow.

In exactly the same form as yours:

Some similar questions on this topic:

And if you care about speed, you should consider the tip from John's answer about the `fixed` parameter to strsplit.

Marek
+1  A: 

Alternatively, you can save yourself the work of pulling out the 2nd element if you add both columns at the same time:

tes <- c("1.abc","2.di","3.lik")
dat <- c(5,3,2)
h <- data.frame(tes, dat, stringsAsFactors=FALSE)
values <- unlist(strsplit(h$tes, ".", fixed=TRUE))
h <- cbind(h, matrix(values, byrow=TRUE, ncol=2,
                     dimnames=list(NULL, c("num", "prim"))))
David F