As in the example below, I am trying to take a substring of the Video_full column in a data.frame (video_data_2) that I am working on. I want to keep all the characters after the period. The period is always present, there is exactly one, and its position differs from value to value.

         Date                   Video_full  Instances
1 Apr 1, 2010    installs/AA.intro_video_1        546
2 Apr 1, 2010  installs/ABAC.intro_video_2        548

I got substring to work:

video_data_2$Video_full <- substring(video_data_2$Video_full,11)

And strsplit also:

strsplit("installs/AA.intro_video_1 ",'[.]')

I'm just not able to figure out how to start the substring in a dynamic position or only keep the second value returned by strsplit.

Thanks for any help you can offer for a simple question.

+5  A: 

You can use sub():

video_data_2$Video_full <- sub("^.*\\.","", video_data_2$Video_full)
kohske
Thanks for your answer. Worked like a charm.
analyticsPierce
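To make the sub() pattern above concrete, here is a minimal sketch on sample data modelled on the question. The variable names `after_dot`, `dot_pos` and `after_dot_substring` are made up for illustration, and the `regexpr()` variant is an alternative that was not proposed in the thread; it is included only because it addresses the "dynamic position" wording of the question directly.

```r
# Sample data modelled on the question (values are illustrative).
video_data_2 <- data.frame(
  Date = c("Apr 1, 2010", "Apr 1, 2010"),
  Video_full = c("installs/AA.intro_video_1", "installs/ABAC.intro_video_2"),
  Instances = c(546, 548),
  stringsAsFactors = FALSE
)

# "^.*\\." matches everything from the start of the string up to and
# including the period; replacing that match with "" keeps the rest.
after_dot <- sub("^.*\\.", "", video_data_2$Video_full)

# Alternative that keeps substring(): find the position of the literal
# period in each element, then start one character after it.
dot_pos <- regexpr(".", video_data_2$Video_full, fixed = TRUE)
after_dot_substring <- substring(video_data_2$Video_full, dot_pos + 1)

print(after_dot)
print(after_dot_substring)
```

Both approaches rely on the guarantee stated in the question that every value contains exactly one period.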
+3  A: 

An approach using strsplit:

video_data_2$Video_full <- sapply(strsplit(video_data_2$Video_full, "\\."),head)[2,]
gd047
Similar to the first answer provided by @Marek, I received a 'non-character argument' error when I tried this. Any thoughts on what might cause it?
analyticsPierce
+5  A: 

Another way to use strsplit

sapply(strsplit(video_data_2$Video_full, "\\."), "[", 2)

which is shorthand for

sapply(strsplit(video_data_2$Video_full, "\\."), function(x) x[2])
Marek
+1 I like very much the use of "[". What does it mean? (and where is the explanation in R help?)
gd047
@gd047 The indexing operator "[" is a function, and you can reach its help with `?"["` (or `help("[")`). You can use it like any other function, e.g. `\`[\`(letters,3:5)`, but it's really helpful in cases like this question, in `do.call`, or anywhere else you must directly provide the name of a function.
Marek
Thank you for providing this answer. I am not sure why, but when I ran this function I got a 'non-character argument' error. Any thoughts on what would cause that?
analyticsPierce
I suppose `video_data_2$Video_full` is a `factor`. So try `sapply(strsplit(as.character(video_data_2$Video_full), "\\."), "[", 2)`
Marek
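The factor explanation above can be checked with a small sketch. It assumes, as Marek guesses, that the column is a factor (the default for strings in data.frame() before R 4.0.0); the vector `video_full` and the other names here are hypothetical stand-ins for the asker's column.

```r
# Hypothetical factor column standing in for video_data_2$Video_full.
video_full <- factor(c("installs/AA.intro_video_1",
                       "installs/ABAC.intro_video_2"))

# strsplit() only accepts character vectors, so passing a factor fails.
# tryCatch() captures the error message instead of stopping the script.
err <- tryCatch(strsplit(video_full, "\\."),
                error = function(e) conditionMessage(e))
print(err)

# Converting to character first gives a list with one vector per element.
parts <- strsplit(as.character(video_full), "\\.")

# "[" is an ordinary function, so sapply() can call it with the extra
# argument 2 to pick the second piece of each split.
second <- sapply(parts, "[", 2)

# The same extraction written out with an explicit function; vapply()
# additionally states that each result is a single string.
second_checked <- vapply(parts, function(x) x[2], character(1))

print(second)
```

sub() coerces its input with as.character(), which would explain why the first answer worked on the same column without any conversion.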
+2  A: 

Try stringr

library(stringr)
str_split_fixed(video_data_2$Video_full, "\\.", n = 2)[, 2]
hadley
This solution is much slower than the others. You can see this with a vector of length 10,000.
Marek
Prove it! Plus, why worry about speed unless you have to?
hadley
@hadley, thank you for your answer. I went through your docs for this package and would get a lot of use out of it. However, I was not able to get it to install. I'm using the R bundle in TextMate and tried install.packages("stringr", repos = "http://cran.r-project.org/src/contrib/stringr_0.3.tar.gz", type="source"); the message I got back said the package was unavailable. Sorry if this should be a separate question.
analyticsPierce
You should only need `install.packages("stringr")`. That path is not a valid repository.
hadley
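For anyone who wants to check the speed claim in the comments above themselves, a rough timing sketch follows. The 10,000-element test vector is made up, the variable names are illustrative, and the stringr timing only runs if the package is installed. Timings depend on the machine and package version, so no numbers are quoted here.

```r
# Illustrative 10,000-element vector; the values are made up.
n <- 10000
x <- paste0("installs/V", seq_len(n), ".intro_video_", seq_len(n))

# system.time() evaluates the expression and records how long it took;
# the assignment inside keeps the result for comparison afterwards.
t_sub   <- system.time(r_sub   <- sub("^.*\\.", "", x))
t_split <- system.time(r_split <- sapply(strsplit(x, "\\."), "[", 2))

timings <- c(sub = t_sub[["elapsed"]], strsplit = t_split[["elapsed"]])

# Time the stringr version only when the package is available.
if (requireNamespace("stringr", quietly = TRUE)) {
  t_stringr <- system.time(
    r_stringr <- stringr::str_split_fixed(x, "\\.", n = 2)[, 2]
  )
  timings["stringr"] <- t_stringr[["elapsed"]]
}

# Elapsed seconds per approach.
print(timings)
```

A single system.time() call is noisy for operations this fast, so repeating each call several times gives a fairer comparison.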