Unfortunately, tail() is relatively slow. Indexing the final item directly is much faster.
FUN <- function(x) {ss <- strsplit(x,' ')[[1]];ss[length(ss)]}
On my machine this is well over twice as fast as the tail command.
y <- c("AAAAAAAAAAA 250.00",
"01 JUN 2003 02 JUN 2002 OCTOPUS CARDS LTD HONG KONG HK 5.13",
"01 JUN 2003 02 JUN 2002 OCTOPUS CARDS LTD HONG KONG HK 834591283405347 50.00")
#make y bigger so that there's something to test
y <- rep(y, 1e5)
#testing tail
FUN <- function(x) {tail(strsplit(x,' ')[[1]],1)}
system.time( lapply(y,FUN) )
user system elapsed
22.108 0.110 22.069
#testing indexing
FUN <- function(x) {ss <- strsplit(x,' ')[[1]];ss[length(ss)]}
system.time( lapply(y,FUN) )
user system elapsed
9.396 0.037 9.372
But even more speed can be gained by splitting the function up and taking advantage of the fact that its components are already vectorized. (The whole point of the apply family of commands is not to replace looping but to allow simple syntax while using vectorized commands as much as possible. The functions passed to lapply and the like should be as simple as possible.)
#first let strsplit do its own vectory magic
s <- strsplit(y, ' ')
#then define a simpler function
FUN <- function(x) x[length(x)]
lapply(s, FUN)
To time this fairly, strsplit has to stay inside the timing call:
system.time( {s <- strsplit(y, ' ');lapply(s, FUN)} )
user system elapsed
5.281 0.048 5.305
(I'm pretty sure I'm missing something on indexing lists and my function should be even simpler.)
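One possible way to drop the per-element function entirely, offered as a sketch rather than a benchmarked recommendation: flatten the list and pick out the last piece of each string by position. The names y_small, s_small, last_vec and last_lapply are made up for this illustration, and the approach assumes every string splits into at least one piece (an empty string would throw the positions off).

```r
# Small hypothetical sample in the same shape as y above
y_small <- c("AAAAAAAAAAA 250.00",
             "01 JUN 2003 OCTOPUS CARDS LTD HONG KONG HK 5.13")
s_small <- strsplit(y_small, ' ')

# lengths() gives the number of pieces per string; the running total
# is the position of each string's last piece in the flattened vector
last_vec <- unlist(s_small)[cumsum(lengths(s_small))]

# Compare against the lapply version, flattened to a character vector
last_lapply <- unlist(lapply(s_small, function(x) x[length(x)]))
identical(last_vec, last_lapply)
```

Note that this returns a plain character vector rather than a list, which is usually what you want downstream anyway.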
One more thing, though. This would have sped things up all the way through, but I'll just add it here: strsplit() has a fixed argument. It works much faster if you set that to TRUE when you aren't using a regular expression.
system.time( {s <- strsplit(y, ' ', fixed = TRUE); lapply(s, FUN)} )
user system elapsed
1.256 0.007 1.253
If you're doing this on a large dataset, or you have to do it frequently on even moderately sized datasets, you really ought to be using this last method. On the timings above it is roughly 18x faster than the original tail version (22.1 s vs. 1.25 s).
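The timings only mean something if all four variants return the same thing, so a quick sanity check on a small sample may be worth running first. This is a minimal sketch; y_check and the r_* names are hypothetical and not part of the code above.

```r
# Two sample strings in the same shape as y
y_check <- c("AAAAAAAAAAA 250.00",
             "01 JUN 2003 02 JUN 2002 OCTOPUS CARDS LTD HONG KONG HK 5.13")

# The four variants timed above, each returning a list of last pieces
r_tail  <- lapply(y_check, function(x) tail(strsplit(x, ' ')[[1]], 1))
r_index <- lapply(y_check, function(x) {ss <- strsplit(x, ' ')[[1]]; ss[length(ss)]})
r_split <- lapply(strsplit(y_check, ' '), function(x) x[length(x)])
r_fixed <- lapply(strsplit(y_check, ' ', fixed = TRUE), function(x) x[length(x)])

# Stops with an error if any variant disagrees with the others
stopifnot(identical(r_tail, r_index),
          identical(r_index, r_split),
          identical(r_split, r_fixed))
```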
Here's the final solution, which can just be copied over to accomplish the whole task, assuming that y is a vector of character strings formatted just as expected in Edit #3. What is expected is that the last item is a money value to save and the second-to-last item is some kind of ID value.
s <- strsplit(y, ' ', fixed = TRUE)
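#vapply returns character vectors (lapply would return lists, which data.frame would spread across columns)
moneyVal <- vapply(s, function(x) x[length(x)], character(1))
idVal <- vapply(s, function(x) x[length(x)-1], character(1))
#seq_len avoids the 1:0 trap when a string has only two items
restOfY <- vapply(s, function(x) paste(x[seq_len(length(x)-2)], collapse = ' '), character(1))
#These three vectors can be combined into a data frame
df <- data.frame(restOfY, idVal, moneyVal)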
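As a worked illustration of the expected input shape, here is a small self-contained sketch. The sample lines, and the names y_demo, s_demo, rest, id, money and demo_df, are invented for this example. It uses vapply so that each piece comes back as a character vector, and converts the money value to numeric, which is an extra step you may or may not want.

```r
# Hypothetical sample lines: free text, then an ID, then a money value
y_demo <- c("OCTOPUS CARDS LTD HONG KONG HK 834591283405347 50.00",
            "SOME SHOP LONDON GB 1234567 5.13")

s_demo <- strsplit(y_demo, ' ', fixed = TRUE)

# vapply returns a plain character vector, one entry per input line
money <- vapply(s_demo, function(x) x[length(x)], character(1))
id    <- vapply(s_demo, function(x) x[length(x) - 1], character(1))
rest  <- vapply(s_demo,
                function(x) paste(x[seq_len(length(x) - 2)], collapse = ' '),
                character(1))

# One row per input line; money is stored as numeric so it can be summed
demo_df <- data.frame(rest = rest, id = id,
                      money = as.numeric(money),
                      stringsAsFactors = FALSE)
str(demo_df)
```

The ID is deliberately left as a character string, since long IDs like the first one can lose precision if converted to a number.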