How to use gsub with more than 9 backreferences? I would expect the output in the example below to be "e, g, i, j, o".

> test <- "abcdefghijklmnop"
> gsub("(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)", "\\5, \\7, \\9, \\10, \\15", test, perl = TRUE)
[1] "e, g, i, a0, a5"
A: 

It was my understanding that \10 would be understood as backreference 1 followed by the literal digit 0, which matches the "a0" in your output. I think 9 is the max.
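A small sketch to check this, reusing the pattern from the question (the variable names `pattern`, `out10` and `out15` are only illustrative). If \10 is read as \1 followed by a literal 0, the replacement should yield the first captured character plus "0":

```r
test <- "abcdefghijklmnop"

# Sixteen single-character capturing groups, same as the pattern in the question
pattern <- paste(rep("(\\w)", 16), collapse = "")

# "\\10" is treated as group 1 ("a") followed by a literal "0"
out10 <- gsub(pattern, "\\10", test, perl = TRUE)

# Likewise "\\15" is group 1 followed by a literal "5"
out15 <- gsub(pattern, "\\15", test, perl = TRUE)

print(out10)  # "a0"
print(out15)  # "a5"
```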

easement
A: 

According to this site, backreferences \10 to \99 work in some languages, but not most.

Those that are reported to work are

Rich Seller
+5  A: 

See Regular Expressions with The R Language:

You can use the backreferences \1 through \9 in the replacement text to reinsert text matched by a capturing group. There is no replacement text token for the overall match. Place the entire regex in a capturing group and then use \1.

But with PCRE you should be able to use named groups. So try (?P<name>regex) for group naming and (?P=name) as the backreference.
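As far as I know, gsub does not accept named references in the replacement text, so the following is only a sketch of one way to get at named groups: it reads them back through regexpr with perl = TRUE and its "capture.start", "capture.length" and "capture.names" attributes. The group names (c5, c7, ...) and the variable names are made up for illustration.

```r
test <- "abcdefghijklmnop"

# Name only the characters of interest; the other positions are matched
# with plain \w and not captured.
pattern <- paste0(
  "\\w{4}(?P<c5>\\w)\\w(?P<c7>\\w)\\w(?P<c9>\\w)",
  "(?P<c10>\\w)\\w{4}(?P<c15>\\w)\\w"
)

m <- regexpr(pattern, test, perl = TRUE)

# With perl = TRUE the match carries the start and length of each group
starts <- as.vector(attr(m, "capture.start"))
lens   <- as.vector(attr(m, "capture.length"))

caps <- substring(test, starts, starts + lens - 1)
names(caps) <- attr(m, "capture.names")

named_result <- paste(caps, collapse = ", ")
print(named_result)  # "e, g, i, j, o"
```

This extracts the pieces rather than substituting in place, so it only fits cases where the whole string is being rebuilt from the captured groups.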

Gumbo
+3  A: 

Use strsplit instead:

test <- "abcdefghijklmnop"
strsplit(test, "")[[1]][c(5, 7, 9, 10, 15)]
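The call above returns a character vector of the five letters. To get the single comma-separated string the question expects, one option (a minimal sketch; `chars` and `result` are illustrative names) is to collapse it with paste:

```r
test <- "abcdefghijklmnop"

# Split into single characters, then pick positions by index
chars <- strsplit(test, "")[[1]]

# Join the selected characters into one string
result <- paste(chars[c(5, 7, 9, 10, 15)], collapse = ", ")
print(result)  # "e, g, i, j, o"
```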
hadley