ansaurus

Question

R-thonic replacement for simple for loops containing a condition

Answer 1

+2 A:

It's usually better in R if you use the various apply-like functions, rather than a loop. I think this solves your problem; the only drawback is that you have to use string keys.

> descriptions <- c("foo//bar", "")
> probes <- c(10, 20)
> probe2gene <- lapply(strsplit(descriptions, "//"), function (x) x[2])
> names(probe2gene) <- probes
> probe2gene <- probe2gene[!is.na(probe2gene)]
> probe2gene[["10"]]
[1] "bar"

Unfortunately, R doesn't have a good dictionary/map type. The closest I've found is using lists as a map from string-to-value. That seems to be idiomatic, but it's ugly.
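
For what it's worth, here is a minimal sketch of the list-as-map idiom described above. The name gene_map and the keys are made up for illustration; the point is only that keys have to be strings and that a missing key comes back as NULL.

```r
# Sketch: a named list used as a string-keyed map (gene_map is a hypothetical name).
gene_map <- list()
gene_map[["10"]] <- "bar"               # insert under a string key
gene_map[[as.character(20)]] <- "baz"   # numeric keys must be converted to strings first

value <- gene_map[["10"]]               # lookup by key
has_30 <- "30" %in% names(gene_map)     # membership test
missing <- gene_map[["30"]]             # a missing key gives NULL, not an error

gene_map[["20"]] <- NULL                # assigning NULL removes the key
names(gene_map)
```

Note that gene_map[[10]] (numeric, no quotes) would be a positional index, not a key lookup, which is why the string conversion matters.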

Johann Hibschman 2010-02-10 20:08:23

Thanks! That is a lot faster. I hadn't realised that things like "strsplit" could be applied to whole vectors. Neat!

Mike Dewar 2010-02-10 21:09:45

Answer 2

+1 A:

If I understand correctly, you are looking to save each probe-description combination where there is more than one (split) value in the description?

Are Probe and Description the same length?

This is kind of messy, but here is a quick first pass at it:

a <- list("a","b","c")
b <- list(c("a","b"),c("DEF","ABC"),c("Z"))

names(b) <- a
matches <- which(sapply(b, length) > 1) #several ways to do this
b <- lapply(b[matches], function(x) x[2]) #keeps the second element only

That's my first attempt. If you have a sample dataset that would be very useful.

Best regards,

Jay

Jay 2010-02-10 20:21:54

It's hard to be the first responder ;)

Jay 2010-02-10 20:44:34

Answer 3

A:

Another way.

probe<-c(4,3,1)
gene<-c('red//hair','strange','blue//blood')
probe2gene<-character()
probe2gene[probe]<-sapply(strsplit(gene,'//'),'[',2)
probe2gene
[1] "blood" NA      NA      "hair"

In the sapply call, we take advantage of the fact that in R the subsetting operator is also a function named '[', to which we can pass the index as an argument. Also, an out-of-range index does not cause an error but gives an NA value. On the left-hand side of the same line, we use the fact that we can assign to a vector of indices in any order and with gaps; the gaps are filled with NA.
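
The three facts above can be checked one at a time. This is only a small sketch with made-up values (x, parts, and out are illustrative names, not part of the original data):

```r
x <- c("red", "hair")

a <- x[2]          # ordinary subsetting
b <- `[`(x, 2)     # the same operation written as a function call
oob <- x[3]        # out-of-range index: NA rather than an error

# sapply passes 2 as the extra argument to `[` for every list element
parts <- list(c("red", "hair"), "strange")
second <- sapply(parts, `[`, 2)

# Assigning to scattered indices, in any order, fills the gaps with NA
out <- character()
out[c(4, 1)] <- c("hair", "blood")
out
```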

Jyotirmoy Bhattacharya 2010-02-11 05:59:10

Answer 4

A:

Here's another approach that should be fast. Note that this doesn't remove the empty descriptions. It could be adapted to do that, or you could clean those up in a post-processing step using lapply. Is it the case that you'll never have a valid description of length one?

make_desc <- function(n)
{
    word <- function(x) paste(sample(letters, 5, replace=TRUE), collapse = "")
    if (runif(1) < 0.70)
        paste(sapply(seq_len(n), word), collapse = "//")
    else
        "----"
}

description <- sapply(seq_len(10), make_desc)
probes <- seq_along(description)

desc_parts <- strsplit(description, "//", fixed=TRUE, useBytes=TRUE)
lens <- sapply(desc_parts, length)
probes_expand <- rep(probes, lens)
ans <- split(unlist(desc_parts), probes_expand)


> description
 [1] "fmbec"                                                               
 [2] "----"                                                                
 [3] "----"                                                                
 [4] "frrii//yjxsa//wvkce//xbpkc"                                          
 [5] "kazzp//ifrlz//ztnkh//dtwow//aqvcm"                                   
 [6] "stupm//ncqhx//zaakn//kjymf//swvsr//zsexu"                            
 [7] "wajit//sajgr//cttzf//uagwy//qtuyh//iyiue//xelrq"                     
 [8] "nirex//awvnw//bvexw//mmzdp//lvetr//xvahy//qhgym//ggdax"              
 [9] "----"                                                                
[10] "ubabx//tvqrd//vcxsp//rjshu//gbmvj//fbkea//smrgm//qfmpy//tpudu//qpjbu"


> ans[[3]]
[1] "----"
> ans[[4]]
[1] "frrii" "yjxsa" "wvkce" "xbpkc"
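
One possible way to do the clean-up mentioned above is sketched here. It assumes that a description which does not split into more than one part is never valid, which is exactly the open question, so treat the filter as an assumption. The three-element description vector is made up to keep the example short.

```r
# Sketch: drop descriptions that did not split on "//".
# Assumption: a description with a single part is never valid.
description <- c("fmbec", "----", "frrii//yjxsa//wvkce")
probes <- seq_along(description)

desc_parts <- strsplit(description, "//", fixed = TRUE)
keep <- sapply(desc_parts, length) > 1      # TRUE where the description actually split

# Keep only the split descriptions, keyed by probe id (names become strings)
ans <- setNames(desc_parts[keep], probes[keep])
ans[["3"]]
```

Note that this also drops "fmbec"; if single-part descriptions can be valid, filter on the "----" placeholder instead.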

seth 2010-02-11 21:51:14
