One thing I want to do all the time in my R code is to test whether certain conditions hold for a vector, such as whether any or all of its values are equal to some specified value. The R-ish way to do this is to create a boolean vector and pass it to any() or all(), for example:

any(is.na(my_big_vector))
all(my_big_vector == my_big_vector[[1]])
...

It seems really inefficient to me to allocate a big vector and fill it with values just to throw it away (especially if the any() or all() call could be short-circuited after testing just a couple of the values). Is there a better way to do this, or should I just give up my desire to write code that is both efficient and succinct when working in R?

A: 
which(is.na(my_big_vector))
which(my_big_vector == 5)
which(my_big_vector < 3)

And if you want to count them...

length(which(is.na(my_big_vector)))
nico
This is not a good answer, since is.na still produces a boolean vector...
mbq
+3  A: 

"Cheap, fast, reliable: pick any two" is a dry way of saying that you sometimes need to order your priorities when building or designing systems.

It is rather similar here: the cost of the concise expression is the fact that memory gets allocated behind the scenes. If that really is a problem, then you can always write a (compiled?) routine that runs (quickly) along the vector and uses only a pair of values at a time.

You can trade off memory usage versus performance versus expressiveness, but it is difficult to hit all three at the same time.
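
To make the idea of a routine that walks the vector concrete, here is a minimal sketch in plain R rather than compiled code. The helper names any_equal and all_same are hypothetical, and NA handling is simplified (an NA never counts as a match), so this does not reproduce the exact NA semantics of any() and all().

```r
# Hypothetical helper: returns TRUE as soon as one element equals `value`.
# No intermediate logical vector is built; the loop stops at the first hit.
any_equal <- function(x, value) {
  for (v in x) {
    if (!is.na(v) && v == value) {
      return(TRUE)
    }
  }
  FALSE
}

# Hypothetical helper: returns FALSE as soon as one element differs
# from the first one (an NA is treated as a difference).
all_same <- function(x) {
  if (length(x) == 0L) {
    return(TRUE)
  }
  first <- x[[1]]
  for (v in x) {
    if (is.na(v) || v != first) {
      return(FALSE)
    }
  }
  TRUE
}
```

An interpreted loop like this is usually far slower per element than the vectorised form, so it is likely to pay off only when the deciding value sits near the start of a very large vector. That is the trade-off described above, and it is why a compiled version would be the more realistic choice.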

Dirk Eddelbuettel
It would seem like the built-in library ought to have functions that could tell you whether any value in a big vector was NA or equal to some value. In Python, you could use generator expressions, which would allocate a fixed amount of memory and short-circuit the computation of any() or all().
Nick
A: 

I think it is not a good idea -- R is a very high-level language, so what you should do is follow the standard idioms. That way the R developers know what to optimize. You should also remember that, since R is a functional and lazy language, it is even possible that a statement like

any(is.na(a))

can be recognized and executed as something like

.Internal(is_any_na,a)
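
For the NA case specifically, something close to this exists in base R: anyNA() was added in R 3.1.0 and is documented as a possibly faster way of computing any(is.na(x)). A small sketch, where x is just an illustrative vector:

```r
x <- c(1, NA, 3)

# Classic idiom: is.na() builds a full logical vector, then any() scans it.
has_na_classic <- any(is.na(x))

# Base R (>= 3.1.0): same answer, without the caller creating the
# intermediate logical vector.
has_na_direct <- anyNA(x)

identical(has_na_classic, has_na_direct)
```

As far as I know there is no equivalent built-in for arbitrary conditions such as x == 5, so this only covers the is.na() example.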
mbq