ansaurus

Question

How can you efficiently check values of large vectors in R?

Answer 1

A:

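which(is.na(my_big_vector))    # positions of the NA values
which(my_big_vector == 5)      # positions of the values equal to 5
which(my_big_vector < 3)       # positions of the values less than 3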

And if you want to count them:

length(which(is.na(my_big_vector)))

nico 2010-07-01 19:01:23

This is not a good answer, since is.na() produces a logical vector as long as the input...

mbq 2010-07-01 19:29:19
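A minimal sketch (not from the original thread) comparing two ways of counting, on a small hypothetical vector `x` standing in for `my_big_vector`. Both still build a logical vector as long as the input, which is the point of the comment above; `sum()` only avoids the extra integer vector of positions that `which()` returns. It also shows that comparisons against NA yield NA, so counting matches needs `na.rm = TRUE` (whereas `which()` silently treats NA as FALSE).

```r
# Hypothetical example vector standing in for my_big_vector
x <- c(4, NA, 5, 1, NA, 5, 7)

# Counting via which(): logical vector first, then an integer vector of positions
n_which <- length(which(is.na(x)))

# Counting via sum(): only the logical vector is built; TRUE counts as 1
n_sum <- sum(is.na(x))

# x == 5 is NA wherever x is NA, so sum() needs na.rm = TRUE here
n_fives <- sum(x == 5, na.rm = TRUE)

print(c(n_which, n_sum, n_fives))  # 2 2 2
```

Neither idiom short-circuits: the whole vector is always scanned.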

Answer 2

+3 A:

"Cheap, fast, reliable: pick any two" is a dry way of saying that you sometimes need to order your priorities when building or designing systems.

It is rather similar here: the cost of the concise expression is that memory gets allocated behind the scenes. If that really is a problem, you can always write a (compiled?) routine that runs (quickly) along the vectors and uses only one pair of values at a time.

You can trade off memory usage versus performance versus expressiveness, but it is difficult to hit all three at the same time.

Dirk Eddelbuettel 2010-07-01 19:05:45

It would seem like the built-in library ought to have functions that tell you whether any value in a big vector is NA or equal to some value. In Python, you could use generator expressions, which allocate a fixed amount of memory and let any() or all() short-circuit the computation.

Nick 2010-07-08 00:36:21
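A pure-R middle ground between the two positions above, sketched under assumptions: the helper name `any_chunked` and the default chunk size are hypothetical, and this is not the compiled routine Dirk describes. It scans the vector in fixed-size blocks, so each temporary vector is at most one chunk long, and it stops at the first block containing a match, which gives a chunk-level version of the short-circuiting Nick mentions.

```r
# Hypothetical helper: TRUE if pred() is TRUE for any element of x.
# Works on x in blocks of `chunk` elements, so the slice and the logical
# result built at each step are at most `chunk` long, and it returns as
# soon as one block contains a hit. NA results from pred() are ignored.
any_chunked <- function(x, pred, chunk = 1e6) {
  n <- length(x)
  start <- 1
  while (start <= n) {
    end <- min(start + chunk - 1, n)
    hits <- pred(x[start:end])          # vectorized check on one block only
    if (any(hits, na.rm = TRUE)) return(TRUE)
    start <- end + 1
  }
  FALSE                                 # no block had a hit (or x is empty)
}

x <- c(10, 20, NA, 3)
any_chunked(x, is.na, chunk = 2)                  # TRUE
any_chunked(x, function(v) v == 5, chunk = 2)     # FALSE
```

The chunk size is the trade-off knob: smaller chunks bound memory more tightly but pay more interpreter overhead per element; true element-by-element short-circuiting still belongs in compiled code.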

Answer 3

A:

I don't think that is a good idea: R is a very high-level language, so you should stick to the standard idioms; that way the R developers know what to optimize. Also remember that since R is a functional and lazy language, it is even possible that a statement like

any(is.na(a))

can be recognized and executed as something like

.Internal(is_any_na,a)

mbq 2010-07-01 19:38:11
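A later development, not part of this 2010 thread: base R 3.1.0 added `anyNA()`, which its documentation describes as implementing `any(is.na(x))` in a possibly faster way, especially for atomic vectors. This is close to the dedicated internal mbq speculates about, although it is a separate function rather than an automatic rewrite of `any(is.na(x))`. The vector `x` below is a small made-up example.

```r
# anyNA() is available in base R from version 3.1.0 onward
x <- c(3, 1, NA, 8)

any(is.na(x))    # TRUE; builds a logical vector as long as x, then reduces it
anyNA(x)         # TRUE; same answer via the dedicated function
anyNA(c(1, 2))   # FALSE
```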
