views:

80

answers:

3

Is there a significant penalty in using the more user-friendly valarray over built-in arrays? Do you know of some benchmarks?

A: 

I don't believe there is any overhead over a built-in array. Of course, that's implementation-defined. I suspect that if a language feature like that is the bottleneck in your application, you have a problem a lot of us would like to have :)

However, I'm curious why you'd want to do that instead of using a vector?

Billy ONeal
Just figuring out if it would be suitable for some number crunching
Sergio
+4  A: 

valarray was intended to improve the chances of getting good optimization. For better or worse, it's become something of a forgotten step-child of the standard library; I don't know of any implementations that do much to take advantage of what it provides. Worse, it was really designed for vector processors, and doesn't work very well with processors that use caching heavily -- which is nearly everything these days.

I don't know of any really serious benchmarks, but in my (admittedly, quite informal) testing, it's about even with a built-in array (or std::vector) as long as you're dealing with small amounts of data, but if you have enough data that it doesn't all fit in the cache, reasonably careful use of a built-in array or std::vector will usually be faster.

As far as "reasonably careful" goes, it comes down to this: to make things cache-friendly, you generally want to read a specific piece of data, do all the processing on it that you're going to do, then write it back out. valarray does pretty much the opposite: it applies a single operation to the entire array, then applies the next operation to the whole array, and so on until you're done.

On a Cray (for example) that worked well -- it had three sets of 64 registers that you (ideally) rotated, so in any given clock cycle you were reading from memory into one set of 64 registers, applying a single operation to a second set, and writing a third set back out. The next cycle you'd "rotate" those, so you applied the operation to the 64 operands you'd just read, wrote out the results you'd just computed, and read new data into the registers you'd just written out. When things worked out, you got a 64x speedup compared to scalar processing. Most current processors have some vector processing, but only for around 2 to 4 operands per clock cycle -- and at the same time, they're usually limited primarily by bandwidth to main memory, which is exactly what that pattern demanded the most of.

Jerry Coffin
+1  A: 

Jerry is right about large (larger-than-cache) arrays. valarray provides basic operations in addition to just wrapping new[], but they won't produce the most efficient code.

I've never needed to use such a system, but if you do want efficient and maintainable algebra for large matrices, expression templates are the answer.

A Google search turned up http://met.sourceforge.net/, which has links to some other resources.

Potatoswatter