ansaurus

Question

Initializing a C++ vector to random values... fast

Answer 1

+2 A:

Yes, whereas x.at(i) does bounds checking, x[i] does not do so. Also, your code is incorrect as you have failed to specify the size of x in advance. You need to use std::vector<double> x(n), where n is the number of elements that you want to use; otherwise, your loop there will never execute.

Alternatively, you may want to make a custom iterator for generating random values and filling it using the iterator; because the std::vector constructor will initialize its elements, anyway, so if you have a custom iterator class that generates random values you may be able to eliminate a pass over the items.

In terms of implementing an iterator of your own, here is my untested code:

 class random_iterator
 {
     public:
         typedef std::input_iterator_tag iterator_category;
         typedef double value_type;
         typedef int difference_type;
         typedef double* pointer;
         typedef double& reference;

         random_iterator() : _range(1.0), _count(0) {}
         random_iterator(double range, int count) : 
                                         _range(range), _count(count) {}
         random_iterator(const random_iterator& o) : 
                                         _range(o._range), _count(o._count) {}
         ~random_iterator(){}

         double operator*()const{ return ((rand()/(double)RAND_MAX) * _range); }
         int operator-(const random_iterator& o)const{ return o._count-_count; }
         random_iterator& operator++(){ _count--; return *this; }
         random_iterator operator++(int){ random_iterator cpy(*this); _count--; return cpy; }
         bool operator==(const random_iterator& o)const{ return _count==o._count; }
         bool operator!=(const random_iterator& o)const{ return _count!=o._count; }

     private:
         double _range;
         int _count;
 };

With the code above, it should be possible to use:

std::vector<double> x(random_iterator(range,number),random_iterator());

That said, the generate code for the other solution given is simpler, and frankly, I would just explicitly fill the vector without resorting to anything fancy like this.... but it is kind of cool to think about.

Michael Aaron Safyan 2010-05-18 20:04:07

Okay. The idea of an iterator creating random values is just plain cool.

Jonathan M Davis 2010-05-18 20:08:36

Oh, sorry I just wrote it here, Its more for the concept but I will correct it, ty. Could you give me an example or direct me to where I can read about making my own iterator? I'm pretty new to C++. Thanks

Flamewires 2010-05-18 20:09:14

@Flamewires, yeah... you would need to create a class that conforms to one of the iterator concepts. See: http://www.sgi.com/tech/stl/Iterators.html . If you are pretty new to C++, then creating your own iterator might not be such a great idea at the moment, as it involves notions such as "type traits", "templates", "template specialization", "concepts", and "operator overloading". If you are familiar with those, then it might be a reasonable thing to do.

Michael Aaron Safyan 2010-05-18 20:21:37

@Michael: IMO, it's a rather poor idea in any case. While it's *possible* to think of generating (pseudo-)random numbers as iterating over a sequence, I think it's more likely to be misleading than helpful.

Jerry Coffin 2010-05-18 20:25:39

@Jerry: I don't know. Such an iterator doesn't seem all that far a step from using input iterators on `/dev/random`, which is a well-known and established C++ icon (which only fails in this case because reading `/dev/random` never reaches EOF).

sbi 2010-05-18 22:02:04

@sbi:While it's used (and even useful) I can't say I'm really much more enthused about `/dev/random`, and for pretty much the same reasons. As far as an iterator goes, I'm (somewhat) less bothered by reading from a device that generates data than having an iterator that generates data itself (though I'll admit, it depends -- for a counterexample, see: http://groups.google.com/group/comp.lang.c++/browse_frm/thread/595c76ff16787e64/c7c74eec40e9a812#c7c74eec40e9a812). In that case, we don't have an actual sequence, but it's something that acts like we do.

Jerry Coffin 2010-05-18 22:29:53

Answer 2

+6 A:

~~Right now, this should be really fast since the loop won't execute.~~

Personally, I'd probably use something like this:

struct gen_rand { 
    double range;
public:
    gen(double r) : range(r) {}
    double operator()() { 
        return (rand()/(double)RAND_MAX) * range;
    }
};

std::vector<double> x(num_items);
std::generate_n(x.begin(), num_items, gen_rand());

Edit: It's purely a micro-optimization that might make no difference at all, but you might consider rearranging the computation to get something like:

struct gen_rand { 
    double factor;
public:
    gen(double r) : factor(range/RAND_MAX) {}
    double operator()() { 
        return rand() * factor;
    }
};

Of course, there's a really good chance the compiler will already do this (or something equivalent) but it won't hurt to try it anyway (though it's really only likely to help with optimization turned off).

Edit2: "sbi" (as is usually the case) is right: you might gain a bit by initially reserving space, then using an insert iterator to put the data into place:

std::vector<double> x;
x.reserve(num_items);
std::generate_n(std::back_inserter(x), num_items, gen_rand());

As before, we're into such microscopic optimization, I'm not at all sure I'd really expect to see a difference at all. In particular, since this is all done with templates, there's a pretty good chance most (if not all) the code will be generated inline. In that case, the optimizer is likely to notice that the initial data all gets overwritten, and skip initializing it.

In the end, however, nearly the only part that's really likely to make a significant difference is getting rid of the .at(i). The others might, but with optimizations turned on, I wouldn't really expect them to.

Jerry Coffin 2010-05-18 20:08:57

No fair, I saw it first. I'm just too slow, you beat me to it.

wheaties 2010-05-18 20:11:10

This still pre-initializes the vector with `0.0` values which are then overwritten. Wouldn't a `reserve()` and and insert iterator eliminate that?

sbi 2010-05-18 20:13:03

@Wheaties: that must be a first -- with my (poor) typing, it's usually the other way around! :-)

Jerry Coffin 2010-05-18 20:29:55

Jerry, I phrased what I said as a question, because I'm not sure. I can imagine the additional overhead of an inserter iterator taking longer than the initial initialization of a few built-ins, especially if not all of the former can be inlined and the latter is optimized to a `std::memset()`, implemented as a CPU intrinsic. I guess in the end you just have to measure. __But unless the app spends 20% of its time in that loop, nobody is going to notice anyway even if you double the speed of it.__

sbi 2010-05-20 08:23:45

@sbi: that's why I said it was a micro-optimization in my edit, and there was no certainty it would do any good. Nonetheless, what I'd guess in this case would be the iterator being a pointer in a register, so incrementing it would have virtually no overhead. Unless the initialization is optimized away completely, it involves writing to memory, and you can increment a register a *lot* before it's as expensive as even one memory transaction. In the end, however, you're right: only profiling will tell for sure and it's rarely likely to matter anyway.

Jerry Coffin 2010-05-20 10:46:33

@Jerry: I didn't mean to criticize your answer (I had up-voted it), I merely meant to complement it. I also think that `reserve()` and `push_back()` might be faster than `resize()` and `operator[]()` (I suggested this, after all). However, if such micro-optimizations would seem worth the work, one definitely needs to measure. But this is were we agree, so please excuse me for repeating it...

sbi 2010-05-20 12:00:29

@sbi: I think we're in agreement in general. At least IMO, there's no need to excuse recommending measurement -- it's well worth repeating.

Jerry Coffin 2010-05-20 13:07:03

Answer 3

+2 A:

You may consider using a pseudo-random number generator that gives output as a sequence. Since most PRNGs just provide a sequence anyways, that will be a lot more efficient than simply calling rand() over and over again.

But then, I think I really need to know more about your situation.

Why does this piece of code execute so much? Can you restructure your code to avoid re-generating random data so frequently?
How big are your vectors?
How "good" does your random number generator need to be? High-quality distributions tend to be more expensive to calculate.
If your vectors are large, are you reusing their buffer space, or are you throwing it away and reallocating it elsewhere? Creating new vectors willy-nilly is a great way to destroy your cache.

Tom 2010-05-18 20:10:35

Answer 4

+1 A:

#include <iostream>
#include <vector>
#include <algorithm>

struct functor {
   functor(double v):val(v) {}
   double operator()() const {
      return (rand()/(double)RAND_MAX)*val;
   }
private:
   double val;
};

int main(int argc, const char** argv) {
   const int size = 10;
   const double range = 3.0f;

   std::vector<double> dvec;
   std::generate_n(std::back_inserter(dvec), size, functor(range));

   // print all
   std::copy(dvec.begin(), dvec.end(), (std::ostream_iterator<double>(std::cout, "\n")));

   return 0;
}

опоздал :(

niXman 2010-05-18 20:16:08

Answer 5

+1 A:

@Jerry Coffin's answer looks very good. Two other thoughts, though:

Inlining - All of your vector access will be very fast, but if the call to rand() is out-of-line, the function call overhead might dominate. If that's the case, you may need to roll your own pseudorandom number generator.
SIMD - If you're going to roll your own PRNG, you might as well make it compute 2 doubles (or 4 floats) at once. This will reduce the number of the int-to-float conversions as well as the multiplications. I've never tried it, but apparently there's a SIMD version of the Mersenne Twister that's quite good. A simple linear congruential generator might be good enough too (and that's probably what rand() is using already).

celion 2010-05-18 23:50:54

Answer 6

A:

The way I think about these is a rubber-meets-the-road approach.
In other words, there are certain minimal things that have to happen, no getting around it, such as:

the rand() function has to be called N times.
the result of rand() has to be converted to double and then multiplied by something.
the resulting numbers have to get stored in consecutive elements of an array.

The object is, at a minimum, to get those things done.

Other concerns, like whether or not to use an std::vector and iterators are fine as long as they don't add any extra cycles. The easiest way to see if they add significant extra cycles is to single-step the code at the assembly language level.

Mike Dunlavey 2010-05-19 00:37:32

ansaurus

tags:

views:

answers:

Initializing a C++ vector to random values... fast

related questions