views:

444

answers:

5

Hi all, I'm using Dipperstein's bitarray.cpp class to work on bi-level (black and white) images where the image data is natively stored as simply as one pixel one bit.

I need to iterate through each and every bit, on the order of 4--9 megapixels per image, over hundreds of images, using a for loop, something like:

for( int i = 0; i < imgLength; i++) {
    if( myBitArray[i] == 1 ) {
         //  ... do stuff ...
    }
}

Performance is usable, but not amazing. I run the program through gprof and find out there is significant time and millions of calls to std::vector methods like iterator and begin. Here's the top-sampled functions:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 37.91      0.80     0.80        2     0.40     1.01  findPattern(bit_array_c*, bool*, int, int, int)
 12.32      1.06     0.26 98375762     0.00     0.00  __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >::__normal_iterator(unsigned char const* const&)
 11.85      1.31     0.25 48183659     0.00     0.00  __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >::operator+(int const&) const
 11.37      1.55     0.24 49187881     0.00     0.00  std::vector<unsigned char, std::allocator<unsigned char> >::begin() const
  9.24      1.75     0.20 48183659     0.00     0.00  bit_array_c::operator[](unsigned int) const
  8.06      1.92     0.17 48183659     0.00     0.00  std::vector<unsigned char, std::allocator<unsigned char> >::operator[](unsigned int) const
  5.21      2.02     0.11 48183659     0.00     0.00  __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >::operator*() const
  0.95      2.04     0.02                             bit_array_c::operator()(unsigned int)
  0.47      2.06     0.01  6025316     0.00     0.00  __gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >::__normal_iterator(unsigned char* const&)
  0.47      2.06     0.01  3012657     0.00     0.00  __gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >::operator*() const
  0.47      2.08     0.01  1004222     0.00     0.00  std::vector<unsigned char, std::allocator<unsigned char> >::end() const
... remainder omitted ...

I'm not really familiar with C++'s STL, but can anyone shed light on why, for instance, std::vector::begin() is being called a few million times? And, of course, whether there's something I can be doing to speed it up?

Edit: I just gave up and optimized the search function (the loop) instead.

+1  A: 

Without seeing your code it's hard to make specific comments about how to speed up what you are doing. However, vector::begin() is used to return an iterator to the first element in a vector - it's a standard routine when iterating across a vector.

I'd actually recommend using a more modern profiler such as OProfile, this will give you much finer-grained information on where your program is spending it's time - down to the actual C++ line, or even the individual asm instruction, depending on how you run it.

As an aside - why did you choose to use bitarray.cpp instead of a vanilla std::vector<bool>? I've not used it myself, but a quick scan of the above link suggests that bitarray.cpp supports additional functionality over std::vector<bool>, which if you arn't making use of may well add overhead compared to the STL vector class...

Dave Rigby
17 of 26
From 17 of 26's article:"`vector<bool>` epitomizes the First Rule of Optimization: 'premature optimization is evil.' This optimization exacts a hefty toll: namely, a quirky interface, speed overhead, and incompatibility with the Standard Library. This optimization isn't optional; it's forced on programmers."Speed being the issue here, it doesn't look like that's a winning alternative.
erjiang
A: 

You could improve performance by using a pointer/iterator (I'm not sure what exactly bitarray.cpp does for you), like this:

for (bool *ptr = myBitArray, int i = 0; i != imgLength; ++i, ++ptr)
{
   if (*myBitArray == 1)
   {
       //handle
   }
}

I only use int i here because I'm not sure if your bit array would be null terminated, in which case your condition could simply be

*myBitArray != '\0';

Or you could hack out a better end condition. Using an std::iterator would be best, but I doubt your bitarray would support it.

Edit:

Normally this would be a micro-optimization, but if you loop over enough things, it will probably improve performance slightly.

Hooked
I think that myBitArray may be a `std::vector<char>` in which case similar advise would be to use iterators, not pointers.
Evan Teran
My post specifies iterators would be a better choice.
Hooked
A: 

If performance matters enough that you have to worry about accessing individual bits, then you should probably parallelize your code. Since you describe this as image processing, odds are that the state of bit i won't affect how you handle bits i+1 through i+6, so you can probably rewrite your code to operate on bytes and words at a time. Just being able to increment your counter 8 to 64 times less often should provide a measurable performance increase, and will also make it easier for the compiler to optimize your code.

+4  A: 

The fact that you see a lot of inline functions in your profile output implies that they aren't being inlined -- that is, you aren't compiling with optimization turned on. So the simplest thing you could do to optimize your code would be to use -O2 or -O3.

Profiling unoptimized code is rarely worthwhile, since the execution profile of optimized and unoptimized code will likely be completely different.33

Chris Dodd
+1  A: 

a quick peak into the code for bitarray.cpp shows:

bool bit_array_c::operator[](const unsigned int bit) const
{
    return((m_Array[BIT_CHAR(bit)] & BIT_IN_CHAR(bit)) != 0);
}

m_Array is of type std::vector

the [] operator on STL vectors is of constant complexity but its likely implemented as a call to vector::begin to get the base address of the array and then it calculates an offset to get to the value you want. since bitarray.cpp makes a call to the [] operator on EVERY BIT ACCESS you are getting a lot of calls.

given your use case i would create a custom implementation of the functionality contained in bitarray.cpp and tune it for your sequential, bit by bit, access pattern.

  • Don't use unsigned char's, use 32 or 64 bit values to reduce the number of memory accesses needed.
  • I would use a normal array, not a vector to avoid the look up overhead
  • Create a sequential access function, nextbit() that doesn't do all the look ups. Store a pointer to the current "value" all you need to do in increment it on the 32/64 bit boundary, all accesses between boundaries are a simple mask/shift operations and should be very fast.
Mark