views:

154

answers:

8

I am working with OpenCV and Qt, Opencv use BGR while Qt uses RGB , so I have to swap those 2 bytes for very big images.

There is a better way of doing the following? I can not think of anything faster but looks so simple and lame...

    int width = iplImage->width;
 int height = iplImage->height;

 uchar *iplImagePtr = (uchar *) iplImage->imageData;
 uchar buf;
 int limit = height * width;

 for (int y = 0; y < limit; ++y) {
  buf = iplImagePtr[2];
  iplImagePtr[2] = iplImagePtr[0];
  iplImagePtr[0] = buf;
  iplImagePtr += 3;
 }

 QImage img((uchar *) iplImage->imageData, width, height,
     QImage::Format_RGB888);
+4  A: 

I think this looks absolutely fine. That the code is simple is not something negative. If you want to make it shorter you could use std::swap:

std::swap(iplImagePtr[0], iplImagePtr[2]);

You could also do the following:

 uchar* end = iplImagePtr + height * width * 3;
 for ( ; iplImagePtr != end; iplImagePtr += 3) {
    std::swap(iplImagePtr[0], iplImagePtr[2]);
 }
Andreas Brinck
+1  A: 

I would be inclined to do something like the following, working on the basis of that RGB data being in three byte blocks.

int i = 0;
int limit = (width * height); // / 3;
while(i != limit)
{
  buf = iplImagePtr[i]; // should be blue colour byte
  iplImagePtr[i] = iplImagaePtr[i + 2]; // save the red colour byte in the blue space
  iplImagePtr[i + 2] = buf; // save the blue color byte into what was the red slot
  // i++;
  i += 3;
}

I doubt it is any 'faster' but at end of day, you just have to go through the entire image, pixel by pixel.

thecoshman
Shouldn't that be `i += 3`?
Andreas Brinck
Thanks Andreas. it was either that, or always do (i * 3) which obliviously is just a waste of CPU
thecoshman
+2  A: 

There's cvConvertImage to do the whole thing in one line, but I doubt it's any faster either.

Johannes Sasongko
+2  A: 

Hey,

Couldn't you use one of the following methods ?

void QImage::invertPixels ( InvertMode mode = InvertRgb )

or

QImage QImage::rgbSwapped () const

Hope this helps a bit !

Andy M
+1  A: 

You could always do this:

int width = iplImage->width;
int height = iplImage->height;
uchar *start = (uchar *) iplImage->imageData;
uchar *end = start + width * height;

for (uchar *p = start ; p < end ; p += 3)
{
   uchar buf = *p;
   *p = *(p+2);
   *(p+2) = buf;
}

but a decent compiler would do this anyway.

Your biggest overhead in these sorts of operations is going to be memory bandwidth.

If you're using Windows then you can probably do this conversion using the BitBlt and two appropriately set up DIBs. If you're really lucky then this could be done in the graphics hardware.

Skizz
+4  A: 

We are currently dealing with this issue in a Qt application. We've found that the Intel Performance Primitives to be be fastest way to do this. They have extremely optimized code. In the html help files at Intel ippiSwapChannels Documentation they have an example of exactly what you are looking for.

There are couple of downsides

  1. Is the size of the library, but you can link static link just the library routines you need.
  2. Running on AMD cpus. Intel libs run VERY slow by default on AMD. Check out www.agner.org/optimize/asmlib.zip for details on how do a work around.
photo_tom
IPP is probably the best you're going to get, since it (IIRC) uses SSE instructions.
Mike DeSimone
Another downside is the price tag.
Johannes Sasongko
True about price, but at $200/developer, you get a lot of performance for not very much money. Also forgot to mention, they have optimized demo versions of image codecs to speed up image i/o.
photo_tom
A: 

I hate to ruin anyone's day, but if you don't want to go the IPP route (see photo_tom) or pull in an optimized library, you might get better performance from the following (modifying Andreas answer):

uchar *iplImagePtr = (uchar *) iplImage->imageData;
uchar buf;
size_t limit = height * width;

for (size_t y = 0; y < limit; ++y) {
    std::swap(iplImagePtr[y * 3], iplImagePtr[y * 3 + 2]);
}

Now hold on, folks, I hear you yelling "but all those extra multiplies and adds!" The thing is, this form of the loop is far easier for a compiler to optimize, especially if they get smart enough to multithread this sort of algorithm, because each pass through the loop is independent of those before or after. In the other form, the value of iplImagePtr was dependent on the value in previous pass. In this form, it is constant throughout the whole loop; only y changes, and that is in a very, very common "count from 0 to N-1" loop construct, so it's easier for an optimizer to digest.

Or maybe it doesn't make a difference these days because optimizers are insanely smart (are they?). I wonder what a benchmark would say...

P.S. If you actually benchmark this, I'd also like to see how well the following performs:

uchar *iplImagePtr = (uchar *) iplImage->imageData;
uchar buf;
size_t limit = height * width;

for (size_t y = 0; y < limit; ++y) {
    uchar *pixel = iplImagePtr + y * 3;
    std::swap(pix[0], pix[2]);
}

Again, pixel is defined in the loop to limit its scope and keep the optimizer from thinking there's a cycle-to-cycle dependency. If the compiler increments and decrements the stack pointer each time through the loop to "create" and "destroy" pixel, well, it's stupid and I'll apologize for wasting your time.

Mike DeSimone
A: 
cvCvtColor(iplImage, iplImage, CV_BGR2RGB);
Rich