ansaurus

Question

Answer 1

+2 A:

What are the dimensions of the image and the kernel ? If the kernel is large then you can use FFT-based convolution, otherwise for small kernels just use direct convolution.

The DSP might not be the best way to do this though - just because it has a MAC instruction doesn't mean that it will be more efficient. Does the ARM CPU on the Beagle Board have NEON SIMD ? If so then that might be the way to go (and more fun too).

For a small kernel, you can do direct convolution like this:

// in, out are m x n images (integer data)
// K is the kernel size (KxK) - currently needs to be an odd number, e.g. 3
// coeffs[K][K] is a 2D array of integer coefficients
// scale is a scaling factor to normalise the filter gain

for (i = K / 2; i < m - K / 2; ++i) // iterate through image
{
  for (j = K / 2; j < n - K / 2; ++j)
  {
    int sum = 0; // sum will be the sum of input data * coeff terms

    for (ii = - K / 2; ii <= K / 2; ++ii) // iterate over kernel
    {
      for (jj = - K / 2; jj <= K / 2; ++jj)
      {
        int data = in[i + ii][j +jj];
        int coeff = coeffs[ii + K / 2][jj + K / 2];

        sum += data * coeff;
      }
    }
    out[i][j] = sum / scale; // scale sum of convolution products and store in output
  }
}

You can modify this to support even values of K - it just takes a little care with the upper/lower limits on the two inner loops.

Paul R 2010-10-20 21:42:48

@Paul: I'd imagine that for straight-up convolutions, the TI would beat the ARM, given modulo addressing modes and zero-overhead loops and so on.

Oli Charlesworth 2010-10-20 21:45:14

@Oli: you may be right - I'm not sure that modulo addressing helps, but there may be other advantages.

Paul R 2010-10-20 21:48:15

@MPaul: Modulo addressing helps if you're zipping through a circular buffer. I have no numbers to back this up, but if the TI can't beat the host processor at the canonical thing it's designed for, then it's a pretty pointless combination!

Oli Charlesworth 2010-10-20 21:50:49

@Oli: it would certainly be interesting to try a routine such as this on both the ARM and the DSP and see which comes out faster. One advantage of doing it on the DSP of course is that it leaves the ARM free for other tasks. I guess the pragmatic solution though would be to see which, if any, has existing library code for this that you might be able to leverage for this task and then go with that. Most DSPs have 1D and 2D signal processing libraries so that would probably be the first thing to look at.

Paul R 2010-10-20 21:59:00

@Paul: Good point with "doing it on the DSP leaves the ARM free".

Oli Charlesworth 2010-10-20 22:09:27

@Paul: Thanks for the things you said. Do you know where can I find these DSP signal processing libraries you mentioned?

snakile 2010-10-20 22:19:08

@Paul: I can't give an exact answer for the dimensions question. Th algorithms I plan to implement come from an academic paper which doesn't say anything about the image dimentions. But I guess I'll want it to operate on 512X512 images.

snakile 2010-10-20 22:22:23

@snakikle: What's more important is the kernel, i.e. the thing you'll be convolving the image with. As Paul says, if this is "small", then it's more efficient to use standard convolution.

Oli Charlesworth 2010-10-20 22:33:04

@snakile: I believe the DSP on the Beagleboard is from the TI C64X family, so you might want to look at the TI web site or any other TI DSP-related sites to see what ready-rolled routines are available. You'll need a C64X toolchain to build this code of course (see http://elinux.org/BeagleBoard/DSP_Howto), and it's probably going to be a lot harder to test/debug than regular ARM CPU code, but that's half the fun of course.

Paul R 2010-10-20 22:43:11

@Oli, @Paul: Thank you both. Your remarks were helpful. Now it turns out the kernel is small, so as Paul said it doesn't make sense to use FFT. I edited my question. Could you please help me a bit more?

snakile 2010-10-28 11:40:06

@snakile: no problem - I've edited my answer to show how to implement a convolution with a KxK kernel with integer data and coefficients.

Paul R 2010-10-28 12:56:03

ansaurus

tags:

views:

answers:

Fast 2D convolution for DSP

related questions