views:

630

answers:

7

I am using XNA to build a project where I can draw "graffiti" on my wall using an LCD projector and a monochrome camera that is filtered to see only hand held laser dot pointers. I want to use any number of laser pointers -- don't really care about differentiating them at this point.

The wall is 10' x 10', and the camera is only 640x480 so I'm attempting to use sub-pixel measurement using a spline curve as outlined here: tpub.com

The camera runs at 120fps (8-bit), so my question to you all is the fastest way to to find that subpixel laser dot center. Currently I'm using a brute force 2D search to find the brightest pixel on the image (0 - 254) before doing the spline interpolation. That method is not very fast and each frame takes longer to computer than they are coming in.

Edit: To clarify, in the end my camera data is represented by a 2D array of bytes indicating pixel brightness.

What I'd like to do is use an XNA shader to crunch the image for me. Is that practical? From what I understand, there really isn't a way to keep persistent variables in a Pixel Shader such as running totals, averages, etc.

But for arguments sake, let's say I found the brightest pixels using brute force, then stored them and their neighboring pixels for the spline curve into X number of vertices using texcoords. Is is practical then to use HLSL to compute a spline curve using texcoords?

I am also open to suggestions outside of my XNA box, be it DX10/DX11, maybe some sort of FPGA, etc. I just don't really have much experience with ways of crunching data in this way. I figure if they can do something like this on a Wii-Mote using 2 AA batteries than I'm probably going about this the wrong way.

Any ideas?

+5  A: 

Hey,

If by Brute-forcing you mean looking at every pixel independently, it is basically the only way of doing it. You will have to scan through all the images pixels, no matter what you want to do with the image. Althought you might not need to find the brightest pixels, you can filter the image by color (ex.: if your using a red laser). This is easily done using a HSV color coded image. If you are looking for some faster algorithms, try OpenCV. It's been optimized again and again for image treatment, and you can use it in C# via a wrapper:

[http://www.codeproject.com/KB/cs/Intel_OpenCV.aspx][1]

OpenCV can also help you easily find the point centers and track each points.

Is there a reason you are using a 120fps camera? you know the human eye can only see about 30fps right? I'm guessing it's to follow very fast laser movements... You might want to consider bringning it down, because real-time processing of 120fps will be very hard to acheive.

David Menard
+4  A: 

running through 640*480 bytes to find the highest byte should run within a ms. Even on slow processors. No need to take the route of shaders.

I would advice to optimize your loop. for instance: this is really slow (because it does a multiplication with every array lookup):

byte highest=0;
foundX=-1, foundY=-1;
for(y=0; y<480; y++)
{
    for(x=0; x<640; x++)
    {
        if(myBytes[x][y] > highest)
        {
            highest = myBytes[x][y];
            foundX = x;
            foundY = y;
        }
    }
}

this is much faster:

byte [] myBytes = new byte[640*480];
//fill it with your image

byte highest=0;
int found=-1, foundX=-1, foundY=-1;
int len = 640*480;
for(i=0; i<len; i++)
{
    if(myBytes[i] > highest)
    {
        highest = myBytes[i];
        found = i;
    }
}
if(found!=-1)
{
    foundX = i%640;
    foundY = i/640;
}

This is off the top of my head so sorry for errors ;^)

Toad
+3  A: 

Brute-force is the only real way, however your idea of using a shader is good - you'd be offloading the brute-force check from the CPU, which can only look at a small number of pixels simultaneously (roughly 1 per core), to the GPU, which likely has 100+ dumb cores (pipelines) that can simultaneously compare pixels (your algorithm may need to be modified a bit to work well with the 1 instruction-many cores arrangement of a GPU).

The biggest issue I see is whether or not you can move that data to the GPU fast enough.

David
How could I track the single highest pixel on a frame in the GPU? Am I understanding this correct in that the dumb cores can't share memory in a way that would let them compare a pixel to the previous brightest pixel?
bufferz
A GPU could be thought of as a large collection of really simple CPUs optimized for math operations. The biggest difference though is that the instruction decoder has been stripped out - a single controller sends the same instruction to each dumb core, where it is executed on that cores piece of data. IF statements are carried out by running the same code twice - instead of branching, cores that don't match the condition just stop. The same data is then sent through again, but with the compare instruction reversed.
David
In your case, you can't compare to the *immediate* previous brightest pixel, because any of the (lets say) 128 cores might have found a brighter pixel. Instead, try having each core concentrate on just it's own line - find the brightest pixel in the line (processing 128 lines simultaneously). Once the lines in the image have effectively been summarized as just the brightest pixel in that line, you can find the overall brightest in a serial manner.
David
+1  A: 

Put the camera slightly out of focus and bitblt against a neutral sample. You can quickly scan rows for non 0 values. Also if you are at 8 bits and pick up 4 bytes at a time you can process the image faster. As other pointed out you might reduce the frame rate. If you have less fidelity than the resulting image there isn't much point in the high scan rate.

(The slight out of focus camera will help get just the brightest points and reduce false positives if you have a busy surface... of course assuming you are not shooting a smooth/flat surface)

Matthew Whited
+2  A: 

Another optimization to consider: if you're drawing, then the current location of the pointer is probably close the last location of the pointer. Remember the last recorded position of the pointer between frames, and only scan a region close to that position... say a 1'x1' area. Only if the pointer isn't found in that area should you scan the whole surface.

Obviously, there will be a tradeoff between how quickly your program can scan, and how quickly you'll be able to move your mouse before the camera "loses" the pointer and has to go to the slow, full-image scan. A little experimentation will probably reveal the optimum value.

Cool project, by the way.

JSBangs
Why just 1x1. Why not increment the size 1^2. So if not found in a 1x1 it will then check 2x2, 4x4 etc. A little care and you could stride across the area you've already checked.
graham.reeds
I thought about that, too. But he can have multiple lasers so he's going to have to look at every pixel anyway. A laser can turn on or reenter the area without notice.
Nosredna
+1  A: 

Start with a black output buffer. Forget about subpixel for now. Every frame, every pixel, do this:

outbuff=max(outbuff,inbuff);

Do subpixel filtering to a third "clean" buffer when you're done with the image. Or do a chunk or a line of the screen at a time in real time. Advantage: real-time "rough" view of the drawing, cleaned up as you go.

When you convert from the rough output buffer to the "clean" third buffer, you can clear the rough to black. This lets you keep drawing forever without slowing down.

By drawing the "clean" over top the "rough," maybe in a slightly different color, you'll have the best of both worlds.

This is similar to what paint programs do--if you draw really fast, you see a rough version, then the paint program "cleans up" the image when it has time.


Some comments on the algorithm:

I've seen a lot of cheats in this arena. I've played Sonic on a Sega Genesis emulator that upsamples. and it has some pretty wild algorithms that work very well and are very fast.

You actually have some advantages you can gain because you might know the brightness and the radius on the dot.

You might just look at each pixel and its 8 neighbors and let those 9 pixels "vote" according to their brightness for where the subpixel lies.


Other thoughts

Your hand is not that accurate when you control a laser pointer. Try getting all the dots every 10 frames or so, identifying which beams are which (based on previous motion, and accounting for new dots, turned-off lasers, and dots that have entered or left the visual field), then just drawing a high resolution curve. Don't worry about sub pixel in the input--just draw the curve into the high res output.

Use a Catmull-Rom spline, which goes through all control points.

Nosredna
I like this idea, makes perfect sense. So maybe represent the "clean" pixels as structs containing 5x5 pixel values and go through them in a queue on a separate thread.Still though, burning through this queue would be a massive CPU undertaking. Maybe I can store each one of these structs in a vertex and do the spline curve on the video card? I'm still looking for a lighting fast way to compute 5x5 2D splines.Thank you for this high level suggestion!
bufferz
It really doesn't matter how slow it is. You can optimize later. You just watch the timer and see how much of the screen you can do each frame.
Nosredna
+1  A: 

You're dealing with some pretty complex maths if you want sub-pixel accuracy. I think this paper is something to consider. Unfortunately, you'll have to pay to see it using that site. If you've got access to a suitable library, they may be able to get hold of it for you.

The link in the original post suggested doing 1000 spline calculations for each axis - it treated x and y independantly, which is OK for circular images but is a bit off if the image is a skewed ellipse. You could use the following to get a reasonable estimate:

xc = sum (xn.f(xn)) / sum (f(xn))

where xc is the mean, xn is the a point along the x-axis and f(xn) is the value at the point xn. So for this:

          *
       *  *
       *  *
       *  *
       *  *
       *  *
       *  *  *
    *  *  *  *
    *  *  *  *
 *  *  *  *  *  *
------------------
 2  3  4  5  6  7

gives:

sum (xn.f(xn)) = 1 * 2 + 3 * 3 + 4 * 9 + 5 * 10 + 6 * 4 + 7 * 1

sum (f(xn)) = 1 + 3 + 9 + 10 + 4 + 1

xc = 128 / 28 = 4.57

and repeat for the y-axis.

Skizz
This works quite well. This runs much faster for me than my spline implementation too. It's pretty much real time! Thank you for the suggestion.
bufferz