I have an application where I need to take the average intensity of around 1 million images. It "feels" like a job for a GPU fragment shader, but fragment shaders are for per-pixel local computations, while image averaging is a global operation.
An image sum will suffice, since it differs from the average only by a constant factor (the pixel count). Is there a way to tell a fragment shader to add the current pixel value to some global accumulator variable, which I can read back on the CPU at the end of the shader program? It seems like adding to an accumulator should be safe in parallel, since addition is commutative and associative, as long as each addition is atomic.
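Roughly what I have in mind is something like the following sketch. It assumes GLSL 4.3 so that SSBOs and `atomicAdd` are available; the `image` sampler, the `Accum` block, and the binding points are just placeholder names I made up:

```glsl
#version 430
// Sketch only: assumes GL 4.3+ for shader storage buffers and atomicAdd.
layout(binding = 0) uniform sampler2D image;   // input image (placeholder name)
layout(std430, binding = 1) buffer Accum {
    uint totalIntensity;                       // global accumulator, read back on the CPU afterwards
};
out vec4 fragColor;

void main() {
    // Convert this fragment's 8-bit intensity to an integer and add it atomically,
    // so concurrent fragments don't lose updates.
    uint value = uint(texelFetch(image, ivec2(gl_FragCoord.xy), 0).r * 255.0);
    atomicAdd(totalIntensity, value);
    fragColor = vec4(0.0);                     // the color output is irrelevant here
}
```

Is something along these lines a reasonable way to do it, or does contention on a single atomic counter kill the performance?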
One approach I considered is loading the image into a texture, applying a 2x2 box blur, loading the result back into an N/2 x N/2 texture, and repeating until the output is 1x1. However, this would take log n applications of the shader, plus lots of copy operations to move memory from the framebuffer into a texture.
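Each reduction pass would look roughly like this (a sketch, assuming I render into the smaller texture and that `inputTex` is a placeholder for the level being reduced):

```glsl
#version 330
// One 2x2 reduction pass: each output pixel is the sum of a 2x2 block of the input.
uniform sampler2D inputTex;   // the N x N level being reduced (placeholder name)
out vec4 fragColor;

void main() {
    ivec2 dst = ivec2(gl_FragCoord.xy);   // coordinate in the N/2 x N/2 target
    ivec2 src = dst * 2;                  // top-left texel of the 2x2 source block
    fragColor = texelFetch(inputTex, src,               0)
              + texelFetch(inputTex, src + ivec2(1, 0), 0)
              + texelFetch(inputTex, src + ivec2(0, 1), 0)
              + texelFetch(inputTex, src + ivec2(1, 1), 0);
}
```

I'd presumably also need a float texture format (e.g. GL_R32F) for the intermediate levels so the partial sums don't get clamped to [0, 1].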
Is there a way to do it in one pass? Or are there other shader tricks to do this that I haven't thought of? Or should I just break down and use CUDA? Not sure if it helps, but my images are sparse (90%+ of entries are zero).