This is a repost of a question I asked a few days ago; I lost access to that account and registered a new one.
I am trying to modify the imageDenoising sample in the CUDA SDK. I need to repeat the filter many times in order to measure how long it takes, but my code doesn't work properly.
//start
__global__ void F1D(TColor *image,int imageW,int imageH, TColor *buffer)
{
    const int ix = blockDim.x * blockIdx.x + threadIdx.x;
    const int iy = blockDim.y * blockIdx.y + threadIdx.y;
    if(iy != 0 && iy < imageH-1 && ix < imageW)
    {
        float4 fresult = get_color(image[imageW * iy + ix]);
        float4 fresult4 = get_color(image[imageW * (iy+1) + ix]);
        float4 fresult5 = get_color(image[imageW * (iy-1) + ix]);
        float4 fresult7;
        fresult7.x = fresult.x*0.5+fresult4.x*.25+fresult5.x*.25;
        fresult7.y = fresult.y*0.5+fresult4.y*.25+fresult5.y*.25;
        fresult7.z = fresult.z*0.5+fresult4.z*.25+fresult5.z*.25;
        buffer[imageW * iy + ix] =
            make_color(fresult7.x,fresult7.y,fresult7.z,0);
    }
    image[imageW * iy + ix] = buffer[imageW * iy + ix];
    // I think this copy should be done with cudaMemcpy, but that fails
}
//extern
extern "C" void
cuda_F1D(TColor *dst, int imageW, int imageH)
{
    dim3 threads(BLOCKDIM_X, BLOCKDIM_Y);
    dim3 grid(iDivUp(imageW, BLOCKDIM_X), iDivUp(imageH, BLOCKDIM_Y));
    Copy<<<grid, threads>>>(dst, imageW, imageH);
    size_t size = imageW*imageH*sizeof(TColor);
    TColor *host =(TColor*) malloc(size);
    TColor *dst2;
    //TColor *dst3;
    //TColor *d = new TColor(imageW*imageH*sizeof(TColor));
    dim3 threads2(imageW,1);
    dim3 grid2(iDivUp(imageW, imageW), iDivUp(imageH, 1));
    for(int i = 0;i<1;i++)
    {
        cudaMalloc( (void **)&dst2, size);
        cudaMemcpy(dst2, dst, imageW*imageH*sizeof(TColor),cudaMemcpyHostToDevice);
        F1D<<<grid2, threads2>>>(dst, imageW, imageH,dst2);
        cudaFree(dst2);
    }
}
This code runs, but the image array is not synchronised between threads, which leads to many synchronisation problems.
Here is the assignment question:
Lab Class Description:
In this assignment, you will have to implement a small CUDA program for image processing. You may want to re-use the Image denoising visual studio project available with the CUDA SDK.
Question 1 Write a CUDA kernel that applies the following 1D filtering kernel to the image [0.25 0.5 0.25] (Similar to the previous CW). Measure how much time it takes using a CUDA timer. For that, you may want to repeat the filtering n times so that your measurement is accurate enough.
So far I have finished the filtering kernel and the time measurement, but looping n times fails: the result blurs, and as the loop count i gets larger the image blacks out.
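For comparison, here is a minimal sketch of one common way to repeat such a filter: two device buffers that swap roles after every pass ("ping-pong"), timed with CUDA events. It is not the SDK's code. It assumes plain uchar4 pixels instead of the SDK's TColor, and the names filter1D and runFilter, the 16x16 block size and the image size are all made up for illustration. Error checking is left out to keep it short. Since the weights 0.25 + 0.5 + 0.25 sum to 1, a pass that only reads from one buffer and only writes to the other should blur the image without darkening it.

```cuda
#include <cstdio>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// One pass of the vertical [0.25 0.5 0.25] filter.
// Reads only from src and writes only to dst, so no thread ever sees a
// neighbour that another thread has already overwritten.
__global__ void filter1D(const uchar4 *src, uchar4 *dst, int w, int h)
{
    const int ix = blockDim.x * blockIdx.x + threadIdx.x;
    const int iy = blockDim.y * blockIdx.y + threadIdx.y;
    if (ix >= w || iy >= h) return;          // guard every memory access

    const int idx = w * iy + ix;
    if (iy == 0 || iy == h - 1) {            // border rows are copied unchanged
        dst[idx] = src[idx];
        return;
    }
    const uchar4 c = src[idx], u = src[idx - w], d = src[idx + w];
    dst[idx] = make_uchar4(
        (unsigned char)(0.5f * c.x + 0.25f * (u.x + d.x) + 0.5f),
        (unsigned char)(0.5f * c.y + 0.25f * (u.y + d.y) + 0.5f),
        (unsigned char)(0.5f * c.z + 0.25f * (u.z + d.z) + 0.5f),
        c.w);
}

// Hypothetical host helper: applies the filter n times to img (w * h pixels)
// and returns the elapsed GPU time for all passes in milliseconds.
float runFilter(std::vector<uchar4> &img, int w, int h, int n)
{
    const size_t bytes = img.size() * sizeof(uchar4);
    uchar4 *a = nullptr, *b = nullptr;
    cudaMalloc(&a, bytes);                   // allocate once, outside the loop
    cudaMalloc(&b, bytes);
    cudaMemcpy(a, img.data(), bytes, cudaMemcpyHostToDevice);

    const dim3 threads(16, 16);              // 256 threads per block
    const dim3 grid((w + 15) / 16, (h + 15) / 16);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int i = 0; i < n; ++i) {
        filter1D<<<grid, threads>>>(a, b, w, h);
        std::swap(a, b);                     // this pass's output feeds the next
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);              // wait until all n passes are done

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaMemcpy(img.data(), a, bytes, cudaMemcpyDeviceToHost);  // a is the latest result

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(a);
    cudaFree(b);
    return ms;
}

int main()
{
    const int w = 256, h = 256, n = 100;     // illustrative sizes only
    // A uniform grey test image: filtering it should leave it grey.
    std::vector<uchar4> img(size_t(w) * h, make_uchar4(128, 128, 128, 255));
    const float ms = runFilter(img, w, h, n);
    printf("%d passes: %.3f ms total, %.5f ms per pass\n", n, ms, ms / n);
    printf("centre pixel after filtering: %d\n", img[(h / 2) * w + w / 2].x);
    return 0;
}
```

Because kernels launched on the same stream run in order, no explicit synchronisation is needed between passes; the swap only exchanges the two pointers on the host. The buffers are allocated once before the loop, so the timing covers only the kernel launches.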