ansaurus

Question

How to read successfully from a 2D texture

Answer 1

A:

Graphics cards have traditionally expected textures to have dimensions that are powers of 2, and this is especially true for NVIDIA cards. CUDA's cudaMallocPitch and cudaMemcpy2D work with such pitches, so, looking at your code, the safest solution is to round the width and height up yourself. Otherwise CUDA might write to invalid memory because it would be expecting different offsets:

#define height 16
#define width 11

...

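size_t roundUpToPowerOf2(size_t v)
{
  // See http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
  // Smear the highest set bit into every lower bit, then add 1.
  // The loop covers the full width of size_t (32 or 64 bits).
  --v;
  for (size_t shift = 1; shift < sizeof(v) * 8; shift <<= 1)
    v |= v >> shift;
  ++v;
  return v;
}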
...

size_t horizontal_pitch = roundUpToPowerOf2(width);
size_t vertical_pitch = roundUpToPowerOf2(height);
size_t memsize = horizontal_pitch * vertical_pitch;

...

// Read data from host to device
cudaMemcpy2D((void*)devMPPtr,pitch,(void*)data,sizeof(float)*horizontal_pitch,
  sizeof(float)*width,height,cudaMemcpyHostToDevice);

//Read back and check this memory
cudaMemcpy2D((void*)h_out,horizontal_pitch*sizeof(float),(void*)devMPPtr,pitch,
  sizeof(float)*width,height,cudaMemcpyDeviceToHost);

// Print the memory
for (int i=0; i<height; i++){
  for (int j=0; j<width; j++){
    printf("%2.2f ",h_out[i*horizontal_pitch+j]);
  }
  cout << endl;
}

...

// Copy back data to host
cudaMemcpy((void*)h_out,(void*)devMPtr,horizontal_pitch*vertical_pitch*sizeof(float),cudaMemcpyDeviceToHost);

// Print the result
cout << endl;
for (int i=0; i<height; i++){
  for (int j=0; j<width; j++){
    printf("%2.2f ",h_out[i*horizontal_pitch+j]);
  }
  cout << endl;
}
cout << "Done" << endl;

Hopefully I haven't overlooked any place where horizontal_pitch/vertical_pitch should be used instead of plain width/height.
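
The indexing convention used above, where a new row starts every horizontal_pitch elements but only width of them carry data, can be checked on the host without a GPU. The following is a minimal C++ sketch under stated assumptions, not the CUDA API itself: copyRows2D and makePaddedImage are hypothetical helpers, and copyRows2D only mimics the row-by-row behaviour of cudaMemcpy2D, with pitches counted in elements rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper (not part of CUDA): copies `height` rows of
// `width` elements each, where consecutive rows are `srcPitch` / `dstPitch`
// elements apart. Pitches are in elements here; cudaMemcpy2D takes bytes.
void copyRows2D(float* dst, std::size_t dstPitch,
                const float* src, std::size_t srcPitch,
                std::size_t width, std::size_t height)
{
    assert(width <= dstPitch && width <= srcPitch);
    for (std::size_t row = 0; row < height; ++row)
        for (std::size_t col = 0; col < width; ++col)
            dst[row * dstPitch + col] = src[row * srcPitch + col];
}

// Builds a tightly packed width x height array holding 0, 1, 2, ... and
// copies it into a zero-filled buffer whose rows are `pitch` elements apart.
std::vector<float> makePaddedImage(std::size_t width, std::size_t height,
                                   std::size_t pitch)
{
    std::vector<float> packed(width * height);
    for (std::size_t i = 0; i < packed.size(); ++i)
        packed[i] = static_cast<float>(i);

    std::vector<float> padded(pitch * height, 0.0f);
    copyRows2D(padded.data(), pitch, packed.data(), width, width, height);
    return padded;
}
```

Calling makePaddedImage(11, 16, 16) gives a 16x16 buffer in which element (i, j) of the 11x16 image sits at index i*16+j and the last five entries of every row are padding.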

dark_charlie 2010-10-01 11:30:20

I just tried this and I am still getting incorrect results - with this small array it doesn't output much. Can someone please tell me how to get this working? Basically the first output is 0 1 2... N where N=(width-1). The second output should be 1 2 3 ... N+1

Marm0t 2010-10-01 14:33:04

Answer 2

A:

It might have to do with your block size. In this code you are trying to have a block of 16x16 threads write to an 11x16 memory block, which means that some of your threads are writing to unallocated memory. That also explains why your tests with (16*M by 32*N) worked: no threads were writing to unallocated memory, since your dimensions were multiples of 16.

An easy way to fix this problem is something like this:

if ((idx < width) && (idy < height)) {
  // Write output only for threads that fall inside the image
  devMPtr[idy*width+idx] = tex2D(texRefEx,u,v);
}

You'll need to either pass the height and width to the kernel function or copy a constant to the card before you call the kernel.
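
To see how many threads the guard described above actually filters out, the launch can be emulated on the host. This is a minimal C++ sketch, not CUDA code: simulateGuardedLaunch and LaunchStats are hypothetical names, the nested loops stand in for the block and thread indices, and the grid size is assumed to be rounded up so that partial blocks at the edges are still launched.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of emulating one kernel launch on the host.
struct LaunchStats {
    int writes;   // threads that passed the bounds check and wrote
    int skipped;  // threads that fell outside the width x height image
};

// Emulates a 2D launch of blockSize x blockSize thread blocks covering a
// width x height image. Each simulated thread applies the same bounds check
// as the kernel and writes only when it is inside the image.
LaunchStats simulateGuardedLaunch(int width, int height, int blockSize)
{
    assert(width > 0 && height > 0 && blockSize > 0);
    // Round the grid up so partial blocks at the edges are still launched.
    const int gridX = (width + blockSize - 1) / blockSize;
    const int gridY = (height + blockSize - 1) / blockSize;

    std::vector<float> out(static_cast<std::size_t>(width) * height, 0.0f);
    LaunchStats stats{0, 0};
    for (int by = 0; by < gridY; ++by)
        for (int bx = 0; bx < gridX; ++bx)
            for (int ty = 0; ty < blockSize; ++ty)
                for (int tx = 0; tx < blockSize; ++tx) {
                    const int idx = bx * blockSize + tx;
                    const int idy = by * blockSize + ty;
                    if (idx < width && idy < height) {
                        // at() throws if the index were ever out of range.
                        out.at(static_cast<std::size_t>(idy) * width + idx) = 1.0f;
                        ++stats.writes;
                    } else {
                        ++stats.skipped;
                    }
                }
    return stats;
}
```

For the 11x16 case with 16x16 blocks, one block of 256 threads is launched, 176 of them write, and the remaining 80 are the ones that would otherwise have written past the end of the array.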

tkerwin 2010-10-27 17:31:26

From the programming guide, cudaMallocPitch pads the allocation (I'm guessing with zeros; they don't state that explicitly): "width rounded up to the closest multiple of this [pitch] size and its rows padded accordingly." So when the texture reference accesses memory outside the defined region, it should be reading zeros (the behaviour is defined). You can test this by copying 2D memory to 2D memory (without textures), which works fine. If you read back the region that represents the padded 2D array allocated by cudaMallocPitch, you see zeros in the appropriate places. Thanks for your response, much appreciated.
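
The "rounded up to the closest multiple" rule quoted above is plain integer arithmetic. A minimal C++ sketch follows; paddedPitchBytes is a hypothetical helper, and the alignment passed to it is a placeholder, since the real value is chosen by cudaMallocPitch and depends on the device.

```cpp
#include <cassert>
#include <cstddef>

// Rounds a row width in bytes up to the next multiple of `alignment`.
// `alignment` is an assumed value for illustration only; the actual pitch
// is whatever cudaMallocPitch reports for the device in use.
std::size_t paddedPitchBytes(std::size_t widthBytes, std::size_t alignment)
{
    assert(alignment > 0);
    return ((widthBytes + alignment - 1) / alignment) * alignment;
}
```

With an assumed alignment of 64 bytes, a row of 11 floats (44 bytes) would occupy 64 bytes, leaving 20 bytes of padding per row.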

Marm0t 2010-10-27 17:58:14

Answer 3

A:

 // Texture coordinates
 float u=(idx + 0.5f)/float(width);
 float v=(idy + 0.5f)/float(height);

You need an offset of half a texel to get to the center of the texel. I think there might have been some rounding error for your textures whose dimensions are not multiples of 16. I tried this and it worked for me (both outputs were identical).
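
The effect of the half-texel offset can be illustrated on the host with a simplified model of point sampling. This is a C++ sketch under stated assumptions, not the hardware's actual arithmetic: texelIndexWrapped, texelCenter and centersRoundTrip are hypothetical names, and the model assumes that a normalized coordinate is wrapped into [0, 1), scaled by the texture size, and floored to pick the texel.

```cpp
#include <cassert>
#include <cmath>

// Simplified host-side model of point sampling with normalized coordinates
// in wrap mode: keep the fractional part of `u`, scale by the texture size,
// and take the floor. This is an assumption made for illustration; real
// hardware uses its own fixed-point arithmetic.
int texelIndexWrapped(float u, int size)
{
    assert(size > 0);
    const float wrapped = u - std::floor(u);  // wrap addressing: frac(u)
    int index = static_cast<int>(std::floor(wrapped * static_cast<float>(size)));
    if (index >= size) index = size - 1;      // guard against float rounding
    return index;
}

// Normalized coordinate of the center of texel `i` in a row of `size` texels.
float texelCenter(int i, int size)
{
    return (static_cast<float>(i) + 0.5f) / static_cast<float>(size);
}

// Returns true when every texel center maps back to its own texel index.
bool centersRoundTrip(int size)
{
    for (int i = 0; i < size; ++i)
        if (texelIndexWrapped(texelCenter(i, size), size) != i)
            return false;
    return true;
}
```

Because each center lies half a texel away from both edges, a small rounding error in the division cannot push the lookup into a neighbouring texel, which is not guaranteed for a coordinate that sits exactly on a texel edge.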

tkerwin 2010-10-28 15:11:22

I think I've done this before, but it shouldn't matter. I used 'texRefEx.filterMode = cudaFilterModePoint' so it filters to a single value. I will try again as a sanity check : )

Marm0t 2010-10-28 16:33:56

Point sampling wouldn't fix this problem, since the lookup is actually falling just outside the edge of the texel. It only seems to work in wrap mode and not in clamp mode, though.

tkerwin 2010-10-28 17:36:54

Well, that's good. I was specifically interested in wrap mode (this whole problem I was encountering was just a curiosity/roadblock). I'll let you know how it goes. If this works I will be 95% happy (if it works, it means I need to re-implement things with textures after already having a shared memory solution...)

Marm0t 2010-10-28 18:56:11
