I'm fairly new to OpenCL so please bear with me.

In the first iteration of my code, I used basic memory buffers for large datasets and declared them global. However, now that I'm looking to improve the timing, I want to use texture memory for this. In the CUDA version, we use cudaBindTexture and tex1Dfetch to obtain the data for a large 1D float array. From my understanding of the specification, texture memory is the same thing as image memory. However, since there are only 2D and 3D image objects with max heights and widths, I run into some issues. My array is larger than max height/width, but not larger than max height * max width. Must I convert my 1D array into 2D? Or is there a better way to do it?

Or am I completely off?

I did read http://forums.nvidia.com/index.php?showtopic=151743 and http://forums.nvidia.com/index.php?showtopic=150454 but they weren't exactly conclusive in whether the texture memory referred to in Best Practices and Programming Guide was in fact image objects.

Thanks and any help/suggestions are greatly welcome!

+1  A: 

My array is larger than max height/width, but not larger than max height * max width. Must I convert my 1D array into 2D?

Yes, the texture hardware has constraints on the maximum index values. If you exceed these values, you'll need to address the data with multiple index values (for example, x and y coordinates in a 2D image).

That said, I'm not implying that converting to texture access is going to speed up your program.
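
The 1D-to-2D conversion described above is just index arithmetic, so here is a minimal sketch of it in plain C. The names Coord2D, rows_needed, to_2d and to_1d are made up for illustration and are not part of OpenCL. It assumes the row width is chosen to be no larger than the device's CL_DEVICE_IMAGE2D_MAX_WIDTH and that the last row of the image is padded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: a texel position in a 2D image. */
typedef struct {
    size_t x; /* column */
    size_t y; /* row */
} Coord2D;

/* Rows needed to store n elements in rows of `width` texels (rounds up,
   so the last row may be only partly filled and needs padding). */
static size_t rows_needed(size_t n, size_t width)
{
    assert(width > 0);
    return (n + width - 1) / width;
}

/* Map a flat array index to the (x, y) texel that holds it. */
static Coord2D to_2d(size_t i, size_t width)
{
    assert(width > 0);
    Coord2D c = { i % width, i / width };
    return c;
}

/* Inverse mapping, handy for checking the layout on the host. */
static size_t to_1d(Coord2D c, size_t width)
{
    return c.y * width + c.x;
}
```

For example, with an assumed width of 8192, a 1,000,000-element array needs rows_needed(1000000, 8192) rows, and element 123456 lands at to_2d(123456, 8192). Inside the kernel, the same modulo/divide would give the (int2)(x, y) coordinate to pass to read_imagef.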

goger
Compared with plain global memory, read_only image (texture) memory will generally give better performance, depending on the kernel, because it is cached. I got a pretty nice performance boost from it.
achinda99
Any thoughts on this question? http://forums.nvidia.com/index.php?showtopic=154686
achinda99
+2  A: 

I found the best answer as a reply to my post on NVidia's forum here.

achinda99
Yes, that explained it well.
goger