views: 243
answers: 2

Hi, can someone please explain the difference between texture memory as used in the context of CUDA and texture memory as used in the context of DirectX? Suppose a graphics card has 512 MB of advertised memory; how is it divided into constant memory, texture memory, and global memory?

E.g. I have a Tesla card with totalConstMem of 64 KB and totalGlobalMem of 4 GB, as queried by cudaGetDeviceProperties, but there is no field that tells me how much texture memory there is.
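
For reference, here is a minimal sketch of the query I'm doing (the maxTexture2D line is just to illustrate that only per-texture limits are reported, not a separate pool):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        // Fields for constant and global memory exist, but there is no field
        // reporting a separate "texture memory" pool.
        printf("totalConstMem:  %zu bytes\n", prop.totalConstMem);
        printf("totalGlobalMem: %zu bytes\n", prop.totalGlobalMem);
        // Texture-related fields describe per-texture limits, not a memory pool:
        printf("maxTexture2D:   %d x %d texels\n", prop.maxTexture2D[0], prop.maxTexture2D[1]);
        return 0;
    }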

Also, how much "texture memory" is there when it is accessed via DirectX and other graphics APIs? I don't have experience programming with these APIs, so I don't know how and what kind of memory they can access. But AFAIK, all the memory they access is hardware-cached. Please correct me if I'm wrong.

After KoppeKTop's answer: So does the shared memory act as an automatic cache for texture memory in the case of both CUDA and DirectX? I don't suppose having another h/w cache would make sense anyway. Does it also mean that if I'm using all of the shared memory in a kernel, texture memory wouldn't get cached?

Thanks.

+2  A: 

Actually, I have never dealt with DirectX, but I can explain the situation with CUDA textures. A texture is simply an array (a cudaArray or a pitched array) with cached, read-only access, stored in global memory. So the maximum size of one big texture on a 512 MB card is 512 MB (actually a little less, but the difference is not significant). Access is optimized for 2D locality (the data is cached as 2D tiles). Also, coordinates and values can be transformed on access (see the CUDA Programming Guide for details).
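
For example, here is a minimal sketch of reading a 2D texture with the texture reference API (kernel and variable names are just illustrative):

    #include <cstdio>
    #include <cuda_runtime.h>

    // The texture reference: read-only, cached access to data that lives in global memory.
    texture<float, 2, cudaReadModeElementType> tex;

    __global__ void copyFromTexture(float *out, int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < width && y < height)
            // tex2D fetches go through the texture cache, which is optimized for 2D locality
            out[y * width + x] = tex2D(tex, x + 0.5f, y + 0.5f);
    }

    int main()
    {
        const int width = 64, height = 64;
        static float h_in[width * height], h_out[width * height];
        for (int i = 0; i < width * height; ++i) h_in[i] = (float)i;

        // The texture data itself is an ordinary allocation in device (global) memory.
        cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
        cudaArray *d_array;
        cudaMallocArray(&d_array, &desc, width, height);
        cudaMemcpyToArray(d_array, 0, 0, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
        cudaBindTextureToArray(tex, d_array, desc);

        float *d_out;
        cudaMalloc((void **)&d_out, sizeof(h_out));

        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
        copyFromTexture<<<grid, block>>>(d_out, width, height);

        cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
        printf("h_out[100] = %f (expected %f)\n", h_out[100], h_in[100]);

        cudaUnbindTexture(tex);
        cudaFreeArray(d_array);
        cudaFree(d_out);
        return 0;
    }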

And no, not all memory accesses are cached on CUDA devices with compute capability 1.x; only constant and texture memory are. Devices with compute capability >= 2.0 (Fermi) cache all memory accesses using the L1 and L2 caches (or only L2; it's configurable).

KoppeKTop
+2  A: 

After KoppeKTop's answer: So does the shared memory act as an automatic cache for texture memory in the case of both CUDA and DirectX? I don't suppose having another h/w cache would make sense anyway. Does it also mean that if I'm using all of the shared memory in a kernel, texture memory wouldn't get cached?

For the pre-GF100 generation (G80), the GPU has dedicated global constant and global texture caches (both read-only). Shared memory has its own dedicated memory banks.

For the GF100 generation, you still have a dedicated texture cache, but the same on-chip memory is now shared between shared memory and the L1 cache (which caches global memory). You can configure how this memory is split if you use CUDA; see the sketch below. For DirectX/OpenGL, the graphics driver uses a 48 KB shared memory / 16 KB L1 cache configuration.
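
As a sketch (the kernel name is just illustrative), in CUDA the split is requested per kernel with cudaFuncSetCacheConfig:

    #include <cuda_runtime.h>

    __global__ void myKernel(float *data)
    {
        data[threadIdx.x] = (float)threadIdx.x;
    }

    int main()
    {
        float *d_data;
        cudaMalloc((void **)&d_data, 128 * sizeof(float));

        // On GF100 (Fermi), ask for the 48 KB shared memory / 16 KB L1 configuration
        // (the same split the graphics driver uses for DirectX/OpenGL)...
        cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferShared);
        // ...or the opposite, 16 KB shared memory / 48 KB L1:
        // cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferL1);

        myKernel<<<1, 128>>>(d_data);
        cudaFree(d_data);
        return 0;
    }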

In any case, shared memory is always software-managed (except for the part dedicated to the L1 cache on GF100), and it does not eat into the texture cache.

Stringer Bell