ansaurus

Question

CUDA: Wrapping device memory allocation in C++

Answer 1

+2 A:

You seem to have charged in telling us what you are planning to do with explain your use cases for the data (this is probably because it is obvious to you).

You mean “without”? Right, sorry. CUDA is a GPGPU programming language from NVIDIA, built on top of C/C++ by providing a frontend for GCC. My question is primarily directed at people who already know the ins and outs of it. My usage is pretty arbitrary, the question is really more concerned with CUDA because CUDA only offers you a C interface and thus forces you to forego a lot of useful C++ features even if you work in C++ anyway.

The only C API I see so far are cudaMalloc and cudaFree. … Can't you just wrap these inside the constructor/destructor of your CudoClass.

Yes … and no. That's more or less what I'm doing at the moment but I'm not satisfied with it. My question is actually threefold (I'll update the question accordingly):

Is my placement new overload semantically correct? Does it leak memory?
Does anybody have information about future CUDA developments that go in this general direction (let's face it: C interfaces in C++ s*ck)?
How can I take this further in a consistent manner (there are other APIs to consider, e.g. there's not only device memory but also a constant memory store and texture memory)?

Apart from the Malloc and Free what other API's are there? I am presuming they allocate memory and copy data from the device into the newly allocated memory?

Yes … in fact, I've just thought of a way of encapsulating the cudaMemcpy functionality as well. ;-)

Do you just want to see the raw data as arrays of some specific type? Or are the other operations you want to perform?

Actually, once the memory is initialized and some data copied into it (see cudaMemcpy above), I'm pretty much done. The rest of the action takes part of the GPU where I only need some basic array accesses. The very basic workflow here is:

Allocate device memory,
Copy your data to device memory,
Invoke the (parallel) GPU action that processes the memory,
Copy data back to RAM.

Step 3 is pretty much set in stone.

Konrad Rudolph 2008-11-18 21:19:59

Answer 2

+4 A:

I would go with the placement new approach. Then I would define a class that conforms to the std::allocator<> interface. In theory, you could pass this class as a template parameter into std::vector<> and std::map<> and so forth.

Beware, I have heard that doing such things is fraught with difficulty, but at least you will learn a lot more about the STL this way. And you do not need to re-invent your containers and algorithms.

coryan 2008-11-19 01:26:17

Hadn't thought about an allocator. I've actually done this before so it shouldn't be too difficult.

Konrad Rudolph 2008-11-19 08:06:33

Answer 3

+1 A:

There are already two projects who attempt something similar:

<stdcuda>, and
CUDPP

In the meantime, however, I've implemented my allocator and it works really well and was completely straigtforward (> 95% boilerplate code).

Konrad Rudolph 2008-11-19 17:55:56

Answer 4

+1 A:

In the meantime there were some further developments (not so much in terms of the CUDA API, but at least in terms of projects attempting an STL-like approach to CUDA data management).

Most notably there is a project from NVIDIA research: thrust

kynan 2010-07-22 06:51:47

ansaurus

tags:

views:

answers:

CUDA: Wrapping device memory allocation in C++

related questions