It seems like 2 million floats should be no big deal: only 8 MB out of 1 GB of GPU RAM. I am able to allocate that much, and sometimes more, with no trouble. Yet I get CL_OUT_OF_RESOURCES when I call clEnqueueReadBuffer, which seems odd. Can I sniff out where the trouble really started? OpenCL shouldn't be failing at clEnqueueReadBuffer, right? The failure should happen when I allocate the data, right? Is there some way to get more details than just the error code? It would be nice to see how much VRAM was allocated when OpenCL declared CL_OUT_OF_RESOURCES.

+1  A: 

Not all available memory can necessarily be supplied to a single allocation request. Read up on heap fragmentation to learn why the largest allocation that can succeed is the largest contiguous free block, and how blocks get divided into smaller pieces as the memory is used.

It's not that the resource is exhausted; the allocator just can't find a single contiguous piece big enough to satisfy your request.

Eric Towers
This makes sense, thanks for pointing it out. Is there a way to analyze what the fragmentation of heap memory looks like on the GPU when the failure occurred?
Maybe gDEBugger? http://www.gremedy.com/ I've never used it.
Eric Towers
Somehow I doubt that's really the problem, since GPU memory should generally not be fragmented enough for that. After all, 8 MB isn't much on a 1 GB card (especially since the driver should be able to page currently unused memory out to main memory), and allocations of GPU memory are typically relatively chunky. So you would likely have to be close to the memory limit to see such issues from fragmentation, and on a normal graphics-card-based system (as opposed to, e.g., a Tesla) I doubt the driver would fail to intervene at that point (by killing some contexts).
Grizzly
I still cannot figure out how to pin down this problem. I get the CL_OUT_OF_RESOURCES exception even if I put in clFinish() calls between my OpenCL calls. Why does it get triggered when I do a clEnqueueReadBuffer()?
@user464095: Random guess: Are you specifying a local_work_size to clEnqueueNDRangeKernel()?
Eric Towers
@user464095: clEnqueueReadBuffer() has only two error modes, CL_MEM_OBJECT_ALLOCATION_FAILURE and CL_OUT_OF_HOST_MEMORY, that could map to a CL_OUT_OF_RESOURCES. Do you get the exception before clEnqueueReadBuffer() returns? (If so, this exception is probably due to a prior call.)
Eric Towers
@user464095: Actually, it's rather likely that the problem is a call prior to clEnqueueReadBuffer(). That's usually the first time that a failure at the GPU actually percolates out to your code.
Eric Towers
A local_work_size of {256,1,1} is specified in clEnqueueNDRangeKernel. I think I am just going to have to probe my code with clEnqueueReadBuffer() calls in order to try to isolate where the failure actually occurs. It feels like an indexing problem, since it only happens when certain parameters are really big. I just can't prove it yet. It could also be some weird OpenCL quirk.
A: 

From another source:

- calling clFinish() gets you the error status for the calculation (rather than getting it when you try to read data).
- the "out of resources" error can also be caused by the roughly 5-second watchdog timeout if the (NVIDIA) card is also driving a display
- it can also appear when you have pointer errors in your kernel.

A follow-up suggests running the kernel first on the CPU to ensure you're not making out-of-bounds memory accesses.

Eric Towers