opencl

Why do I get a CL_MEM_OBJECT_ALLOCATION_FAILURE?

I'm allocating a cl_mem buffer on a GPU and working on it, which works fine until a certain size is exceeded. In that case the allocation itself succeeds, but execution or copying does not. I do want to use the device's memory for faster operation, so I allocate like this: buf = clCreateBuffer (cxGPUContext, CL_MEM_WRITE_ONLY, buf_size, NULL, &c...

Simple OpenCL program compiles and runs but output is incorrect

I wrote a simple OpenCL program based off the SDK and it compiles and runs, however the output is wrong. Is there something I'm doing wrong? Any suggestions for learning to debug C and OpenCL are much appreciated. I'm quite new to the platform. Code is below. The output in array c is all zeros. Thanks. test_opencl.h #ifndef _TEST_...

Are there any good 3rd party libraries built on top of OpenCL yet?

I'm thinking in particular of processing primitives, things like FFT, convolution, correlation, matrix mathematics, any kind of machine vision primitives. I haven't been able to find anything along these lines. Does anyone know of any good projects that have sprung up? ...

Why do books on concurrent programming always ignore data parallelism?

There has been a significant shift towards data-parallel programming via systems like OpenCL and CUDA over the last few years, and yet books published even within the last six months never even mention the topic of data-parallel programming. It's not suitable for every problem, but it seems that there is a significant gap here that isn'...

What kind of applications should be rewritten to use OpenCL?

Mac OS X 10.6 comes with OpenCL, but how many applications would perform better if they were rewritten to use OpenCL? What kind of applications should be rewritten to use OpenCL? ...

Using Delphi to take advantage of GPGPU technology?

GPGPU is the principle of using the parallel processors on video cards for massive increases in performance. Does anyone have any ideas about using GPGPU in Delphi, using either OpenCL or CUDA? CUDA was/is NVIDIA only, but they have also adopted the OpenCL "standard". I found a few Delphi samples from Google searches but they either c...

OpenCL Texture Memory

I'm fairly new to OpenCL so please bear with me. In the first iteration of my code, I used basic memory buffers for large datasets and declared them global. However, now that I'm looking to improve the timing, I want to use texture memory for this. In the CUDA version, we use cudaBindTexture and tex1Dfetch to obtain the data for a larg...

How do I test OpenCL on GPU when logged in remotely on Mac?

My OpenCL program can find the GPU device when I am logged in at the console, but not when I am logged in remotely with ssh. Further, if I run the program as root in the ssh session, the program can find the GPU. The computer is a Snow Leopard Mac with a GeForce 9400 GPU. If I run the program (see below) from the console or as root, t...

Strange behaviour using local memory in OpenCL

I'm currently working on a project using OpenCL on an NVIDIA Tesla C1060 (driver version 195.17). However, I'm getting some strange behaviour I can't really explain. Here is the code which puzzles me (reduced for clarity and testing purposes): kernel void TestKernel(global const int* groupOffsets, global float* result, ...

Can I use OpenCL in an application that I distribute to non-developer machines?

I recently started to learn how to use OpenCL to speed up some parts of my code. So far the speed gain is impressive. In one case the code ran up to 50X faster than on the CPU. However, I wonder if I can start using this code in a production environment. The reason is that the first time that I tried to run the example code, nothing worked...

Debugger for OpenCL

Hi, I am working on OpenCL. Does anyone know of a good debugger for OpenCL so that I can step into the OpenCL code and trace? Thanks, Rakesh. ...

OpenCL and CUDA

Should I learn OpenCL if I only want to program NVIDIA GPUs? ...

What's the perror() equivalent for error codes in OpenCL?

If I have something like: err = clEnqueueReadBuffer(cmdQueue, output, CL_TRUE, 0, sizeof(float) * data_sz, &results, 0, NULL, NULL); I'd like to do: if (err != CL_SUCCESS){ perror("Read Failed!"); } But the error constants like "CL_OUT_OF_HOST_MEMORY" and the like are (understandably) not known to perror(). I could go around g...

Custom types in OpenCL kernel

Is it possible to use custom types in an OpenCL kernel, like gmp types (mpz_t, mpq_t, …)? To have something like this (this kernel doesn't build just because of #include <gmp.h>): #include <gmp.h> __kernel square( __global mpz_t* input, __global mpz_t number, __global int* output, const unsigned int count) { int i = get_g...

What's the most trivial function that would benefit from being computed on a GPU?

Hi. I'm just starting out learning OpenCL. I'm trying to get a feel for what performance gains to expect when moving functions/algorithms to the GPU. The most basic kernel given in most tutorials is a kernel that takes two arrays of numbers, sums the values at the corresponding indexes, and stores the results in a third array, like so: __ker...

Limit on number of kernel arguments in OpenCL

Hi, I wanted to know if there is any limit on the number of arguments that can be set on a kernel function in OpenCL. I am getting the error INVALID_ARG_INDEX while setting arguments. I am setting 9 arguments in the kernel function. Please help me in this regard. Thanks, Rakesh. ...

How do I use local memory in OpenCL?

I've been playing with OpenCL recently, and I'm able to write simple kernels that use only global memory. Now I'd like to start using local memory, but I can't seem to figure out how to use get_local_size() and get_local_id() to compute one "chunk" of output at a time. For example, let's say I wanted to convert Apple's OpenCL Hello Worl...

Solve a classic map-reduce problem with OpenCL?

I am trying to parallelize a classic map-reduce problem (which parallelizes well with MPI) with OpenCL, namely, the AMD implementation. But the result bothers me. Let me briefly describe the problem first. There are two types of data that flow into the system: the feature set (30 parameters for each) and the sample set (9000+ dimensions for each...

Linux QT OpenCL basic setup

Hi, what's the basic setup for Linux to compile the C/C++ examples from the OpenCL SDK? ...

My OpenCL kernel is slower on faster hardware... But why?

Hi folks, As I was finishing coding my project for a multicore programming class I came upon something really weird I wanted to discuss with you. We were asked to create any program that would show significant improvement from being programmed for a multi-core platform. I’ve decided to try and code something on the GPU to try out Ope...