I'm allocating a cl_mem buffer on a GPU and working on it, which works fine until a certain size is exceeded. In that case the allocation itself succeeds, but execution or copying does not. I do want to use the device's memory for faster operation, so I allocate like this:
buf = clCreateBuffer (cxGPUContext, CL_MEM_WRITE_ONLY, buf_size, NULL, &c...
I wrote a simple OpenCL program based on the SDK. It compiles and runs, but the output is wrong. Is there something I'm doing wrong?
Any suggestions for learning to debug C and OpenCL are much appreciated. I'm quite new to the platform.
Code is below.
The output in array c is all zeros.
Thanks.
test_opencl.h
#ifndef _TEST_...
I'm thinking in particular of processing primitives: things like FFT, convolution, correlation, matrix mathematics, and any kind of machine vision primitives. I haven't been able to find anything along these lines. Does anyone know of any good projects that have sprung up?
...
There has been a significant shift towards data-parallel programming via systems like OpenCL and CUDA over the last few years, and yet books published even within the last six months never even mention the topic of data-parallel programming.
It's not suitable for every problem, but it seems that there is a significant gap here that isn'...
Mac OS X 10.6 comes with OpenCL, but how many applications would perform better if they were rewritten to use OpenCL? What kinds of applications should be rewritten to use OpenCL?
...
GPGPU is the principle of using the parallel processors on video cards for massive increases in performance.
Does anyone have any ideas about using GPGPU in Delphi, using either OpenCL or CUDA? CUDA was/is NVidia only, but they have also adopted the OpenCL "standard".
I found a few Delphi samples from Google searches but they either c...
I'm fairly new to OpenCL so please bear with me.
In the first iteration of my code, I used basic memory buffers for large datasets and declared them global. However now that I'm looking to improve the timing, I wanted to use texture memory for this. In the CUDA version, we use cudaBindTexture and tex1Dfetch to obtain the data for a larg...
My OpenCL program can find the GPU device when I am logged in at the console, but not when I am logged in remotely with ssh. Further, if I run the program as root in the ssh session, the program can find the GPU.
The computer is a Snow Leopard Mac with a GeForce 9400 GPU.
If I run the program (see below) from the console or as root, t...
I'm currently working on a project using OpenCL on an NVIDIA Tesla C1060 (driver version 195.17). However, I'm getting some strange behaviour I can't really explain. Here is the code that puzzles me (reduced for clarity and testing purposes):
kernel void TestKernel(global const int* groupOffsets, global float* result,
...
I recently started to learn how to use OpenCL to speed up some parts of my code. So far the speed gain is impressive. In one case the code ran up to 50X faster than on the CPU. However, I wonder if I can start using this code in a production environment. The reason is that the first time that I tried to run the example code, nothing worked...
Hi,
I am working on OpenCL. Does anyone know of a good debugger for OpenCL so that I can step into the OpenCL code and trace?
Thanks,
Rakesh.
...
Should I learn OpenCL if I only want to program NVIDIA GPUs?
...
If I have something like:
err = clEnqueueReadBuffer(cmdQueue, output, CL_TRUE, 0, sizeof(float) * data_sz, &results, 0, NULL, NULL);
I'd like to do:
if (err != CL_SUCCESS){
perror("Read Failed!");
}
But error constants such as "CL_OUT_OF_HOST_MEMORY" are (understandably) not known to perror().
I could go around g...
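One common way to handle this is a small lookup that turns the status code into its symbolic name, since perror() only reports errno and knows nothing about OpenCL status codes. Below is a minimal sketch under some assumptions: cl_error_name and report_cl_error are hypothetical helper names (not part of OpenCL), only a handful of codes are covered, and the numeric values are repeated from the Khronos cl.h header only so the snippet builds without the SDK installed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values as listed in the Khronos CL/cl.h header. They are redefined here
 * only so this sketch builds without the OpenCL SDK; if <CL/cl.h> is
 * included first, its own definitions are used instead. */
#ifndef CL_SUCCESS
#define CL_SUCCESS                         0
#define CL_DEVICE_NOT_FOUND               -1
#define CL_DEVICE_NOT_AVAILABLE           -2
#define CL_COMPILER_NOT_AVAILABLE         -3
#define CL_MEM_OBJECT_ALLOCATION_FAILURE  -4
#define CL_OUT_OF_RESOURCES               -5
#define CL_OUT_OF_HOST_MEMORY             -6
#endif

/* Hypothetical helper: map an OpenCL status code to its symbolic name.
 * Only a few codes are listed; extend the switch as needed. */
const char *cl_error_name(int err)
{
    switch (err) {
#define CL_NAME_CASE(code) case code: return #code;
    CL_NAME_CASE(CL_SUCCESS)
    CL_NAME_CASE(CL_DEVICE_NOT_FOUND)
    CL_NAME_CASE(CL_DEVICE_NOT_AVAILABLE)
    CL_NAME_CASE(CL_COMPILER_NOT_AVAILABLE)
    CL_NAME_CASE(CL_MEM_OBJECT_ALLOCATION_FAILURE)
    CL_NAME_CASE(CL_OUT_OF_RESOURCES)
    CL_NAME_CASE(CL_OUT_OF_HOST_MEMORY)
#undef CL_NAME_CASE
    default: return "unknown OpenCL error";
    }
}

/* Hypothetical helper: print a readable message to stderr when a call
 * did not return CL_SUCCESS; does nothing on success. */
void report_cl_error(const char *what, int err)
{
    assert(what != NULL);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "%s: %s (%d)\n", what, cl_error_name(err), err);
    }
}
```

With this in place, the check after clEnqueueReadBuffer could call report_cl_error("Read failed", err) instead of perror().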
Is it possible to use custom types in an OpenCL kernel, such as GMP types (mpz_t, mpq_t, …)?
I'd like to have something like this (this kernel fails to build just because of #include <gmp.h>):
#include <gmp.h>
__kernel square(
__global mpz_t* input,
__global mpz_t number,
__global int* output,
const unsigned int count)
{
int i = get_g...
Hi.
I'm just starting out learning OpenCL. I'm trying to get a feel for what performance gains to expect when moving functions/algorithms to the GPU.
The most basic kernel given in most tutorials takes two arrays of numbers, sums the values at corresponding indexes, and stores the results in a third array, like so:
__ker...
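For judging the speed-up, it helps to have a plain CPU version of the same element-wise addition to time against. The following is only a sketch of such a baseline; vec_add_cpu is a hypothetical name, and the loop body stands in for what a single work-item would do for its own index (typically obtained from get_global_id(0)).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU baseline: c[i] = a[i] + b[i] for every index i.
 * In an OpenCL kernel, each work-item would compute one such element. */
void vec_add_cpu(const float *a, const float *b, float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

A kernel this simple does very little arithmetic per byte moved, so the cost of copying the arrays to and from the device may well dominate any measurement.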
Hi,
I wanted to know if there is any limit on the number of arguments that can be set on a kernel function in OpenCL. I am getting the error CL_INVALID_ARG_INDEX while setting arguments. I am setting 9 arguments on the kernel function. Please help me in this regard.
Thanks,
Rakesh.
...
I've been playing with OpenCL recently, and I'm able to write simple kernels that use only global memory. Now I'd like to start using local memory, but I can't seem to figure out how to use get_local_size() and get_local_id() to compute one "chunk" of output at a time.
For example, let's say I wanted to convert Apple's OpenCL Hello Worl...
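To see how the indices relate, here is a host-side C emulation of a one-dimensional NDRange in which every work-group handles one chunk. It is only a sketch: chunked_sum is a hypothetical name, the two loops stand in for get_group_id(0) and get_local_id(0), and it assumes a zero global offset, so that get_global_id(0) equals get_group_id(0) * get_local_size(0) + get_local_id(0). A real kernel would keep the per-group data in a __local array and synchronise with barrier() instead of using a sequential loop.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side emulation (not a kernel): each "work-group" sums its own chunk
 * of `in` and writes one partial sum to out[group_id].
 * `out` must have room for ceil(n / local_size) elements. */
void chunked_sum(const float *in, size_t n, size_t local_size, float *out)
{
    assert(local_size > 0);
    size_t num_groups = (n + local_size - 1) / local_size;  /* round up */

    for (size_t group_id = 0; group_id < num_groups; ++group_id) {
        float partial = 0.0f;  /* stands in for the group's local memory */
        for (size_t local_id = 0; local_id < local_size; ++local_id) {
            /* Same relation as in OpenCL when the global offset is zero. */
            size_t global_id = group_id * local_size + local_id;
            if (global_id < n) {  /* guard the last, partly filled group */
                partial += in[global_id];
            }
        }
        out[group_id] = partial;
    }
}
```

The bounds check matters whenever the global size has been rounded up to a multiple of the work-group size.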
I am trying to parallelize a classic map-reduce problem (which parallelizes well with MPI) with OpenCL, namely the AMD implementation. But the result bothers me.
Let me briefly describe the problem first. There are two types of data that flow into the system: the feature set (30 parameters for each) and the sample set (9000+ dimensions for each...
Hi, what's the basic setup on Linux to compile the C/C++ examples from the OpenCL SDK?
...
Hi folks,
As I was finishing coding my project for a multicore programming class, I came upon something really weird that I wanted to discuss with you.
We were asked to create a program that would show significant improvement when written for a multi-core platform. I’ve decided to try and code something on the GPU to try out Ope...