opencl

How native is OpenCL in Java?

I see there is an OpenCL binding for Java. Does this enable one to truly program in Java, using CPU / GPU etc. as processing cores, or does it merely give Java apps access to C++ OpenCL enabled methods? Out of interest, is there an OpenCL binding for .Net? ...

trouble reading from __global memory after atom_inc in OpenCL

OpenCL doesn't have a global barrier that will stop all threads, so I'm trying to create a workaround with the following code: void barrier(__global uint* scratch) { uint nThreads = get_global_size(0); atom_inc(scratch); /* this loop never terminates */ while(scratch[0] < nThreads) { continue; } } The idea is that each ...

Two ways to create a buffer object in opencl: clCreateBuffer vs. clCreateBuffer + clEnqueueWriteBuffer

Hi, I have seen both versions in tutorials, but I could not find out what their advantages and disadvantages are. Which one is the proper one? cl_mem input = clCreateBuffer(context,CL_MEM_READ_ONLY,sizeof(float) * DATA_SIZE, NULL, NULL); clEnqueueWriteBuffer(command_queue, input, CL_TRUE, 0, sizeof(float) * DATA_SIZE, inputdata, 0, NU...

How to display latency, memory ops, and arithmetic ops in Nvidia Compute Profiler

Hey all, I heard that with the Nvidia compute profiler, it should be possible to get a comparison of how much time is being spent on arithmetic ops, memory ops, or latency. I searched the profiler after running my program and I tried googling, but I don't see anything related to figuring out these metrics. Can anybody help, is my qu...

What is a bank conflict? (Doing Cuda/OpenCL programming)

I have been reading the programming guide for CUDA and OpenCL, and I cannot figure out what a bank conflict is. They just sort of dive into how to solve the problem without elaborating on the subject itself. I tried googling for bank conflict and bank conflict computer science but I couldn't find much. Can anybody help me understand or p...
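As a rough illustration of the term asked about here (not taken from any answer to this question): on-chip shared/local memory is split into banks, and consecutive 32-bit words map to consecutive banks. When several threads that issue a request together touch different words in the same bank, the accesses are serialized, which is a bank conflict. The sketch below is a plain Python model, not GPU code. It assumes 16 banks and 16 threads issuing together, figures commonly quoted for compute capability 1.x devices, and the function name `conflict_degree` is hypothetical. It also ignores the broadcast case where all threads read the same word.

```python
from collections import Counter


def conflict_degree(stride, num_banks=16, num_threads=16):
    """Return the largest number of threads that land on one bank.

    Thread `tid` is assumed to access 32-bit word index `tid * stride`.
    Word `w` is assumed to live in bank `w % num_banks`.
    A result of 1 means no conflict; N means an N-way conflict.
    """
    banks = Counter((tid * stride) % num_banks for tid in range(num_threads))
    return max(banks.values())


if __name__ == "__main__":
    # Compare a few access strides under the assumed 16-bank layout.
    for stride in (1, 2, 3, 16):
        print("stride", stride, "->", conflict_degree(stride), "way")
```

In this model, odd strides spread the threads over all banks, while a stride equal to the bank count sends every thread to the same bank.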

Why aren't there bank conflicts in global memory for Cuda/OpenCL?

One thing I haven't figured out, and Google isn't helping me, is why it is possible to have bank conflicts with shared memory but not in global memory. Can there be bank conflicts with registers? UPDATE Wow I really appreciate the two answers from Tibbit and Grizzly. It seems that I can only give a green check mark to one answer though....

F# with OpenTK example?

Hi! Is anybody aware of a possibility to use C# libraries like OpenTK (http://www.opentk.com/) from F#, too? I'm especially interested in a Math toolkit library to give some scripts extra speed by taking advantage of the GPU from within F#. What's a painless way to do that? :) ...

Question about Compute Prof's fields for incoherent and coherent gst/gld? (Cuda/OpenCL)

Hey all, I am using Compute Prof 3.2 and a GeForce GTX 280, so I believe I have compute capability 1.3. This file, http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/visual_profiler_cuda/CUDA_Profiler_3.0.txt, seems to show that I should be able to see these fields since I am using a 1.x compute device. Well I don't s...

Rationalizing what is going on in my simple OpenCL kernel in regards to global memory

const char programSource[] = "__kernel void vecAdd(__global int *a, __global int *b, __global int *c)" "{" " int gid = get_global_id(0);" "for(int i=0; i<10; i++){" " a[gid] = b[gid] + c[gid];}" "}"; The kernel above is a vector addition done ten times in a loop. I have used the prog...

Trying to mix in OpenCL with CUDA in Nvidia's SDK template

Hey all, I have been having a tough time setting up an experiment where I allocate memory with CUDA on the device, take that pointer to memory on the device, use it in OpenCL, and return the results. I want to see if this is possible. I had a tough time getting a CUDA project to work so I just used Nvidia's template project in their SDK...

In OpenCL 1.1 my call to function min() is ambiguous and I can't figure out why

I just upgraded from OpenCL 1.0 to 1.1. When I make my call to the min() function, I get error output: <program source>:45:44: error: call to 'min' is ambiguous int nFramesThisKernelIngests = min(nFramesToIngest - nAvg*nPP*get_global_id(2), nAvg*nPP); <built-in>:3569:27: note: candidate function double16 __OVERLOADABLE...

Question about cl_mem in OpenCL

I have been using cl_mem in some of my OpenCL boilerplate code, but I have been using it by inferring from context rather than with a sharp understanding of what exactly it is. I have been using it as a type for the memory I push on and off the board, which has so far been floats. I tried looking at the OpenCL docs, but cl_mem doesn't show up (does it?). I...

I get CL_SUCCESS for all my OpenCL error codes, but all my clEnqueueReadBuffer calls crash the program

I am badly stuck on why my OpenCL program crashes after clEnqueueReadBuffer calls. Is it okay if I post all 300 lines of boilerplate/kernel code? Otherwise any suggestions on how to debug OpenCL code besides eyeballing and error code printing would be appreciated. UPDATE Wow, so I merely messed up my parameter to host memory on my clEn...

C structs strange behaviour

Hi, I have some long source code that involves a struct definition: struct exec_env { cl_program* cpPrograms; cl_context cxGPUContext; int cpProgramCount; int cpKernelCount; int nvidia_platform_index; int num_cl_mem_buffs_used; int total; cl_platform_id cpPlatform; cl_uint ciDeviceCount; cl_int...

Size of statically allocated shared memory per block question with Compute Prof (Cuda/OpenCL)

In Nvidia's compute prof there is a column called "static private mem per work group" and the tooltip of it says "Size of statically allocated shared memory per block". My application shows that I am getting 64 (bytes I assume) per block. Does that mean I am using somewhere between 1-64 of those bytes or is the profiler just telling me t...

Questions about global and local work size

Hi everybody, searching the Nvidia forums I found these questions, which are also of interest to me, but nobody had answered them in the last four days or so. Can you help? Original forum post: While digging into OpenCL and reading tutorials, some things remained unclear to me. Here is a collection of my questions regarding local and global work ...
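For readers unfamiliar with the two terms, the relationship between global and local work size in one dimension can be sketched in plain Python. This is only a model of the index arithmetic, not OpenCL code. It assumes a zero global offset and a global size that is a multiple of the local size, as OpenCL 1.x requires. The helper names `num_groups`, `decompose` and `recompose` are hypothetical.

```python
def num_groups(global_size, local_size):
    """Number of work-groups launched for a 1-D range."""
    if global_size % local_size != 0:
        # OpenCL 1.x rejects ranges where this does not divide evenly.
        raise ValueError("global size must be a multiple of local size")
    return global_size // local_size


def decompose(global_id, local_size):
    """Split a global id into (group id, local id), assuming zero offset."""
    return divmod(global_id, local_size)


def recompose(group_id, local_id, local_size):
    """Inverse of decompose: rebuild the global id."""
    return group_id * local_size + local_id


if __name__ == "__main__":
    print(num_groups(1024, 64))     # work-groups for 1024 work-items of 64
    print(decompose(130, 64))       # (group id, local id) of work-item 130
    print(recompose(2, 2, 64))      # back to the global id
```

Inside a kernel, the same quantities correspond to get_global_id(0), get_group_id(0), get_local_id(0) and get_local_size(0).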

Difference between OpenTK and Cloo?

What is the difference between using OpenTK and Cloo for developing OpenCL applications? ...

Cloo OpenCL C# Problem

Hello, I am trying to get a simple Cloo program to run but it is not working, can anyone tell me why? using System; using System.Collections.Generic; using System.ComponentModel; using System.Data; using System.Drawing; using System.Linq; using System.Text; using System.Windows.Forms; using Cloo; using System.Runtime.InteropServices; n...

How to mitigate host + device memory transfer bottlenecks in OpenCL/CUDA

If my algorithm is bottlenecked by host to device and device to host memory transfers, is the only solution a different or revised algorithm? ...

pinned memory opencl, has anybody successfully used it?

I used the CL_MEM_ALLOC_HOST_PTR flag with my clCreateBuffer calls, but the Compute Profiler shows all my "host mem transfer type" as being Pageable. I tried it in two different kernel setups, but the profiler wouldn't show that I was using pinned memory. Is it just really random when a kernel gets to use pinned memory? Is it constraine...