I see there is an OpenCL binding for Java. Does this let one truly program in Java, using the CPU, GPU, etc. as processing cores, or does it merely give Java apps access to C++ OpenCL-enabled methods?
Out of interest, is there an OpenCL binding for .NET?
...
OpenCL doesn't have a global barrier that will stop all threads, so I'm trying to create a workaround with the following code:
void barrier(__global uint* scratch) {
    uint nThreads = get_global_size(0);
    atom_inc(scratch);
    /* this loop never terminates */
    while (scratch[0] < nThreads) {
        continue;
    }
}
The idea is that each ...
Hi,
I have seen both versions in tutorials, but I could not find out what their advantages and disadvantages are. Which one is the proper one?
cl_mem input = clCreateBuffer(context,CL_MEM_READ_ONLY,sizeof(float) * DATA_SIZE, NULL, NULL);
clEnqueueWriteBuffer(command_queue, input, CL_TRUE, 0, sizeof(float) * DATA_SIZE, inputdata, 0, NU...
Hey all,
I heard that with the Nvidia compute profiler, it should be possible to get a comparison of how much time is spent on arithmetic ops, memory ops, or latency. I searched the profiler after running my program and tried googling, but I don't see anything related to these metrics.
Can anybody help, is my qu...
I have been reading the programming guide for CUDA and OpenCL, and I cannot figure out what a bank conflict is. They just sort of dive into how to solve the problem without elaborating on the subject itself. I tried googling for bank conflict and bank conflict computer science but I couldn't find much. Can anybody help me understand or p...
One thing I haven't figured out, and Google isn't helping, is why it is possible to have bank conflicts with shared memory but not with global memory. Can there be bank conflicts with registers?
UPDATE
Wow, I really appreciate the two answers from Tibbit and Grizzly. It seems that I can only give a green check mark to one answer, though....
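To make the bank-conflict questions above concrete, here is a minimal sketch in plain C (not device code) of how shared-memory words are commonly described as mapping to banks. It assumes 16 banks of 32-bit words and a half-warp of 16 threads, which matches the usual description of compute capability 1.x hardware; newer devices use 32 banks. The names bank_of and conflict_degree are hypothetical helpers made up for this illustration.

```c
/* Assumption: 16 banks, each serving one 32-bit word per cycle,
   with requests issued per half-warp of 16 threads (compute 1.x). */
#define NUM_BANKS 16
#define HALF_WARP 16

/* Successive 32-bit words are assigned to successive banks. */
unsigned bank_of(unsigned word_index) {
    return word_index % NUM_BANKS;
}

/* Largest number of threads in one half-warp that land on the same
   bank when thread t reads word t * stride (stride >= 1, so every
   thread reads a different word). A result of 1 means conflict-free;
   a result of k means the access is serialized into k steps. */
unsigned conflict_degree(unsigned stride) {
    unsigned hits[NUM_BANKS] = {0};
    unsigned worst = 0;
    for (unsigned t = 0; t < HALF_WARP; ++t) {
        unsigned b = bank_of(t * stride);
        if (++hits[b] > worst) {
            worst = hits[b];
        }
    }
    return worst;
}
```

Under these assumptions, a stride of 1 or any odd stride spreads the threads over all banks, while a stride of 2 puts two threads on each used bank and a stride of 16 puts every thread on bank 0.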
Hi!
Does anybody know of a way to use C# libraries like OpenTK (http://www.opentk.com/) from F# as well?
I'm especially interested in a Math toolkit library to give some scripts extra speed by taking advantage of the GPU from within F#.
What's a painless way to do that? :)
...
Hey all,
I am using Compute Prof 3.2 and a GeForce GTX 280, so I believe I have compute capability 1.3.
This file, http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/visual_profiler_cuda/CUDA_Profiler_3.0.txt, seems to show that I should be able to see these fields since I am using a 1.x compute device. Well I don't s...
const char programSource[] =
    "__kernel void vecAdd(__global int *a, __global int *b, __global int *c)"
    "{"
    "    int gid = get_global_id(0);"
    "    for (int i = 0; i < 10; i++) {"
    "        a[gid] = b[gid] + c[gid];"
    "    }"
    "}";
The kernel above is a vector addition done ten times per loop. I have used the prog...
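For comparison, a host-side reference in plain C might look like the sketch below. It is a hypothetical helper (vec_add_reference is not part of the original program) that mirrors what each work-item of the vecAdd kernel does, with the outer loop standing in for the global ID. It can be handy for checking device results on the CPU.

```c
#include <stddef.h>

/* Hypothetical CPU reference for the vecAdd kernel: for every index,
   the sum b + c is stored into a ten times, as in the kernel. The
   repeated stores do not change the final value of a[gid]. */
void vec_add_reference(int *a, const int *b, const int *c, size_t n) {
    for (size_t gid = 0; gid < n; ++gid) {
        for (int i = 0; i < 10; ++i) {
            a[gid] = b[gid] + c[gid];
        }
    }
}
```

Note that, as in the kernel, a is the output and b and c are the inputs.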
Hey all,
I have been having a tough time setting up an experiment where I allocate memory with CUDA on the device, take that pointer to memory on the device, use it in OpenCL, and return the results. I want to see if this is possible. I had a tough time getting a CUDA project to work so I just used Nvidia's template project in their SDK...
I just upgraded from OpenCL 1.0 to 1.1. When I make my call to the min() function, I get error output:
<program source>:45:44: error: call to 'min' is ambiguous
int nFramesThisKernelIngests = min(nFramesToIngest - nAvg*nPP*get_global_id(2), nAvg*nPP);
<built-in>:3569:27: note: candidate function
double16 __OVERLOADABLE...
I have been using cl_mem in some of my OpenCL boilerplate code, but I have been going by context rather than a clear understanding of what exactly it is. I have been using it as a type for the memory I push on and off the board, which has so far been floats. I tried looking at the OpenCL docs, but cl_mem doesn't show up (does it?). I...
I am badly stuck on why my OpenCL program crashes after clEnqueueReadBuffer calls. Is it okay if I post all 300 lines of boilerplate/kernel code? Otherwise, any suggestions on how to debug OpenCL code besides eyeballing and printing error codes would be appreciated.
UPDATE
Wow, so I merely messed up my parameter to host memory on my clEn...
Hi,
I have some long source code that involves a struct definition:
struct exec_env {
    cl_program* cpPrograms;
    cl_context cxGPUContext;
    int cpProgramCount;
    int cpKernelCount;
    int nvidia_platform_index;
    int num_cl_mem_buffs_used;
    int total;
    cl_platform_id cpPlatform;
    cl_uint ciDeviceCount;
    cl_int...
In Nvidia's compute prof there is a column called "static private mem per work group" and the tooltip of it says "Size of statically allocated shared memory per block". My application shows that I am getting 64 (bytes I assume) per block. Does that mean I am using somewhere between 1-64 of those bytes or is the profiler just telling me t...
Hi everybody,
Searching the Nvidia forums, I found these questions, which are also of interest to me, but nobody has answered them in the last four days or so. Can you help?
Original forum post:
While digging into OpenCL and reading tutorials, some things remained unclear to me. Here is a collection of my questions regarding local and global work ...
What is the difference between using OpenTK and Cloo for developing OpenCL applications?
...
Hello, I am trying to get a simple Cloo program to run, but it is not working. Can anyone tell me why?
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using Cloo;
using System.Runtime.InteropServices;
n...
If my algorithm is bottlenecked by host-to-device and device-to-host memory transfers, is the only solution a different or revised algorithm?
...
I used the CL_MEM_ALLOC_HOST_PTR flag with my clCreateBuffer calls, but the Compute Profiler shows all my "host mem transfer type" as being Pageable. I tried it in two different kernel setups, but the profiler wouldn't show that I was using pinned memory.
Is it just really random when a kernel gets to use pinned memory? Is it constraine...