opencl

The right way to set up Visual Studio 2010 for OpenCL

Hi everyone, what is the right way to set up Visual Studio 2010 for working with *.cl files? I have added *.cl under Tool/Text editor/File extensions and copied usertype.dat into the common7/ide folder, but VS underlines keywords like float4 or cross. Is it necessary to add a key to the registry, or can somebody recommend a tutorial? Thanks ...

Problem with OpenCL

Hi, I'm facing a problem with OpenCL and I hope someone will have a hint on what the cause might be. Following is a version of the program, reduced to the problem. I have an input int array of size 4000. In my kernel, I am doing a scan. Obviously, there are nice ways to do this in parallel, but to reproduce the problem, only one thread ...

OpenCL for commercial apps, byte code?

I'm looking at options for GPGPU work. Say I write OpenCL kernel code or GLSL shader code and embed that in my executable. There's nothing to stop somebody grep-ing the binary and stealing my hard work. I could obscure or encrypt the strings and decrypt them just-in-time, but somebody can always go in with a debugger and intercept that j...

OpenCL build program from binary

Hello, I'm trying to test the OpenCL functionality of building a program from pre-compiled binaries. So far I've managed to create the binary file, but I'm having trouble loading it. I'm trying to adapt this code for use with the C++ bindings: FILE* fp = fopen("oclLLtoUTM.ptx", "r"); fseek (fp , 0 , SEEK_END); const size_t lSize = ftell...
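One thing worth checking in loading code of this kind: mode "r" opens the file in text mode, and on Windows that can translate line endings or stop at a Ctrl-Z byte, so the size reported by ftell may not match what fread returns. The following is only a minimal sketch of reading a whole file in binary mode; the helper name read_binary_file is hypothetical and not taken from the question.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into a heap buffer.
 * Opens with "rb" so that no newline translation happens on Windows.
 * Returns NULL on any failure; the caller frees the returned buffer. */
unsigned char *read_binary_file(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Determine the file length by seeking to the end. */
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long len = ftell(fp);
    if (len < 0) { fclose(fp); return NULL; }
    rewind(fp);

    /* Allocate at least one byte so an empty file still yields a valid pointer. */
    unsigned char *buf = (unsigned char *)malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL) { fclose(fp); return NULL; }

    /* Check that the whole file was actually read. */
    size_t got = fread(buf, 1, (size_t)len, fp);
    fclose(fp);
    if (got != (size_t)len) { free(buf); return NULL; }

    *size_out = (size_t)len;
    return buf;
}
```

The returned pointer and length could then be handed to clCreateProgramWithBinary, which takes an array of binary pointers and an array of lengths.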

Does Global Work Size Need to Be a Multiple of Work Group Size in OpenCL?

Hello: Does the global work size (dimensions) need to be a multiple of the work group size (dimensions) in OpenCL? If so, is there a standard way of handling matrices whose dimensions are not a multiple of the work group dimensions? I can think of two possibilities: Dynamically set the size of the work group dimensions to a factor of the global work dimensions. (thi...
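For context: in OpenCL 1.x, clEnqueueNDRangeKernel rejects a launch with CL_INVALID_WORK_GROUP_SIZE when an explicit local size does not evenly divide the global size. A common workaround is to pad the global size up to the next multiple and have the kernel skip out-of-range work-items. Below is a minimal sketch of the host-side rounding; the function name round_up_global_size is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the smallest multiple of local_size
 * that is greater than or equal to global_size. */
size_t round_up_global_size(size_t global_size, size_t local_size)
{
    assert(local_size > 0);

    size_t remainder = global_size % local_size;
    if (remainder == 0)
        return global_size;                      /* already a multiple */
    return global_size + (local_size - remainder); /* pad up to next multiple */
}
```

With a padded launch, the kernel would also need the real problem size as an argument and a guard such as `if (get_global_id(0) >= n) return;` so that the extra work-items do nothing.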

Real-time video encoding in DirectShow

Hello, I have developed a Windows application that captures video from an external device using DirectShow. The image resolution is 640x480, and videos saved without compression are very large (approx. 27MB per second). My goal is to reduce this size as much as possible, so I am looking for an encoder which will allow me to co...

MATLAB shared C++ libraries and OpenCL

Hi, I have a project that requires lots of image processing and wanted to add GPU support to speed things up. I was wondering: if I compiled my MATLAB code into a C++ shared library and called it from within an OpenCL program, does that mean that the MATLAB code is going to be run on the GPU? ...

OpenCL vs. DirectCompute?

I'm looking for comparisons between OpenCL and DirectCompute, but I haven't found anything. OpenCL's advantages of being cross-platform and having a wider range of supported GPUs don't matter to me. I'm fine with coding on Windows against DX11 GPUs only. Assuming that, what are the pros and cons of each API? I know this question was ...

GPGPU before CUDA and OpenCL

I've been reading about CUDA and OpenCL and have learned that before these frameworks, developers could only use low-level APIs like OpenGL and D3D. Unfortunately I haven't been able to find much information about it. Was it a widespread or commercial practice, or was it just something they used in research and military labs? I'm sure so...

How can I programmatically determine a GPU's memory bus width and clock rate?

How can I programmatically determine a GPU's memory bus width and memory clock rate? I want to use these numbers to compute the maximum theoretical memory bandwidth. I'm mostly interested in NVIDIA GPUs. ...
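However the two numbers are obtained (on NVIDIA hardware, as far as I know, the CUDA runtime's cudaDeviceProp structure reports memoryClockRate in kHz and memoryBusWidth in bits), the theoretical peak is the memory clock times the bus width in bytes times the number of transfers per clock. A minimal sketch of that arithmetic follows; the function name and the transfers_per_clock parameter are assumptions for illustration, with 2 being the usual value for double-data-rate memory.

```c
#include <assert.h>

/* Hypothetical helper: theoretical peak memory bandwidth in GB/s
 * (1 GB = 1e9 bytes).
 *   memory_clock_mhz    - memory clock in MHz
 *   bus_width_bits      - memory bus width in bits
 *   transfers_per_clock - data transfers per clock cycle (assumed 2 for DDR) */
double peak_bandwidth_gb_per_s(double memory_clock_mhz,
                               unsigned bus_width_bits,
                               unsigned transfers_per_clock)
{
    assert(bus_width_bits % 8 == 0);
    assert(transfers_per_clock > 0);

    double bytes_per_transfer = bus_width_bits / 8.0;
    double bytes_per_second = memory_clock_mhz * 1e6
                            * bytes_per_transfer
                            * transfers_per_clock;
    return bytes_per_second / 1e9;
}
```

For example, a 1000 MHz memory clock on a 256-bit bus with two transfers per clock works out to 1000e6 * 32 * 2 bytes/s, i.e. 64 GB/s.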

Work-items, Work-groups and Command Queues organization and memory limit in OpenCL

Okay, I have already been through most of the ATI and NVIDIA guides to OpenCL. There are some things that I just want to be sure of, and some that need clarification. Nothing in the documentation gives a clear-cut answer. I have a Radeon 4650, and on querying my device I got CL_DEVICE_MAX_COMPUTE_UNITS: 8 CL_DEVICE_ADDRESS_BITS: 32...

Is it possible to span an OpenCL kernel to run concurrently on the CPU and GPU?

Let's assume that I have a computer which has a multicore processor and a GPU. I would like to write an OpenCL program which runs on all cores of the platform. Is this possible or do I need to choose a single device on which to run the kernel? ...

Call get_global_id() multiple times vs. save the result in a local variable?

It is probably a silly question, but: How expensive is it to call a get_* function in OpenCL kernels? Is it better to save the result in a local variable for future use, or to call the desired function whenever it is needed? Or is it platform dependent? PS I think CUDA solves it better with its various threadIdx variables. ...

Are either the iPad or iPhone capable of OpenCL?

With the push towards multimedia-enabled mobile devices, this seems like a logical way to boost performance on these platforms while keeping general-purpose software power efficient. I've been interested in the iPad hardware as a development platform for UI and data display / entry usage, but I am curious how much processing capabili...

Linux Function Interception for OpenCL

Hi, I'm fairly new to C so be gentle. I want to use the library interception method for Linux to replace calls to the OpenCL library with my own library. I understand that this can be done using LD_PRELOAD. So I can just re-implement the OpenCL functions as defined in the OpenCL header file within my own library which can then be linke...

clCreateSubBuffer not found oO

I can't seem to find clCreateSubBuffer in cl.h or cl.hpp (only the error macro). It is mentioned in the specification; any idea about this? Or is there any other way to create a sub-buffer? All I can think of is recreating the buffers using an incremented pointer. ...

OpenCL code that compiles on Linux doesn't compile on Windows

Hi, I've been writing some OpenCL code lately on Linux (ubuntu 10.4, ati catalyst 10.4 and ati sdk v2.1) and it's working great there. When I wanted to run my code on Windows, I got program build errors complaining about "this declaration has no storage class or type specifier" and then "global variable must be declared in addrSapce...

Performing a scan in OpenCL

I've been trying to get a simple scan to work for quite some time now. For small problems, the output is correct, however for large output, I get the correct results only sometimes. I've checked Apple's OpenCL example and I am basically doing the same thing (except for the bank conflicts, which I'm ignoring atm). So here's the code for t...

Sharing the GPU between OpenCL capable programs

Is there a method to share the GPU between two separate OpenCL capable programs, or more specifically between two separate processes that simultaneously both require the GPU to execute OpenCL kernels? If so, how is this done? ...

Is there an available implementation of OpenCL supporting the fp16 extension?

I am looking for an implementation of the OpenCL language that supports the cl_khr_fp16 extension. To my knowledge, no publicly available implementation supports this at present. ...