gpgpu

Is it possible to span an OpenCL kernel so it runs concurrently on the CPU and GPU?

Let's assume that I have a computer with a multicore processor and a GPU. I would like to write an OpenCL program that runs on all cores of the platform. Is this possible, or do I need to choose a single device on which to run the kernel? ...
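One common way to use both devices is to give each device its own command queue and launch the same kernel on a different slice of the index space. The sketch below only computes those slices. The weights are assumed relative speeds (hypothetical, for example measured in a trial run), and each offset/size pair would be passed as the global work offset and global work size of clEnqueueNDRangeKernel on that device's queue (a non-zero offset needs OpenCL 1.1 or later). split_range and DeviceRange are illustrative names, not part of any API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One device's slice of a 1-D index space.
struct DeviceRange {
    std::size_t offset;  // would be passed as global_work_offset
    std::size_t size;    // would be passed as global_work_size
};

// Splits `total` work-items across devices in proportion to `weights`
// (hypothetical relative speeds). Every slice except the last is rounded
// down to a multiple of `local_size`; the last slice takes the remainder,
// so if `total` is a multiple of `local_size`, every slice is too.
std::vector<DeviceRange> split_range(std::size_t total,
                                     const std::vector<double>& weights,
                                     std::size_t local_size) {
    assert(!weights.empty() && local_size > 0);
    double sum = 0.0;
    for (double w : weights) sum += w;
    assert(sum > 0.0);

    std::vector<DeviceRange> out;
    std::size_t offset = 0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        std::size_t size;
        if (i + 1 == weights.size()) {
            size = total - offset;  // last device gets everything left
        } else {
            size = static_cast<std::size_t>(static_cast<double>(total) * (weights[i] / sum));
            size -= size % local_size;                         // align to work-group size
            if (size > total - offset) size = total - offset;  // never overshoot
        }
        out.push_back({offset, size});
        offset += size;
    }
    return out;
}
```

For example, split_range(1024, {1.0, 3.0}, 64) gives the first device 256 work-items at offset 0 and the second device 768 work-items at offset 256.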

Is there an algorithm for sorting an array of strings on the GPU?

The array to sort has approximately one million strings, and each string can be up to one million characters long. I am looking for any implementation of a sorting algorithm for the GPU. I have a block of data of approximately 1 MB and I need to construct a suffix array. Now you can see how it is possible to have one million strings i...
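Because every string here is a suffix of the same 1 MB block, the suffixes do not have to be compared character by character. A technique often used for this is prefix doubling: rank suffixes by their first character, then repeatedly sort by the pair (rank of the first k characters, rank of the next k characters), doubling k each round. Each round is a sort of integer pairs, which is the kind of work GPU radix sorts handle well. The sketch below is a sequential CPU reference under that assumption, with std::sort standing in for a GPU sort; it is not a GPU implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Suffix array by prefix doubling. After the round with step k, rank[i]
// orders suffix i by its first 2k characters.
std::vector<int> suffix_array(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n), rank(n), tmp(n);
    if (n == 0) return sa;
    for (int i = 0; i < n; ++i) {
        sa[i] = i;
        rank[i] = static_cast<unsigned char>(s[i]);  // initial rank = first character
    }
    for (int k = 1;; k <<= 1) {
        // Rank of the second half; -1 makes a suffix that ends early sort first.
        auto second = [&](int i) { return i + k < n ? rank[i + k] : -1; };
        auto by_pair = [&](int a, int b) {
            if (rank[a] != rank[b]) return rank[a] < rank[b];
            return second(a) < second(b);
        };
        std::sort(sa.begin(), sa.end(), by_pair);
        // Re-rank: suffixes with equal (first, second) pairs share a rank.
        tmp[sa[0]] = 0;
        for (int i = 1; i < n; ++i)
            tmp[sa[i]] = tmp[sa[i - 1]] + (by_pair(sa[i - 1], sa[i]) ? 1 : 0);
        rank = tmp;
        if (rank[sa[n - 1]] == n - 1) break;  // all ranks distinct: done
    }
    return sa;
}
```

For "banana" this returns 5 3 1 0 4 2 (a, ana, anana, banana, na, nana). The number of rounds is at most about log2(n), each costing one full sort.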

Executing a GPGPU program through WAMP

Hi, I have a program that uses the GPU to perform certain computations. I can get the program to run correctly from the command line, but when I try to execute the same command through PHP, I run into trouble. I'm using WAMP 2.0, and I've tried the exec and proc_open functions to get the program to run, but even though th...

Can I call a "function-like macro" in a header file from a CUDA __global__ function?

This is part of my header file ("aes_locl.h"):

.
.
# define SWAP(x) (_lrotl(x, 8) & 0x00ff00ff | _lrotr(x, 8) & 0xff00ff00)
# define GETU32(p) SWAP(*((u32 *)(p)))
# define PUTU32(ct, st) { *((u32 *)(ct)) = SWAP((st)); }
.
.

Now, from the .cu file, I have declared a __global__ function and included the header file like this: #include "...
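The macros themselves are plain text substitution, so they expand the same way inside a __global__ function. The likely obstacle is what they expand to: _lrotl and _lrotr are MSVC host intrinsics, and device code can only call functions compiled for the device. A minimal sketch, assuming 32-bit u32 values and a little-endian target (x86 hosts and NVIDIA GPUs both are), replaces them with shift-based inline functions. HD is a hypothetical helper macro that adds __host__ __device__ only when compiled by nvcc; GETU32/PUTU32 are rewritten byte by byte to avoid the unaligned u32 pointer casts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mark functions callable from host and device under nvcc.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

typedef std::uint32_t u32;

// 32-bit rotates written with shifts; n must be in 1..31 to avoid a shift by 32.
HD inline u32 rotl32(u32 x, unsigned n) { assert(n > 0 && n < 32); return (x << n) | (x >> (32 - n)); }
HD inline u32 rotr32(u32 x, unsigned n) { assert(n > 0 && n < 32); return (x >> n) | (x << (32 - n)); }

// Same expression as the header's SWAP(x): reverses the four bytes of x.
HD inline u32 swap32(u32 x) {
    return (rotl32(x, 8) & 0x00ff00ffu) | (rotr32(x, 8) & 0xff00ff00u);
}

// Big-endian load/store; on a little-endian target these produce the same
// results as GETU32/PUTU32, without type-punned pointer casts.
HD inline u32 getu32(const unsigned char* p) {
    return (u32(p[0]) << 24) | (u32(p[1]) << 16) | (u32(p[2]) << 8) | u32(p[3]);
}
HD inline void putu32(unsigned char* ct, u32 st) {
    ct[0] = static_cast<unsigned char>(st >> 24);
    ct[1] = static_cast<unsigned char>(st >> 16);
    ct[2] = static_cast<unsigned char>(st >> 8);
    ct[3] = static_cast<unsigned char>(st);
}
```

With this, swap32(0x11223344) yields 0x44332211, matching what SWAP computes on the host.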

Total/texture-accessible memory in DirectX/CUDA/OpenGL

Hi, can someone please explain the difference between texture memory as used in the context of CUDA and texture memory as used in the context of DirectX? Suppose a graphics card has 512 MB of advertised memory: how is it divided into constant memory, texture memory and global memory? E.g. I have a Tesla card that has totalConstMem as ...

cudaMemcpy fails to copy values

I am calling cudaMemcpy and the copy returns successfully; however, the source values are not being copied to the destination. I wrote a similar piece using memcpy() and that works fine. What am I missing here?

// host externs
extern unsigned char landmask[DIMX * DIMY];
// use device constant memory for landmask
unsigned char *tempmask; ...

Sparse array in CUDA or OpenCL

I have a large, GPU-resident array (say 512K elements) where only a small fraction of elements (say 5K randomly distributed elements, set S) needs to be processed. The algorithm to find out which elements belong to S is very efficient, so I can easily create an array A of pointers or indexes to elements from set S. What is the most e...
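One straightforward pattern, once A exists, is to launch one work-item per entry of A rather than one per element of the full array, so no work-items sit idle on elements outside S. The sketch below emulates that launch sequentially: the loop body is what work-item gid would run. It assumes A holds unsigned indexes and that each selected element can be processed independently; op is a hypothetical stand-in for the real per-element work. Sorting A first is an optional step that keeps neighboring work-items on increasing addresses, which tends to help memory coalescing for scattered indexes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates a kernel launched with global size == A.size():
// work-item `gid` processes element A[gid] of `data`.
template <typename Op>
void process_sparse(std::vector<float>& data, std::vector<unsigned>& A, Op op) {
    // Optional: sorted indexes give more regular access patterns.
    std::sort(A.begin(), A.end());
    for (std::size_t gid = 0; gid < A.size(); ++gid) {
        unsigned i = A[gid];
        assert(i < data.size());
        data[i] = op(data[i]);  // only elements in S are touched
    }
}
```

With 5K entries in A and 512K elements in the array, this launches roughly 1% of the work-items a dense launch would.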

OpenCL: basic questions about the SIMT execution model

Some of the concepts and designs of the "SIMT" architecture are still unclear to me. From what I've seen and read, diverging code paths and if() altogether are a rather bad idea, because many threads might execute in lockstep. Now, what exactly does that mean? What about something like: kernel void foo(..., int flag) { if (flag) ...
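A rough way to picture lockstep execution: the lanes of one warp or wavefront share an instruction stream, so for if (cond) A else B the hardware runs A with the lanes where cond is false masked off, then B with the other lanes masked off. If every lane agrees, only one side runs. The toy model below, which is an illustration and not how any specific GPU is implemented, counts how many sides a group of lanes has to execute. A condition that depends only on a kernel argument such as flag is identical for every lane, so in this model it never costs both sides.

```cpp
#include <cassert>
#include <vector>

// Toy SIMT model: returns how many branch sides (1 or 2) a group of lanes
// executes for `if (cond) A else B`, given each lane's value of cond.
int paths_executed(const std::vector<bool>& cond_per_lane) {
    assert(!cond_per_lane.empty());
    bool any_true = false;
    bool any_false = false;
    for (bool c : cond_per_lane) {
        if (c) any_true = true;
        else any_false = true;
    }
    // Uniform condition: one side. Mixed condition: both sides, serialized.
    return (any_true ? 1 : 0) + (any_false ? 1 : 0);
}
```

A uniform branch (all 32 lanes see the same flag) gives 1; a single disagreeing lane is enough to give 2.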

OpenCL: direct access to host memory from a GPU kernel

Hello, is there any way to allocate memory on the host that is directly accessible from the GPU, without copying? Something like cudaHostGetDevicePointer in CUDA. ...

In a GLSL fragment shader, how do I access a texel at a specific mipmap level?

Hi, I am using OpenGL to do some GPGPU computations through the combination of one vertex shader and one fragment shader. I need to do computations on an image at different scales. I would like to use mipmaps, since their generation can be automatic and hardware-accelerated. However, I can't manage to get access to the mipmap textures in th...
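Whatever lookup function is used to read an explicit level (GLSL 1.30 and later have textureLod with a float level and texelFetch with integer coordinates and a level), the coordinates have to match that level's size. The sketch below, assuming a standard mip chain where each level halves the previous one (rounding down, never below 1), computes a level's size and the normalized coordinate of a texel center at that level. mip_size and texel_center are illustrative names.

```cpp
#include <algorithm>
#include <cassert>

// Size (width or height) of mip level `level` for a base size `base`.
int mip_size(int base, int level) {
    assert(base > 0 && level >= 0 && level < 31);
    return std::max(1, base >> level);
}

// Normalized coordinate of the center of texel `i` at mip level `level`,
// i.e. a coordinate that samples exactly that texel with an explicit level.
float texel_center(int i, int base, int level) {
    int n = mip_size(base, level);
    assert(i >= 0 && i < n);
    return (static_cast<float>(i) + 0.5f) / static_cast<float>(n);
}
```

For a 512-texel-wide image, level 8 is 2 texels wide, so its texel centers sit at 0.25 and 0.75.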

Texture format for cellular automata in OpenGL ES 2.0

I need some quick advice. I would like to simulate a cellular automaton (from A Simple, Efficient Method for Realistic Animation of Clouds) on the GPU. However, I am limited to OpenGL ES 2.0 shaders (in WebGL), which do not support any bitwise operations. Since every cell in this cellular automaton represents a boolean value, storing 1 ...
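Without bitwise operators, individual bits can still be packed into an 8-bit channel with arithmetic alone: bit k of a value v in 0..255 is mod(floor(v / 2^k), 2). The sketch below writes that in C++ using only operations GLSL ES 2.0 also provides on floats (floor, mod, pow), so the same expressions could be transcribed into a shader. It assumes the channel value has already been scaled to 0..255 (for example floor(texel * 255.0 + 0.5)) and that the texture uses nearest filtering, since linear filtering would blend packed values.

```cpp
#include <cassert>
#include <cmath>

// Bit `k` (0..7) of channel value `v` (0..255), as 0.0 or 1.0.
// For non-negative inputs std::fmod matches GLSL's mod().
float get_bit(float v, float k) {
    assert(v >= 0.0f && v <= 255.0f && k >= 0.0f && k <= 7.0f);
    return std::fmod(std::floor(v / std::pow(2.0f, k)), 2.0f);
}

// Sets bit `k` of `v` to `b` (0.0 or 1.0): remove the old bit's
// contribution and add the new one.
float set_bit(float v, float k, float b) {
    float p = std::pow(2.0f, k);
    return v + (b - get_bit(v, k)) * p;
}
```

Packing eight cells per channel this way stores 32 cells in one RGBA8 texel.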

OpenTK vs OpenCL.NET

I am getting started with OpenCL on .NET. How does OpenTK compare to OpenCL.NET, and which is better? ...

What is the point of GLSL when there is OpenCL?

Consider this the complete form of the question in the title: Since OpenCL may become the common standard for serious GPU programming in the future (among other kinds of device programming), why not, when programming for OpenGL in a future-proof way, perform all GPU operations in OpenCL? That way you get the advantages of GLSL, without its program...

Does GLSL utilize SLI? Does OpenCL? What is better, GLSL or OpenCL for multiple GPUs?

To what extent does OpenGL's GLSL utilize SLI setups? Is it utilized at all at the point of execution, or only for final rendering? Similarly, I know that OpenCL is unaware of SLI, but assuming one has several GPUs, how does it compare to GLSL for multi-GPU processing? Since it might depend on the application, e.g. common transformations or ray tr...

Concurrency: four CUDA applications competing for GPU resources

What would happen if four concurrent CUDA applications compete for resources on a single GPU so they can offload their work to the graphics card? The CUDA Programming Guide 3.1 mentions that certain operations are asynchronous: kernel launches; device ↔ device memory copies; host ↔ device memory copies of a memory...

Coding a CUDA Kernel that has many threads writing to the same index?

I'm writing some code for activating neural networks on CUDA, and I'm running into an issue: I'm not getting the correct summation of the weights going into a given neuron. Here is the kernel code; I'll try to explain it a bit more clearly with the variables. __global__ void kernelSumWeights(float* sumArray, float* weightArray, int...
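When many threads execute sum[i] += w on the same index, each one reads, adds and writes back, so two threads can read the same old value and one update is lost. CUDA's atomicAdd serializes those updates (float atomicAdd is available from compute capability 2.0); where it is not, the usual fallback is a compare-and-swap loop built on atomicCAS. The sketch below shows that compare-and-swap pattern with C++ std::atomic and std::thread rather than CUDA, so it can run on the CPU. sum_weights_concurrently is a hypothetical stand-in for kernelSumWeights, not the question's kernel. A parallel reduction is the other common approach and avoids contention entirely.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Atomic float add via compare-and-swap: retry until no other thread
// changed `target` between our read and our write.
void atomic_add(std::atomic<float>& target, float value) {
    float expected = target.load();
    // On failure, compare_exchange_weak reloads `expected` with the current value.
    while (!target.compare_exchange_weak(expected, expected + value)) {
    }
}

// Every "thread" adds a strided subset of the weights into one shared slot,
// like many GPU threads writing to the same neuron's sum.
float sum_weights_concurrently(const std::vector<float>& weights, int num_threads) {
    assert(num_threads > 0);
    std::atomic<float> sum{0.0f};
    std::vector<std::thread> pool;
    for (int t = 0; t < num_threads; ++t) {
        pool.emplace_back([&, t] {
            for (std::size_t i = t; i < weights.size(); i += num_threads)
                atomic_add(sum, weights[i]);
        });
    }
    for (auto& th : pool) th.join();
    return sum.load();
}
```

Replacing atomic_add with a plain read-add-write on a shared float reintroduces the lost updates described above.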

Poor OpenGL image processing performance

I'm trying to do some simple image processing using OpenGL. Since I couldn't find any good library that already does this, I've been trying to write my own solution. I simply want to compose a few images on the GPU and then read them back. However, the performance of my implementation seems almost equal to doing it on the CPU... someth...

GPGPU, OpenCL, CUDA, ATI Stream

Hello, everyone! Please tell me which GPGPU technologies already exist and which hardware vendors implement GPGPU. I've been reading articles on various sites since morning and have become confused. ...

Trying to mix OpenCL with CUDA in Nvidia's SDK template

Hey all, I have been having a tough time setting up an experiment where I allocate memory on the device with CUDA, take that device pointer, use it in OpenCL, and return the results. I want to see if this is possible. I had a tough time getting a CUDA project to work, so I just used Nvidia's template project in their SDK...

Bitmap conversion using GPU

I don't know whether this is the right forum. Anyway, here is the question. In one of our applications we display medical images and, on top of them, an algorithm-generated bitmap. The real bitmap is a 16-bit grayscale bitmap. From this we generate a color bitmap based on a look-up table, e.g. (0-100)->green (100-200)->blue (200>above)...
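Because every output pixel depends only on its own 16-bit input value, the conversion is one table lookup per pixel, which maps naturally onto one GPU thread or fragment per pixel with the table stored on the device. The sketch below shows the same structure on the CPU. It assumes half-open ranges ([0,100) green, [100,200) blue) since the question's boundaries overlap at 100, and because the color for 200 and above is cut off, above is a hypothetical parameter. build_lut, apply_lut and RGB are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RGB {
    std::uint8_t r, g, b;
};

// One entry per possible 16-bit gray value, built once.
std::vector<RGB> build_lut(RGB above) {
    std::vector<RGB> lut(65536);
    for (std::uint32_t v = 0; v < 65536; ++v) {
        if (v < 100)      lut[v] = {0, 255, 0};  // green
        else if (v < 200) lut[v] = {0, 0, 255};  // blue
        else              lut[v] = above;        // color cut off in the question
    }
    return lut;
}

// Per-pixel conversion: each iteration is independent, like one GPU thread per pixel.
std::vector<RGB> apply_lut(const std::vector<std::uint16_t>& gray, const std::vector<RGB>& lut) {
    assert(lut.size() == 65536);
    std::vector<RGB> out(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i) out[i] = lut[gray[i]];
    return out;
}
```

When the table changes (for example, the user edits the ranges), only the 65536-entry table needs to be rebuilt and re-uploaded, not the image.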