gpgpu

Feasibility of GPU as a CPU?

What do you think the future of GPU-as-a-CPU initiatives like CUDA is? Do you think they are going to become mainstream and be the next adopted fad in the industry? Apple is building a new framework for using the GPU to do CPU tasks, and there has been a lot of success with Nvidia's CUDA project in the sciences. Would you suggest that a ...

How well do common programming tasks translate to GPUs?

I have recently begun working on a project to establish how best to leverage the processing power available in modern graphics cards for general programming. It seems that the field of general-purpose GPU programming (GPGPU) has a large bias towards scientific applications with a lot of heavy math, as this fits well with the GPU computationa...

Have you successfully used a GPGPU?

I am interested to know whether anyone has written an application that takes advantage of a GPGPU by using, for example, nVidia CUDA. If so, what issues did you find and what performance gains did you achieve compared with a standard CPU? ...

Doing readback from Direct3D textures and surfaces

I need to figure out how to get the data from D3D textures and surfaces back to system memory. What's the fastest way to do so? Also, if I only need one subrect, how can I read back just that portion without having to transfer the entire thing to system memory? In short, I'm looking for concise descriptions of how ...
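The question is Direct3D-specific, but the subrect part has a close CUDA analogue worth sketching: cudaMemcpy2D can read back an arbitrary sub-rectangle of a pitched device allocation without transferring the rest of the surface. A minimal sketch, with all sizes and offsets hypothetical:

    // Sketch (CUDA, not Direct3D): read back only a sub-rectangle of a
    // pitched device allocation. All sizes and offsets are made up.
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main() {
        const int width = 1024, height = 1024;      // full surface, in floats
        float *dev; size_t pitch;                   // pitch is in bytes
        cudaMallocPitch((void **)&dev, &pitch, width * sizeof(float), height);
        cudaMemset2D(dev, pitch, 0, width * sizeof(float), height);

        // Sub-rectangle to read back: 64x64 texels starting at (x0, y0).
        const int x0 = 100, y0 = 200, rw = 64, rh = 64;
        float host[64 * 64];

        // Offset the source pointer to the rect's top-left corner; cudaMemcpy2D
        // then copies rh rows of rw*sizeof(float) bytes each, honoring the pitch.
        cudaMemcpy2D(host, rw * sizeof(float),
                     (const char *)dev + y0 * pitch + x0 * sizeof(float), pitch,
                     rw * sizeof(float), rh, cudaMemcpyDeviceToHost);

        printf("first texel of the rect: %f\n", host[0]);
        cudaFree(dev);
        return 0;
    }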

Turning C# methods into C++ methods

I'm exploring various options for mapping common C# code constructs to C++ CUDA code for running on a GPU. The structure of the system is as follows (arrows represent method calls): C# program -> C# GPU lib -> C++ CUDA implementation lib A method in the GPU library could look something like this: public static void Map<T>(this ICollec...
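As a rough illustration of what the C++ CUDA implementation lib behind such a Map call might contain, here is a minimal sketch, hardcoded to floats and a squaring functor (the generic dispatch from C# is the hard part and isn't shown; all names are hypothetical):

    #include <cuda_runtime.h>

    __device__ float square(float x) { return x * x; }   // hypothetical op

    // Elementwise map: each thread transforms one element.
    __global__ void mapKernel(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = square(in[i]);
    }

    // Host entry point a C# interop layer might reach via P/Invoke.
    extern "C" void MapSquare(const float *devIn, float *devOut, int n) {
        int threads = 256, blocks = (n + threads - 1) / threads;
        mapKernel<<<blocks, threads>>>(devIn, devOut, n);
    }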

GPGPU VM's: Any open source projects to port virtual machines onto graphics processing units?

nVidia released their CUDA API allowing developers to utilize their graphics cards, taking advantage of the massively parallel architecture and vectorized operations. Libraries such as pyCUDA were created to allow developers of scripting languages to send selected code to the GPU. And there has been a growing effort to design multi-ling...

Operations on arbitrary value types

This article describes a way, in C#, to allow the addition of arbitrary value types which have a + operator defined for them. In essence it allows the following code: public T Add(T val1, T val2) { return val1 + val2; } This code does not compile, as there is no guarantee that the T type has a definition for the '+' operator, but th...
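For contrast, on the C++/CUDA side of such a system this particular problem mostly disappears, since a template's '+' is resolved at instantiation rather than constrained up front. A minimal sketch:

    // In C++/CUDA, a template's '+' is checked at instantiation, so any T
    // with an operator+ just works; no up-front constraint is needed.
    template <typename T>
    __global__ void addKernel(const T *a, const T *b, T *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];   // compiles for any T defining '+'
    }

    // Hypothetical instantiations:
    //   addKernel<float><<<blocks, threads>>>(fa, fb, fc, n);
    //   addKernel<int><<<blocks, threads>>>(ia, ib, ic, n);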

Should I create CUDA apps now, or wait for DirectX 11?

With Windows 7 probably going to RTM next October (and DirectX 11 with it), would it be worth waiting for DirectX 11's explicit GPGPU features, meaning it will be cross-platform (ATI/Nvidia, not Windows/Linux/Mac/Whatever); or should I create a CUDA application now? ...

CUDA Driver API vs. CUDA runtime

When writing CUDA applications, you can work either at the driver level or at the runtime level, as illustrated in this image (the libraries are CUFFT and CUBLAS for advanced math). I assume the tradeoff between the two is increased performance for the low-level API, but at the cost of increased complexity of code. What are the concrete...
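To make the complexity difference concrete, here is a hedged sketch of the same trivial launch expressed both ways, using the modern driver-API entry points (the kernel name and PTX file are hypothetical, and error checking is omitted):

    #include <cuda.h>            // driver API
    #include <cuda_runtime.h>    // runtime API

    __global__ void myKernel(float *data) { data[threadIdx.x] *= 2.0f; }

    void launchRuntime(float *devData) {
        // Runtime API: context and module management are implicit.
        myKernel<<<128, 256>>>(devData);
        cudaDeviceSynchronize();
    }

    void launchDriver(CUdeviceptr devData) {
        // Driver API: everything is explicit -- init, context, module, function.
        CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);
        cuModuleLoad(&mod, "kernel.ptx");            // hypothetical PTX file
        cuModuleGetFunction(&fn, mod, "myKernel");
        void *args[] = { &devData };
        cuLaunchKernel(fn, 128, 1, 1, 256, 1, 1, 0, NULL, args, NULL);
        cuCtxSynchronize();
        cuCtxDestroy(ctx);
    }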

How to block until an asynchronous job finishes

I'm working on a C# library which offloads certain work tasks to the GPU using NVIDIA's CUDA. An example of this is adding two arrays together using extension methods: float[] a = new float[]{ ... } float[] b = new float[]{ ... } float[] c = a.Add(b); The work in this code is done on the GPU. However, I would like it to be done asynch...
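On the CUDA side of such a library, the usual pattern is to launch into a stream and expose a blocking wait via cudaStreamSynchronize. A minimal sketch of what the native half might look like (names hypothetical):

    #include <cuda_runtime.h>

    __global__ void addArrays(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    // Queue the addition on a stream and return immediately.
    extern "C" void AddAsync(const float *devA, const float *devB, float *devC,
                             int n, cudaStream_t stream) {
        int threads = 256, blocks = (n + threads - 1) / threads;
        addArrays<<<blocks, threads, 0, stream>>>(devA, devB, devC, n);
    }

    // Block the calling CPU thread until everything queued on the stream is done.
    extern "C" void WaitForCompletion(cudaStream_t stream) {
        cudaStreamSynchronize(stream);
    }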

CUDA memory troubles

I have a CUDA kernel which I'm compiling to a cubin file without any special flags: nvcc text.cu -cubin It compiles, though with this message: "Advisory: Cannot tell what pointer points to, assuming global memory space" and a reference to a line in a temporary .cpp file. I can get this to work by commenting out some seemingly ar...
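For context, this advisory typically comes from a __device__ function that takes a raw pointer the compiler cannot classify; on toolchains predating the unified address space, it had to guess a memory space. A hedged sketch of the pattern that triggers it:

    // Pattern that triggers the advisory on old toolchains: inside sum3 the
    // compiler cannot tell whether 'p' points to shared or global memory,
    // so it assumes global -- wrong if a shared array like 'tile' is passed.
    __device__ float sum3(const float *p) {
        return p[0] + p[1] + p[2];
    }

    __global__ void kernel(const float *gdata, float *out) {
        __shared__ float tile[3];
        if (threadIdx.x < 3) tile[threadIdx.x] = gdata[threadIdx.x];
        __syncthreads();
        // One old workaround: index the shared array directly at the call
        // site instead of funneling it through a pointer parameter.
        if (threadIdx.x == 0) out[0] = sum3(tile);
    }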

ATI Stream compared to NVidia/CUDA

In an effort to make this an answerable question, and not just an opinion poll, I'll ask it like this: are there any third-party reports that compare ATI's Stream framework to NVidia's CUDA framework (i.e., not from ATI or NVidia talking themselves up)? ...

How ugly is the API for GP-GPU?

I'm debating about whether to learn GP-GPU programming, such as CUDA, or whether to put it off. My problem domain (bioinformatics) is such that it might be nice to know, since a lot of our problems do have massive parallelism, but most people in the field certainly don't know it. My question is: how difficult is the API for CUDA and other GP-GP...
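For a concrete feel of the API, here is a complete, minimal CUDA runtime program, vector addition end to end (error checking omitted for brevity):

    #include <cuda_runtime.h>
    #include <stdio.h>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *ha = (float *)malloc(bytes), *hb = (float *)malloc(bytes),
              *hc = (float *)malloc(bytes);
        for (int i = 0; i < n; i++) { ha[i] = i; hb[i] = 2.0f * i; }

        float *da, *db, *dc;                      // allocate device memory
        cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
        cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

        vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);  // launch

        cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
        printf("c[1] = %f\n", hc[1]);             // expect 3.0
        cudaFree(da); cudaFree(db); cudaFree(dc);
        free(ha); free(hb); free(hc);
        return 0;
    }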

How do you get around the maximum CUDA run-time?

I've noticed that CUDA applications tend to have a rough maximum run-time of 5-15 seconds before they fail and exit. I realize it's ideal not to have a CUDA application run that long, but assuming that CUDA is the correct choice and that the amount of sequential work per thread means it must run that long, is there any way t...
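That limit is the display driver's watchdog timer, which kills long-running kernels on GPUs that are also driving a display. Besides running on a dedicated compute GPU, the usual workaround is to split the work into many short launches that each persist their state in device memory. A hedged sketch, with the kernel body and chunk size purely illustrative:

    #include <cuda_runtime.h>

    // Stand-in kernel: processes iterations [start, start+count) of a long
    // per-thread job, persisting its state in device memory between launches.
    __global__ void doChunk(float *state, int start, int count) {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        for (int i = 0; i < count; ++i)
            state[tid] += 1e-6f;                 // placeholder for real work
    }

    void runLongJob(float *devState, int totalIters) {
        const int chunk = 100000;   // sized so one launch stays well under ~5 s
        for (int start = 0; start < totalIters; start += chunk) {
            int count = (totalIters - start < chunk) ? totalIters - start : chunk;
            doChunk<<<128, 256>>>(devState, start, count);
            cudaDeviceSynchronize();  // keeps the launch queue short; each
                                      // launch finishes before the watchdog fires
        }
    }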

How to index a texture as a discrete lookup table from a shader?

I'm writing a shader in GLSL and I need to pass it a certain amount of information. The only practical way to pass this information is with a 1-D texture. I'm creating the texture and setting GL_TEXTURE_MIN_FILTER and GL_TEXTURE_MAG_FILTER to GL_NEAREST. Now, from the shader, I need to access the texture so I'll be able to exactly index ea...
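The key arithmetic is sampling at texel centers: with nearest filtering, the normalized coordinate (i + 0.5) / N lands exactly on texel i of an N-texel 1-D texture. A small CUDA-flavored sketch of the same formula:

    // With nearest (unfiltered) sampling, the normalized coordinate
    // (i + 0.5) / N hits the center of texel i in an N-texel 1-D texture,
    // so the lookup behaves as an exact discrete index.
    __host__ __device__ inline float texelCenter(int i, int n) {
        return (i + 0.5f) / (float)n;   // e.g. i=0, n=4 -> 0.125
    }
    // GLSL equivalent: texture1D(tex, (float(i) + 0.5) / float(n));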

Advice for a C, CUDA, & ANN Newbie?

I'm a business major, two-thirds of the way through my degree program, with a little PHP experience, having taken one introductory C++ class, and now regretting my choice of business over programming/computer science. I am interested in learning more advanced programming; specifically C, and eventually progressing to using the CUDA arch...

Quick sort in GLSL?

I'm considering porting a large chunk of processing to the GPU using a GLSL shader. One of the immediate problems I stumbled across is that in one of the steps, the algorithm needs to maintain a list of elements, sort them, and take the few largest ones (how many depends on the data). On the CPU this is simply done using an STL v...
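Bitonic sort is the usual answer for in-shader sorting because its compare-and-swap network is data-independent, so it maps cleanly to lockstep GPU threads. A single-block CUDA sketch (a GLSL version follows the same network), assuming n is a power of two and equal to the block size:

    // Single-block bitonic sort in shared memory. Launch as
    // bitonicSortShared<<<1, n, n * sizeof(float)>>>(devData, n);
    __global__ void bitonicSortShared(float *data, int n) {
        extern __shared__ float s[];
        int tid = threadIdx.x;
        s[tid] = data[tid];
        __syncthreads();
        for (int k = 2; k <= n; k <<= 1) {           // bitonic sequence size
            for (int j = k >> 1; j > 0; j >>= 1) {   // compare distance
                int ixj = tid ^ j;                   // partner element
                if (ixj > tid) {
                    bool ascending = ((tid & k) == 0);
                    if ((s[tid] > s[ixj]) == ascending) {
                        float t = s[tid]; s[tid] = s[ixj]; s[ixj] = t;
                    }
                }
                __syncthreads();
            }
        }
        data[tid] = s[tid];   // ascending order; the largest few sit at the end
    }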

CUDA, OpenCL, PGI, etc.... but what happened to GLSL and Cg?

CUDA, OpenCL, and the GPU options offered by the Portland Group are intriguing... The results are impressive (125x speedups for some groups). It sounds like the next wave of GPGPU tools is poised to dominate the scientific computing world. However, I recall the same fanfare when GLSL and Cg were announced. Whatever happened to GLSL a...

ideas for graphic based projects using GPUs?

Hi, I'm a CS undergrad student and want to finalize my project idea soon. I am mostly interested in graphics-based projects which work with the help of GPUs, like GPGPU (http://en.wikipedia.org/wiki/GPGPU) or actual graphics processing using GPUs. My supervisor suggested I look into topics related to parallel computing, like in GPGPUs a...

CUDA shared memory array - odd behavior

In a CUDA kernel, I have code similar to the following. I am trying to calculate one numerator per thread, and accumulate the numerators over the block to calculate a denominator, and then return the ratio. However, CUDA is setting the value of denom to whatever value is calculated for numer by the thread in the block with the largest th...
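That symptom, denom ending up equal to one thread's numer, is the classic signature of accumulating into shared memory without barriers, so the last writer wins. A hedged sketch of the corrected pattern, a tree reduction with __syncthreads() (the per-thread work is a stand-in, and the block size is assumed to be 256, a power of two):

    __device__ float computeNumer(const float *in, int i) {
        return in[i] * in[i];           // stand-in for the real per-thread work
    }

    // Assumes blockDim.x == 256 (a power of two).
    __global__ void ratioKernel(const float *in, float *out) {
        __shared__ float partial[256];
        int tid = threadIdx.x;
        int gid = blockIdx.x * blockDim.x + tid;
        float numer = computeNumer(in, gid);
        partial[tid] = numer;
        __syncthreads();                // every numerator must be visible

        // Tree reduction to the block-wide denominator. Without the barrier
        // inside the loop, threads read sums that haven't been written yet --
        // which is exactly how denom ends up equal to one thread's numer.
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (tid < s) partial[tid] += partial[tid + s];
            __syncthreads();
        }
        float denom = partial[0];
        out[gid] = numer / denom;
    }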