What do you think the future of GPU-as-a-CPU initiatives like CUDA is? Do you think they will become mainstream and be the next adopted fad in the industry? Apple is building a new framework for using the GPU to do CPU tasks, and there has been a lot of success in Nvidia's CUDA project in the sciences. Would you suggest that a ...
I have recently begun working on a project to establish how best to leverage the processing power available in modern graphics cards for general programming. It seems that the field of general-purpose GPU programming (GPGPU) has a large bias towards scientific applications with a lot of heavy math, as this fits well with the GPU computationa...
I am interested to know whether anyone has written an application that takes advantage of a GPGPU by using, for example, nVidia CUDA. If so, what issues did you find and what performance gains did you achieve compared with a standard CPU?
...
I need to figure out how to get the data from D3D textures and surfaces back to system memory. What's the fastest way to do this?
Also, if I only need one subrect, how can I read back just that portion without having to read back the entire surface to system memory?
In short, I'm looking for concise descriptions of how ...
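For reference, the usual D3D9 readback path goes through GetRenderTargetData into a system-memory surface, then LockRect; a minimal sketch (the 64x64 subrect is hypothetical; note the full-surface transfer still happens here, so to move less data you would StretchRect the subrect into a smaller render target first and read that back instead):

#include <d3d9.h>

// Copy a render target to system memory and lock a subrect of the copy.
HRESULT ReadBack(IDirect3DDevice9 *dev, IDirect3DSurface9 *rt,
                 UINT width, UINT height, D3DFORMAT fmt)
{
    IDirect3DSurface9 *sysmem = NULL;
    HRESULT hr = dev->CreateOffscreenPlainSurface(
        width, height, fmt, D3DPOOL_SYSTEMMEM, &sysmem, NULL);
    if (FAILED(hr)) return hr;

    hr = dev->GetRenderTargetData(rt, sysmem);   // GPU -> system memory
    if (SUCCEEDED(hr)) {
        RECT sub = { 0, 0, 64, 64 };             // hypothetical subrect
        D3DLOCKED_RECT lr;
        hr = sysmem->LockRect(&lr, &sub, D3DLOCK_READONLY);
        if (SUCCEEDED(hr)) {
            // Read pixels via lr.pBits / lr.Pitch here.
            sysmem->UnlockRect();
        }
    }
    sysmem->Release();
    return hr;
}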
I'm exploring various options for mapping common C# code constructs to C++ CUDA code for running on a GPU. The structure of the system is as follows (arrows represent method calls):
C# program -> C# GPU lib -> C++ CUDA implementation lib
A method in the GPU library could look something like this:
public static void Map<T>(this ICollec...
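On the C++ CUDA side, the implementation behind such a Map call might reduce to a per-element kernel plus a launch wrapper that the C# library P/Invokes; a minimal sketch with a placeholder transform (the squaring is a stand-in, not from the question):

// Each thread transforms one element; x * x stands in for whatever
// operation the C# delegate gets mapped to.
__global__ void MapSquare(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * in[i];
}

// Host-side entry point the C# layer would call through P/Invoke.
extern "C" void MapSquareLaunch(const float *devIn, float *devOut, int n)
{
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    MapSquare<<<blocks, threads>>>(devIn, devOut, n);
}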
nVidia released their CUDA API, allowing developers to utilize their graphics cards and take advantage of the massively parallel architecture and vectorized operations. Libraries such as pyCUDA were created to allow developers of scripting languages to send selected code to the GPU.
And there has been a growing effort to design multi-ling...
This article describes a way, in C#, to allow the addition of arbitrary value types which have a + operator defined for them. In essence it allows the following code:
public T Add(T val1, T val2)
{
    return val1 + val2;
}
This code does not compile as there is no guarantee that the T type has a definition for the '+' operator, but th...
With Windows 7 probably going to RTM next October (and DirectX 11 with it), would it be worth waiting for DirectX 11's explicit GPGPU features, since those will be cross-vendor (ATI/Nvidia), though not cross-OS (Windows/Linux/Mac/whatever); or should I create a CUDA application now?
...
When writing CUDA applications, you can either work at the driver level or at the runtime level, as illustrated in this image (the libraries are CUFFT and CUBLAS for advanced math):
I assume the tradeoff between the two is increased performance for the low-level API, but at the cost of increased complexity of code. What are the concrete...
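To make the comparison concrete, here is a minimal sketch of the same launch done through each API (the kernel, module file, and launch configuration are hypothetical, and cuLaunchKernel is the modern driver-API entry point):

// Runtime API: terse; the kernel is compiled alongside your code by nvcc.
#include <cuda_runtime.h>

__global__ void myKernel(float *data, int n);   // hypothetical kernel

void launch_runtime(float *devData, int n)
{
    myKernel<<<(n + 255) / 256, 256>>>(devData, n);
}

// Driver API: explicit init, context, and module management -- more
// boilerplate, but you choose which binary/PTX to load at run time.
#include <cuda.h>

void launch_driver(CUdeviceptr devData, int n)
{
    cuInit(0);
    CUdevice dev;   cuDeviceGet(&dev, 0);
    CUcontext ctx;  cuCtxCreate(&ctx, 0, dev);
    CUmodule mod;   cuModuleLoad(&mod, "kernels.cubin");      // hypothetical file
    CUfunction fn;  cuModuleGetFunction(&fn, mod, "myKernel");
    void *args[] = { &devData, &n };
    cuLaunchKernel(fn, (n + 255) / 256, 1, 1, 256, 1, 1, 0, NULL, args, NULL);
    cuCtxDestroy(ctx);
}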
I'm working on a C# library which offloads certain work tasks to the GPU using NVIDIA's CUDA. An example of this is adding two arrays together using extension methods:
float[] a = new float[]{ ... };
float[] b = new float[]{ ... };
float[] c = a.Add(b);
The work in this code is done on the GPU. However, I would like it to be done asynch...
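On the native side, such asynchrony is usually expressed with CUDA streams; a minimal sketch of what the C++ layer behind Add might do (the names, the pre-allocated device buffers, and the pinned host buffers are assumptions, not the question's code):

#include <cuda_runtime.h>

__global__ void AddKernel(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Enqueue the copies and the kernel on a stream and return immediately.
// Host buffers should be pinned (cudaMallocHost) for the copies to be
// truly asynchronous; the C# side would later wait on the stream
// (cudaStreamSynchronize) through P/Invoke.
extern "C" void AddAsync(const float *ha, const float *hb, float *hc,
                         float *da, float *db, float *dc,
                         int n, cudaStream_t stream)
{
    size_t bytes = n * sizeof(float);
    cudaMemcpyAsync(da, ha, bytes, cudaMemcpyHostToDevice, stream);
    cudaMemcpyAsync(db, hb, bytes, cudaMemcpyHostToDevice, stream);
    AddKernel<<<(n + 255) / 256, 256, 0, stream>>>(da, db, dc, n);
    cudaMemcpyAsync(hc, dc, bytes, cudaMemcpyDeviceToHost, stream);
}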
I have a CUDA kernel which I'm compiling to a cubin file without any special flags:
nvcc text.cu -cubin
It compiles, though with this message:
Advisory: Cannot tell what pointer points to, assuming global memory space
and a reference to a line in some temporary .cpp file. I can get this to work by commenting out some seemingly ar...
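For context, this advisory typically shows up when a __device__ function takes a raw pointer and the compiler (targeting pre-Fermi hardware) cannot statically determine which memory space it points into, so it assumes global; a minimal sketch of a pattern that can trigger it (hypothetical code, not the question's kernel):

// The compiler cannot tell whether p is global or shared, so it assumes
// global -- correct if callers pass global pointers, silently wrong if a
// caller passes a __shared__ array.
__device__ float readThrough(const float *p, int i)
{
    return p[i];   // "Advisory: Cannot tell what pointer points to..."
}

__global__ void kernel(const float *gdata, float *out, int n)
{
    __shared__ float tile[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        tile[threadIdx.x] = gdata[i];
        __syncthreads();
        out[i] = readThrough(tile, threadIdx.x);  // shared pointer passed in
    }
}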
In an effort to make this an answerable question, and not just an opinion poll, I'll ask it like this:
Are there any third-party reports that compare ATI's Stream framework to NVidia's CUDA framework (i.e., not from ATI or NVidia talking themselves up)?
...
I'm debating whether to learn GP-GPU stuff, such as CUDA, or whether to put it off. My problem domain (bioinformatics) is such that it might be nice to know, since a lot of our problems do have massive parallelism, but most people in the field certainly don't know it. My question is: how difficult is the API for CUDA and other GP-GP...
I've noticed that CUDA applications tend to have a rough maximum run-time of 5-15 seconds before they fail and exit. I realize it's ideal not to have a CUDA application run that long, but assuming CUDA is the correct choice and the amount of sequential work per thread means it must run that long, is there any way t...
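One common mitigation, assuming the work can be checkpointed in device memory, is to split the computation into many short kernel launches so that each one finishes well under the display driver's watchdog limit; a minimal sketch with hypothetical names:

#include <algorithm>
#include <cuda_runtime.h>

// Hypothetical kernel, defined elsewhere, that advances persistent
// device state from iteration iterFirst to iterLast.
__global__ void step(float *state, int n, int iterFirst, int iterLast);

void runLong(float *devState, int n, int totalIters, int itersPerLaunch)
{
    for (int it = 0; it < totalIters; it += itersPerLaunch) {
        int last = std::min(it + itersPerLaunch, totalIters);
        step<<<(n + 255) / 256, 256>>>(devState, n, it, last);
        cudaDeviceSynchronize();  // bound each launch's wall-clock time
    }
}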
I'm writing a shader in GLSL and I need to pass it a certain amount of information. The only practical way to pass this information is using a 1-D texture.
I'm creating the texture and setting GL_TEXTURE_MIN_FILTER and GL_TEXTURE_MAG_FILTER to GL_NEAREST.
Now from the shader I need to access the texture so I'll be able to exactly index ea...
I'm a business major, two-thirds of the way through my degree program, with a little PHP experience, having taken one introductory C++ class, and now regretting my choice of business over programming/computer science.
I am interested in learning more advanced programming, specifically C, and eventually progressing to using the CUDA arch...
I'm considering porting a large chunk of processing to the GPU using a GLSL shader. One of the immediate problems I stumbled across is that in one of the steps, the algorithm needs to maintain a list of elements, sort them, and take the few largest ones (how many depends on the data). On the CPU this is simply done using an STL v...
CUDA, OpenCL, and the GPU options offered by the Portland Group are intriguing... Results are impressive (a 125-times speedup for some groups). It sounds like the next wave of GPGPU tools is poised to dominate the scientific computing world. However, I recall the same fanfare when GLSL and Cg were announced.
Whatever happened to GLSL a...
Hi,
I'm a CS undergrad student and want to finalize my project idea soon. I am mostly interested in graphics-based projects that work with the help of GPUs, like GPGPU (http://en.wikipedia.org/wiki/GPGPU) or actual graphics processing using GPUs. My supervisor suggested I look for topics related to parallel computing, like in GPGPUs a...
In a CUDA kernel, I have code similar to the following. I am trying to calculate one numerator per thread, and accumulate the numerators over the block to calculate a denominator, and then return the ratio. However, CUDA is setting the value of denom to whatever value is calculated for numer by the thread in the block with the largest th...
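That symptom is characteristic of a race: every thread writes its numer into a single shared denom, and the last writer wins. A minimal sketch of a shared-memory block reduction that accumulates the denominator correctly (the kernel and the numerator formula are hypothetical stand-ins for the question's code):

// One numerator per thread, block-wide sum for the denominator.
// Assumes blockDim.x is a power of two no larger than 256.
__global__ void ratios(const float *x, float *out, int n)
{
    __shared__ float partial[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    float numer = (i < n) ? x[i] * x[i] : 0.0f;   // placeholder numerator
    partial[threadIdx.x] = numer;
    __syncthreads();

    // Tree reduction over the block.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }

    float denom = partial[0];    // identical for every thread in the block
    if (i < n && denom != 0.0f)
        out[i] = numer / denom;
}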