gpu-programming

How can I use the GPU as a second processor in .NET?

The question says it all, really. I'm hoping that I don't have to write the code in a C++ .dll and then call it from managed code. ...
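
If it does come down to a native DLL, the usual shape is a flat C interface around the GPU code, which C# then calls through P/Invoke ([DllImport]). A minimal sketch of the native side, assuming CUDA (the function and file names are mine):

    // gpu_bridge.cu -- native side of a hypothetical .NET <-> GPU bridge.
    // Build as a DLL; the C# side would declare it with [DllImport("gpu_bridge.dll")].
    #include <cuda_runtime.h>

    __global__ void scale_kernel(float* data, int n, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;   // trivial per-element work on the GPU
    }

    extern "C" __declspec(dllexport)
    int gpu_scale(float* host_data, int n, float factor) {
        float* dev = nullptr;
        if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return -1;
        cudaMemcpy(dev, host_data, n * sizeof(float), cudaMemcpyHostToDevice);
        scale_kernel<<<(n + 255) / 256, 256>>>(dev, n, factor);
        cudaMemcpy(host_data, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dev);
        return 0;   // caller marshals a float[] and checks this status code
    }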

Suggestions for a project in C++ / distributed systems / networks

I'd like to work on a 2-3 month long project (full time) that involves coding in C++ and is related to networks (protocol stacks). I was considering writing my own network stack, but that doesn't seem as interesting. It would be great to find an idea to implement a TCP/IP-like stack for distributed systems/GPUs that is better as far as net...

Is GPGPU a hack?

Hello folks, I started working on GPGPU a few days ago and successfully implemented Cholesky factorization with good performance. I then attended a conference on High Performance Computing where some people said that "GPGPU is a hack". I am still confused about what that means and why they called it a hack. One said that this is a hack b...

ATI Stream SDK on Ubuntu 9.04

Hello all, I have used the ATI Stream SDK on Windows XP SP3 and implemented one algorithm on the GPU. Now I am interested in scaling this algorithm to multiple GPUs on multiple machines, so I switched to Ubuntu to use MPI (to send messages). I googled this but only found references for installation on SLES and RHEL, while I am looking for Ubuntu 9.04...
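
For the scaling part, the usual pattern, independent of which GPU SDK you use, is one MPI rank per GPU, with each rank binding to a local device by rank index and exchanging results through MPI. A skeleton, with the Brook+/Stream device setup left as a placeholder and GPUS_PER_NODE being my assumption:

    // mpi_gpu_skeleton.cpp -- one MPI rank per GPU; vendor-specific setup elided.
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        // Assumption: GPUS_PER_NODE matches your cluster layout;
        // each rank claims one local device.
        const int GPUS_PER_NODE = 2;
        int local_device = rank % GPUS_PER_NODE;
        printf("rank %d of %d will use local GPU %d\n", rank, size, local_device);

        // ... initialize the ATI Stream / Brook+ runtime on local_device here,
        // run this rank's share of the algorithm, then exchange results with
        // MPI_Send / MPI_Recv or MPI_Allgather as the algorithm requires.

        MPI_Finalize();
        return 0;
    }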

What is the best resource for understanding why the GPU is more powerful than the CPU?

I was reading this site: http://www.nvidia.com/object/cuda_what_is.html and I wanted to find some more general information on the ideas behind GPU computing (including the history of using GPUs for computation and the benefits over CPUs). Does anyone have any good articles that are not too difficult to digest? ...

Why do I get a CL_MEM_OBJECT_ALLOCATION_FAILURE?

I'm allocating a cl_mem buffer on a GPU and working on it, which works fine until a certain size is exceeded. In that case the allocation itself succeeds, but execution or copying fails. I do want to use the device's memory for faster operation, so I allocate like: buf = clCreateBuffer (cxGPUContext, CL_MEM_WRITE_ONLY, buf_size, NULL, &c...
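
OpenCL often defers the real allocation until the buffer is first used, which matches "allocation succeeds, execution/copy fails". Also note that a single buffer may not exceed CL_DEVICE_MAX_MEM_ALLOC_SIZE, which is frequently only a quarter of total device memory. A quick check, assuming you already hold a cl_device_id:

    // Query the per-allocation limit before creating a large buffer.
    #include <CL/cl.h>
    #include <stdio.h>

    void check_alloc_limit(cl_device_id device, size_t buf_size) {
        cl_ulong max_alloc = 0, global_mem = 0;
        clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                        sizeof(max_alloc), &max_alloc, NULL);
        clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                        sizeof(global_mem), &global_mem, NULL);
        printf("max single allocation: %llu bytes, total: %llu bytes\n",
               (unsigned long long)max_alloc, (unsigned long long)global_mem);
        if ((cl_ulong)buf_size > max_alloc)
            printf("buf_size %zu exceeds the device limit -- expect "
                   "CL_MEM_OBJECT_ALLOCATION_FAILURE on first use\n", buf_size);
    }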

RAR password recovery on GPU using ATI Stream processor

Hello, I'm a newbie in GPU programming, and I am working on brute-force RAR password recovery on an ATI Stream processor using the Brook+ language. I see that a kernel written in Brook+ cannot call normal functions (only other kernel functions). My question is: 1) how do I use the unrar.dll API (to unrar archive files) in thi...
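
A Brook+ kernel cannot call into unrar.dll or any other host code; the usual split is to generate (and cheaply pre-filter) candidate passwords on the GPU and let the host verify the survivors with the DLL. A sketch of that split, written in CUDA syntax purely for illustration (Brook+ would express the kernel differently, and all names here are mine):

    // Each thread turns its global index into a candidate password over a small
    // alphabet and writes it out; the host then tries candidates via unrar.dll.
    #include <cuda_runtime.h>

    __constant__ char ALPHABET[27] = "abcdefghijklmnopqrstuvwxyz";

    __global__ void gen_candidates(char* out, int len, unsigned long long base) {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        unsigned long long id = base + tid;
        char* dst = out + tid * (len + 1);
        for (int i = 0; i < len; ++i) {   // decode the index as a base-26 number
            dst[i] = ALPHABET[id % 26];
            id /= 26;
        }
        dst[len] = '\0';
        // A real recovery tool would also run the cheap part of the RAR key
        // derivation here and only flag plausible candidates for the host.
    }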

How do I test OpenCL on GPU when logged in remotely on Mac?

My OpenCL program can find the GPU device when I am logged in at the console, but not when I am logged in remotely with ssh. Further, if I run the program as root in the ssh session, the program can find the GPU. The computer is a Snow Leopard Mac with a GeForce 9400 GPU. If I run the program (see below) from the console or as root, t...
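
A useful first step is to dump exactly which platforms and devices each session can see, at the console versus over ssh; on Mac OS X the GPU device is commonly tied to the window-server session, which would match these symptoms. A small enumeration program:

    // List every OpenCL platform/device visible to the current session;
    // run it at the console and over ssh and compare the output.
    #include <OpenCL/opencl.h>   // on Mac; <CL/cl.h> elsewhere
    #include <stdio.h>

    int main(void) {
        cl_platform_id platforms[4];
        cl_uint nplat = 0;
        clGetPlatformIDs(4, platforms, &nplat);
        for (cl_uint p = 0; p < nplat; ++p) {
            cl_device_id devs[8];
            cl_uint ndev = 0;
            clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devs, &ndev);
            for (cl_uint d = 0; d < ndev; ++d) {
                char name[256];
                cl_device_type type;
                clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
                clGetDeviceInfo(devs[d], CL_DEVICE_TYPE, sizeof(type), &type, NULL);
                printf("%s (%s)\n", name,
                       (type & CL_DEVICE_TYPE_GPU) ? "GPU" : "CPU/other");
            }
        }
        return 0;
    }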

printf inside a CUDA __global__ function

I am currently writing a matrix multiplication on a GPU and would like to debug my code, but since I cannot use printf inside a device function, is there something else I can do to see what is going on inside that function? This is my current function: __global__ void MatrixMulKernel(Matrix Ad, Matrix Bd, Matrix Xd){ int tx = threadI...
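
Device-side printf only arrived with compute capability 2.0 hardware; the standard workaround before that is to write intermediate values into a global debug buffer and print them from the host after the kernel finishes. A minimal sketch (the debug-buffer names are mine):

    #include <cuda_runtime.h>
    #include <cstdio>

    // Kernel writes one debug value per thread into a preallocated buffer.
    __global__ void MatrixMulKernelDbg(float* dbg, int n) {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        float intermediate = tid * 2.0f;        // stand-in for a real partial sum
        if (tid < n) dbg[tid] = intermediate;   // "printf" by writing to memory
    }

    int main() {
        const int N = 16;
        float *d_dbg, h_dbg[N];
        cudaMalloc(&d_dbg, N * sizeof(float));
        MatrixMulKernelDbg<<<1, N>>>(d_dbg, N);
        cudaMemcpy(h_dbg, d_dbg, N * sizeof(float), cudaMemcpyDeviceToHost);
        for (int i = 0; i < N; ++i)
            printf("thread %d -> %f\n", i, h_dbg[i]);  // inspect on the host
        cudaFree(d_dbg);
        return 0;
    }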

Is it possible to make Flash 100% GPU accelerated, even if outside the browser?

I'm trying to figure out the extent of Flash 10's GPU acceleration capabilities. Is it possible to get 100% of your code GPU accelerated, or are only certain sandboxed functions accelerated? I'd like to know even if I have to go outside the browser to get it, and exactly how much and what kind of GPU acceleration I can achieve inside the browser. A link to a ...

GPU programming - transfer bottlenecks

As I would like my GPU to do some of the calculation for me, I am interested in measuring the speed of 'texture' upload and download, because my 'textures' are the data that the GPU should crunch. I know that transfer from main memory to GPU memory is the preferred way to go, so I expect such an application to be efficient only if ther...
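
In CUDA terms, the upload and download rates are straightforward to measure by timing cudaMemcpy with events; a sketch (using pinned host memory, which typically transfers noticeably faster than pageable memory):

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        const size_t bytes = 64 << 20;            // 64 MB test "texture"
        float *host, *dev;
        cudaMallocHost(&host, bytes);             // pinned memory: faster DMA
        cudaMalloc(&dev, bytes);

        cudaEvent_t t0, t1;
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);

        cudaEventRecord(t0);
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);   // upload
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);

        float ms = 0;
        cudaEventElapsedTime(&ms, t0, t1);
        printf("upload: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));
        // Repeat with cudaMemcpyDeviceToHost to time the download path.

        cudaFree(dev);
        cudaFreeHost(host);
        return 0;
    }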

Why aren't we programming on the GPU?

So I finally took the time to learn CUDA and got it installed and configured on my computer, and I have to say, I'm quite impressed! Here's how it does rendering the Mandelbrot set at 1280 x 678 pixels on my home PC with a Q6600 and a GeForce 8800GTS (max of 1000 iterations): maxing out all 4 CPU cores with OpenMP: 2.23 fps; running the...
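
The Mandelbrot set is a textbook GPU case: every pixel is an independent escape-time loop, so thousands of threads run with no communication at all. A minimal CUDA kernel of that shape (parameter names are mine):

    // One thread per pixel: iterate z = z^2 + c and record the escape count.
    __global__ void mandelbrot(int* out, int w, int h, int max_iter) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= w || y >= h) return;

        float cr = -2.5f + 3.5f * x / w;   // map pixel to the complex plane
        float ci = -1.0f + 2.0f * y / h;
        float zr = 0.f, zi = 0.f;
        int it = 0;
        while (zr * zr + zi * zi < 4.0f && it < max_iter) {
            float t = zr * zr - zi * zi + cr;
            zi = 2.0f * zr * zi + ci;
            zr = t;
            ++it;
        }
        out[y * w + x] = it;   // color lookup happens on the host / in a shader
    }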

How to determine if an application is using the GPU

I'm looking for a way to determine, in Objective-C, whether an application is using the GPU. I want to be able to tell if any applications currently running on the system have work going on on the GPU (i.e., a reason why the latest MacBook Pros would switch to the discrete graphics over the Intel HD graphics). I've tried ...

Not able to kill bad kernel running on NVIDIA GPU

Hi, I am in a real fix; please help, it's urgent. I have a host process that spawns multiple host (CPU) threads (pthreads). These threads in turn launch CUDA kernels. The CUDA kernels are written by external users, so they might be bad kernels that enter an infinite loop. To overcome this I have put in a time-out of 2 mins that will ...
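
On hardware and toolkits of this era there is no way to kill a single running kernel; the only recourse after a timeout is to tear down the whole CUDA context, losing all device state for the process. A minimal watchdog sketch, assuming the CUDA runtime API (whether the reset can actually interrupt a truly hung kernel varies by driver and device):

    // Watchdog sketch: launch the untrusted kernel, poll for completion,
    // and destroy the context on timeout. cudaDeviceReset wipes ALL device
    // state for the process, not just the bad kernel.
    #include <cuda_runtime.h>
    #include <cstdio>
    #include <ctime>

    __global__ void untrusted_kernel() {   // stand-in for the external user's kernel
        while (true) {}                    // deliberately hangs
    }

    bool run_with_timeout(int seconds) {
        untrusted_kernel<<<1, 1>>>();
        time_t start = time(NULL);
        while (cudaStreamQuery(0) == cudaErrorNotReady) {   // poll, don't block
            if (time(NULL) - start > seconds) {
                cudaDeviceReset();   // only recourse: tear down the context
                fprintf(stderr, "kernel timed out; context destroyed\n");
                return false;
            }
        }
        return true;   // kernel finished in time
    }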

Recommendations for an open-source parallel programming IDE

What are the best IDEs / IDE plugins / tools, etc., for programming with CUDA / MPI, etc.? I've been working in these frameworks for a short while but feel like the IDE could be doing more heavy lifting in terms of scaling and job-processing interactions. (I usually use Eclipse or Netbeans, and usually in C/C++ with occasional Java, and ...

NVIDIA CUDA SDK Examples Compilation Unsupported Architecture 'compute_20'

On compilation of the CUDA SDK, I'm getting: nvcc fatal : Unsupported gpu architecture 'compute_20'. My toolkit is 2.3 on a shared system (i.e., I can't really upgrade) and the driver version is also 2.3, running on 4 Tesla C1060s. If it helps, the error comes from radixsort. It appears that a few people online have had this ...
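
compute_20 is the Fermi target, which the 2.3 toolchain predates, and a Tesla C1060 is compute capability 1.3 in any case, so the usual fix is to remove that target from the radixsort makefile rather than upgrade anything. A hedged sketch (the exact flags and variable names in the SDK makefiles may differ):

    # Assumption: the radixsort Makefile asks nvcc for a Fermi target.
    # Drop the compute_20/sm_20 flags and target the C1060's compute
    # capability 1.3 instead, which the 2.3 toolkit understands:
    nvcc -arch=sm_13 -c radixsort.cu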

F#/"Accelerator v2" DFT algorithm implementation probably incorrect

I'm trying to experiment with software-defined radio concepts. From this article I've tried to implement a GPU-parallel Discrete Fourier Transform. I'm pretty sure I could pre-calculate 90 degrees of sin(i) and cos(i), then just flip and repeat rather than what I'm doing in this code, and that would speed it up. But so far, ...
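
For reference, the naive GPU-parallel DFT assigns one output bin per thread, each computing X_k = sum over n of x_n * e^(-2*pi*i*k*n/N). A CUDA sketch of that structure (the F#/Accelerator version in the article is organized differently; the quarter-wave table idea would replace the sinf/cosf calls):

    // Naive O(N^2) DFT: thread k computes output bin k over all N samples.
    __global__ void dft(const float* x, float* re, float* im, int n) {
        int k = blockIdx.x * blockDim.x + threadIdx.x;
        if (k >= n) return;
        float sum_re = 0.f, sum_im = 0.f;
        for (int t = 0; t < n; ++t) {
            float angle = -2.0f * 3.14159265f * k * t / n;
            sum_re += x[t] * cosf(angle);   // a precomputed quarter-wave table
            sum_im += x[t] * sinf(angle);   // could replace these calls
        }
        re[k] = sum_re;
        im[k] = sum_im;
    }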

Is there an algorithm for sorting an array of strings on the GPU?

The array to sort has approximately one million strings, where every string can be up to one million characters long. I am looking for any implementation of a sorting algorithm for the GPU. I have a block of data of approximately 1 MB and I need to construct a suffix array. Now you can see how it is possible to have one million strings i...
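
For a suffix array specifically there is no need to sort million-character strings directly: prefix doubling reduces each round to sorting fixed-width integer keys, which GPUs handle well. A sketch using Thrust (ships with the CUDA toolkit); the re-ranking step is kept on the host for clarity, though a scan would keep it on the GPU:

    // Suffix array by prefix doubling: one GPU sort of packed (rank, rank+k)
    // keys per round. Function and variable names are mine; assumes n >= 1.
    #include <thrust/device_vector.h>
    #include <thrust/host_vector.h>
    #include <thrust/sort.h>
    #include <thrust/sequence.h>
    #include <vector>
    #include <cstdint>

    std::vector<int> build_suffix_array(const std::vector<uint8_t>& text) {
        int n = (int)text.size();
        std::vector<long long> rank(n);
        for (int i = 0; i < n; ++i) rank[i] = text[i];
        std::vector<int> sa(n);

        for (int k = 1; ; k *= 2) {
            // Key for suffix i: (rank[i], rank[i+k]) packed into 64 bits.
            thrust::host_vector<unsigned long long> h_keys(n);
            for (int i = 0; i < n; ++i) {
                unsigned long long hi = (unsigned long long)rank[i] + 1;
                unsigned long long lo =
                    (i + k < n) ? (unsigned long long)rank[i + k] + 1 : 0;
                h_keys[i] = (hi << 32) | lo;
            }
            thrust::device_vector<unsigned long long> d_keys = h_keys;
            thrust::device_vector<int> d_idx(n);
            thrust::sequence(d_idx.begin(), d_idx.end());
            thrust::sort_by_key(d_keys.begin(), d_keys.end(), d_idx.begin());
            thrust::host_vector<int> h_idx = d_idx;
            thrust::host_vector<unsigned long long> h_sorted = d_keys;

            // Re-rank: equal keys share a rank, distinct keys advance it.
            std::vector<long long> new_rank(n);
            new_rank[h_idx[0]] = 0;
            for (int i = 1; i < n; ++i)
                new_rank[h_idx[i]] =
                    new_rank[h_idx[i - 1]] + (h_sorted[i] != h_sorted[i - 1]);
            rank.swap(new_rank);
            if (rank[h_idx[n - 1]] == n - 1) {   // all ranks distinct: done
                for (int i = 0; i < n; ++i) sa[i] = h_idx[i];
                break;
            }
        }
        return sa;
    }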

C++ OpenGL: can I calculate normals on the GPU? If so, how?

Hiya. I have an OpenGL application that loads a DXF and draws it on the screen; each time, I need to calculate normals. Is there a way to calculate normals on the GPU instead of the CPU? If so, how? ...
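
For flat (per-face) normals this can move to the GPU entirely: in a fragment shader, the screen-space derivatives of the interpolated position span the triangle's plane, so their cross product is the face normal. A minimal GLSL sketch, embedded here as a C++ string (variable names are mine; requires desktop GLSL with derivative support):

    // Fragment shader: dFdx/dFdy of the interpolated position lie in the
    // triangle's plane, so their cross product is the (flat) face normal.
    const char* FLAT_NORMAL_FS = R"(
        varying vec3 worldPos;             // passed from the vertex shader
        void main() {
            vec3 n = normalize(cross(dFdx(worldPos), dFdy(worldPos)));
            float diffuse = max(dot(n, normalize(vec3(0.5, 1.0, 0.3))), 0.0);
            gl_FragColor = vec4(vec3(diffuse), 1.0);
        }
    )";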

Is it possible to do GPU programming if I have an integrated graphics card?

I have an HP Pavilion laptop; its so-called graphics card is some sort of integrated NVIDIA chip running on shared memory. To give you an idea of its capabilities: if a video game was made in the last 5 years at a cost of more than a couple million dollars, it just won't be playable on my computer. Anyway, I was wondering if I could ...
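
Quite possibly: several integrated NVIDIA chipsets (the 9400M family, for example) do support CUDA, just slowly, and because they use shared memory the host-to-'device' copies are cheap. The direct way to find out is to ask the runtime:

    // Ask the CUDA runtime what it can see; integrated parts report integrated=1.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
            printf("no CUDA-capable device found\n");
            return 1;
        }
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp p;
            cudaGetDeviceProperties(&p, i);
            printf("%s: compute capability %d.%d, %sintegrated, %zu MB\n",
                   p.name, p.major, p.minor,
                   p.integrated ? "" : "not ", p.totalGlobalMem >> 20);
        }
        return 0;
    }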