gpu

Programming GPU to control DVI output

I have an NVIDIA GeForce 8400GS graphics card which has a DVI output, and I would like to take a video or series of frames and display them on the DVI output at WUXGA (1,920 × 1,200) @ 120 Hz with GTF (2 x 154 MHz), which is a possible display mode for DVI according to the Wikipedia article. I want to do this because I want a high frame r...

Using random numbers with GPUs

I'm investigating using NVIDIA GPUs for Monte Carlo simulations. However, I would like to use the GSL random number generators and also a parallel random number generator such as SPRNG. Does anyone know if this is possible? Update: I've played about with RNGs on GPUs. At present there isn't a nice solution. The Mersenne Twister that c...

Does Global Work Size Need to Be a Multiple of Work Group Size in OpenCL?

Hello: Does the global work size (dimensions) need to be a multiple of the work group size (dimensions) in OpenCL? If so, is there a standard way of handling matrices whose dimensions are not a multiple of the work group dimensions? I can think of two possibilities: Dynamically set the size of the work group dimensions to a factor of the global work dimensions. (thi...
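
On the second part of the question: in OpenCL 1.x, `clEnqueueNDRangeKernel` rejects a global work size that is not evenly divisible by an explicitly specified local work size (OpenCL 2.0 relaxed this with non-uniform work-groups). One common workaround is to round the global size up and have the kernel skip the padding work-items. The sketch below shows only the host-side arithmetic in plain C++; `round_up_to_multiple` and `count_active_items` are hypothetical helpers, not OpenCL API calls, and the loop merely emulates the bounds check a kernel would perform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round `global` up to the next multiple of `local`.
std::size_t round_up_to_multiple(std::size_t global, std::size_t local) {
    assert(local > 0);  // a zero work-group size is meaningless
    return ((global + local - 1) / local) * local;
}

// Host-side emulation of the guard each work-item would run.
// In a real kernel the check would look like:
//     if (get_global_id(0) >= n) return;
std::size_t count_active_items(std::size_t padded_global, std::size_t n) {
    std::size_t active = 0;
    for (std::size_t gid = 0; gid < padded_global; ++gid) {
        if (gid < n) {
            ++active;  // this work-item maps to real data
        }
        // otherwise the work-item is padding and does nothing
    }
    return active;
}
```

For a 2D matrix the same rounding would be applied to each dimension separately, with the kernel checking both the row and the column index against the true matrix size.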

Would it be possible for a JIT compiler to utilize GPU for certain operations behind the scenes?

Feel free to correct me if any part of my understanding is wrong. My understanding is that GPUs offer a subset of the instructions that a normal CPU provides but execute them much faster. I know there are ways to utilize GPU cycles for non-graphical purposes, but it seems like (in theory) a language that's Just In Time compiled could d...

GPU Chipset Detection

I'm seeking the most efficient method for retrieving the GPU model in Objective-C or Carbon. I want to avoid using system_profiler because it is slow. If it comes down to that I am willing to use it, but I want to exhaust other options first. ...

Is there an algorithm for sorting an array of strings on a GPU?

The array to sort has approximately one million strings, where every string can have a length of up to one million characters. I am looking for any implementation of a sorting algorithm for the GPU. I have a block of data with a size of approximately 1 MB and I need to construct a suffix array. Now you can see how it is possible to have one million strings i...

Can I share CUDA GPU device memory between host processes?

Is it possible to have two or more Linux host processes that can access the same device memory? I have two processes streaming data at a high rate between them, and I don't want to bring the data back out of the GPU to the host in process A just to pass it to process B, which will memcpy it h2d back into the GPU. Combining the multiple processes in...

How to access the VGA directly

Hi guys. As most of you know, CPUs, in contrast to GPUs, are not well designed for floating-point calculation. I am wondering how to use the GPU's power without any abstraction layer or driver. Can I program a GPU using assembly, C, or C++ (and if so, how)? Although assembly seems to help me access the GPU directly, C/C++ are likely t...

CUDA Add Rows of a Matrix

Hi, I'm trying to add the rows of a 4800x9600 matrix together, resulting in a matrix 1x9600. What I've done is split the 4800x9600 into 9,600 matrices of length 4800 each. I then perform a reduction on the 4800 elements. The trouble is, this is really slow... Anyone got any suggestions? Basically, I'm trying to implement MATLAB's su...
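
One possible alternative to running a separate reduction per column is to keep one accumulator per column and walk down the rows, so that consecutive accumulators read consecutive memory. The sketch below shows that access pattern on the host in plain C++; it is not the asker's code. It assumes a row-major `float` matrix, and `column_sums` is a hypothetical name. On a GPU, each iteration of the inner loop over `c` would correspond to one thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: add all rows of a row-major (rows x cols) matrix
// together, producing a single row of length `cols`.
std::vector<float> column_sums(const std::vector<float>& m,
                               std::size_t rows, std::size_t cols) {
    std::vector<float> out(cols, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        // Adjacent values of c touch adjacent elements of m; in a
        // one-thread-per-column kernel these would be neighbouring threads.
        for (std::size_t c = 0; c < cols; ++c) {
            out[c] += m[r * cols + c];
        }
    }
    return out;
}
```

For the matrix in the question this would be called as `column_sums(data, 4800, 9600)`, where `data` holds the 4800x9600 values in row-major order.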

How to handle multitasking in OpenGL ES based apps on iOS 4?

I'm watching a WWDC video (session 105) that's talking about multitasking with iOS 4. Something interesting was just mentioned: "any GPU usage while your app is in either of the background states results in automatic termination of the app. This includes any calls to OpenGL." How does one handle this "requirement" if the ...

GPU Render onto sphere

Hello, I am trying to write optimized code that renders a 3D scene using OpenGL onto a sphere and then displays the unwrapped sphere on the screen, i.e., producing a planar map of a purely reflective sphere. In math terms, I would like to produce a projection map where the x axis is the polar angle and the y axis is the azimuth. I am trying ...

Java GPU programming

Hi, is it possible to do GPU programming in Java? I mean without using native libraries. And how much of a performance improvement can one expect when switching over to GPUs? Edit: I am not looking at game programming; I want to do hard-core number crunching. ...

Total/texture accessible memory by DirectX/CUDA/OpenGL

Hi, can someone please explain the difference between texture memory as used in the context of CUDA and texture memory as used in the context of DirectX? Suppose a graphics card has 512 MB of advertised memory; how is it divided into constant memory, texture memory and global memory? E.g. I have a Tesla card that has totalConstMem as ...

Gigaflops of a processor

I discovered my computer has NVIDIA CUDA technology and I want to measure the processing power of the CPU and GPU. Instead of searching for a program to do this, I want to have a deeper understanding of how it works. What kind of code (C/C++) do I need? ...

cudaMemcpy fails to copy values

I am calling cudaMemcpy and the copy returns successfully; however, the source values are not being copied to the destination. I wrote a similar piece using memcpy() and that works fine. What am I missing here? // host externs extern unsigned char landmask[DIMX * DIMY]; // use device constant memory for landmask unsigned char *tempmask; ...

X86 Assembly - accessing a chip

Dear all, let's say that my GPU includes a chip called ADT7473. I am interested in receiving information from this chip about the temperature of my card. My question is, how do I access this chip? Is that accomplished using the IN/OUT instructions? EDIT: I might add these lines found in the chip's documentation: Table 18. Temperatu...

How to measure the execution time of every block when using CUDA?

clock() is not accurate enough. ...

Numerical Error in simple CUDA code

I just started experimenting with CUDA, using the following code: #include "macro.hpp" #include <algorithm> #include <iostream> #include <cstdlib> //#define double float //#define double int int RandomNumber(){return static_cast<double>(rand() % 1000);} __global__ void sum3(double const* a, double const* b, double c...

CUDA app on only some of the cards

I've got an NVIDIA Tesla S2050 and a host with an NVIDIA Quadro card, running CentOS 5.5 with CUDA 3.1. When I run a CUDA app, I want to use the 4 Tesla C2050 GPUs but not the Quadro on the host, so that splitting the job equally five ways doesn't drag down the overall performance. Is there any way to implement this? ...

Coding a CUDA Kernel that has many threads writing to the same index?

I'm writing some code for activating neural networks on CUDA, and I'm running into an issue. I'm not getting the correct summation of the weights going into a given neuron. So here is the kernel code, and I'll try to explain it a bit more clearly with the variables. __global__ void kernelSumWeights(float* sumArray, float* weightArray, int...