Questions about CUDA

Gigaflops of a processor

I discovered that my computer has NVIDIA CUDA technology, and I want to measure the processing power of both the CPU and the GPU. Instead of searching for a program to do this, I want to gain a deeper understanding of how it works. What kind of code (C/C++) do I need? ...

c

cpu

cuda

gpu

flops
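
As a rough illustration of the CPU side of this question, here is a minimal C sketch that times a loop of multiply-add operations and converts the operation count into GFLOP/s. The function name `estimate_cpu_gflops` and the `sink` parameter are hypothetical, introduced only for this example. It measures one thread with `clock()`, so the figure it reports will usually sit well below the processor's theoretical peak.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: runs `iterations` rounds of four independent
   multiply-add chains and returns the measured rate in GFLOP/s.
   The final values are written to *sink so the compiler cannot
   discard the loop as dead code. */
double estimate_cpu_gflops(long iterations, float *sink)
{
    assert(iterations > 0 && sink != NULL);

    float a0 = 1.0f, a1 = 2.0f, a2 = 3.0f, a3 = 4.0f;
    const float m = 0.999999f, c = 0.000001f;

    clock_t start = clock();
    for (long i = 0; i < iterations; ++i) {
        a0 = a0 * m + c;   /* 1 multiply + 1 add */
        a1 = a1 * m + c;
        a2 = a2 * m + c;
        a3 = a3 * m + c;
    }
    clock_t stop = clock();

    *sink = a0 + a1 + a2 + a3;

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0) {
        return 0.0;        /* timer too coarse for this iteration count */
    }
    /* 4 chains x 2 floating-point operations per iteration */
    return 8.0 * (double)iterations / seconds / 1e9;
}
```

Calling `estimate_cpu_gflops(100000000L, &sink)` and printing the result gives a single-thread estimate. A GPU measurement would need the same arithmetic inside a kernel, timed around the kernel launch.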

Tesla double precision

I am looking for information on how double precision is implemented in hardware on the Tesla GPU. I have read that two stream processors work on a single double value, but I haven't found any official paper from NVIDIA. Thanks in advance. P.P.S. Why do most GPUs compute only in single precision (because colors can be stored as...

double

cuda

opencl

cudaMemcpy fails to copy values

I am calling cudaMemcpy and the copy returns successfully; however, the source values are not being copied to the destination. I wrote a similar piece using memcpy() and that works fine. What am I missing here? // host externs extern unsigned char landmask[DIMX * DIMY]; // use device constant memory for landmask unsigned char *tempmask; ...

Visual Studio Linking Problem with CUDA

I am doing some programming with NVIDIA's CUDA C. I am using Visual Studio 2008 as my development environment, and I am having trouble with linking. I am wondering whether someone knows a way to fix it, or has had the same problem and could offer a solution. My program is made up of 3 files. 1 header file (stuff.h), 1 C source fi...

visual-studio

linker

cuda

Sparse array in CUDA or OpenCL

I have a large array (say 512K elements), GPU resident, where only a small fraction of elements (say 5K randomly distributed elements - set S) needs to be processed. The algorithm to find out which elements belong to S is very efficient, so I can easily create an array A of pointers or indexes to elements from set S. What is the most e...

cuda

opencl

gpgpu
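
The access pattern described in this question, visiting only the elements named by an index array, can be sketched on the host in plain C. The function name `scale_indexed` and the scaling operation are hypothetical stand-ins for whatever processing set S needs. On a GPU, each loop iteration would typically map to one thread, so only as many threads as there are indices get launched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: multiplies data[idx[k]] by `factor` for every k.
   Elements not listed in idx are left untouched. */
void scale_indexed(float *data, size_t data_len,
                   const size_t *idx, size_t idx_len,
                   float factor)
{
    assert(data != NULL && idx != NULL);
    for (size_t k = 0; k < idx_len; ++k) {
        assert(idx[k] < data_len);    /* every index must be in range */
        data[idx[k]] *= factor;       /* scattered read and write */
    }
}
```

Because the indices are randomly distributed, neighbouring iterations touch unrelated addresses. That scattered access is the part most likely to affect memory efficiency on a GPU.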

Dr. Dobb's CUDA (reversing arrays) tutorial

I was reading Supercomputing for the Masses: Part 5 on Dr. Dobb's and I have a question concerning the author's code for (fast) reversing arrays. I understand the need to use shared memory, but I didn't get the performance gain in the code of reverseArray_multiblock_fast.cu. In reverseArray_multiblock_fast.cu an array element is trans...

cuda

shared-memory

CUDA and STL vector

I have just learned that many C++ features (including the STL vector class) do not work in .cu files, even when they are used only in the host code. Since I have to use a C++ class that uses the STL, I cannot compile my .cu file, which invokes the kernel. (I don't use any STL features in the .cu file, but I think the include is the problem.) I tried t...

stl

cuda

CUDA offset device pointer in host code

I am processing a matrix with cuBLAS. I have already sent it to the device, and I want to process some column vectors of the matrix, still using cuBLAS functions. I first tried using pointer arithmetic to offset the device pointer from the host, but it doesn't seem to work. Is there any way I can process a vector in the matrix without copying it back t...

pointers

cuda

device
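
cuBLAS stores matrices in column-major order, so the start of column `col` lies `col * lda` elements past the base pointer, where `lda` is the leading dimension. The C sketch below shows only that address computation, using a hypothetical helper named `column_ptr`. It is demonstrated on host memory. The same arithmetic on a device pointer is valid in host code as long as the host never dereferences the result, though whether that resolves the asker's problem cannot be told from the truncated excerpt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pointer to the first element of column `col` in a
   column-major matrix whose leading dimension (allocated rows) is `lda`. */
float *column_ptr(float *base, size_t lda, size_t col)
{
    assert(base != NULL && lda > 0);
    return base + col * lda;   /* address computation only, no dereference */
}
```

For a 3x2 matrix stored as `{1, 2, 3, 4, 5, 6}`, `column_ptr(a, 3, 1)` points at the value 4, the top of the second column. Consecutive elements of a column are adjacent in memory, so a vector routine would be given an element increment of 1.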

float vs int in CUDA

Is it better to use a float instead of an int in CUDA? Does a float decrease bank conflicts and ensure coalescence? Or does it have nothing to do with this? ...

cuda

Coalescence vs bank conflicts (CUDA)

What is the difference between coalescence and bank conflicts when programming with CUDA? Is it only that coalescence happens in global memory while bank conflicts happen in shared memory? Should I worry about coalescence if I have a GPU that supports >1.2? Does it handle coalescence by itself? ...

cuda
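
To make the shared-memory half of this question concrete, the following C sketch simulates which bank each thread hits when thread t reads the 32-bit word at index `t * stride_words`. It assumes the usual mapping of successive 32-bit words to successive banks (16 banks on compute capability 1.x devices, 32 on later ones). The function name `max_bank_conflict_degree` is hypothetical, and the model ignores the broadcast case where several threads read the same word.

```c
#include <assert.h>

#define MAX_BANKS 64

/* Hypothetical helper: threads t = 0..threads-1 each read the 4-byte word
   at index t * stride_words. Returns the largest number of threads that
   land in the same bank (1 means the access is conflict-free). */
int max_bank_conflict_degree(int stride_words, int threads, int banks)
{
    assert(stride_words >= 1 && threads >= 1);
    assert(banks >= 1 && banks <= MAX_BANKS);

    int hits[MAX_BANKS] = {0};
    int worst = 0;
    for (int t = 0; t < threads; ++t) {
        int bank = (int)(((long)t * stride_words) % banks);
        hits[bank] += 1;
        if (hits[bank] > worst) {
            worst = hits[bank];
        }
    }
    return worst;
}
```

With 32 threads and 32 banks, a stride of 1 gives degree 1, a stride of 2 gives degree 2, and a stride of 32 serializes all 32 accesses. An odd stride such as 33 is conflict-free again. Coalescing is a separate matter that concerns how a warp's global-memory addresses combine into memory transactions.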

Streaming multiprocessors, Blocks and Threads (CUDA)

What is the relationship between a CUDA core, a streaming multiprocessor and the CUDA model of blocks and threads? What gets mapped to what, and what is parallelized and how? And which is more efficient: maximizing the number of blocks or the number of threads? Thanks, ExtremeCoder My current understanding is that there are 8 CUDA core...
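
One small piece of the block and thread model can be shown with plain arithmetic: how many blocks are needed to cover n elements, and which element a given thread handles. The C sketch below uses hypothetical helper names. `global_index` mirrors the common kernel expression `blockIdx.x * blockDim.x + threadIdx.x`. It says nothing about how the hardware schedules blocks onto streaming multiprocessors.

```c
#include <assert.h>

/* Hypothetical helper: number of blocks needed so that
   blocks * threads_per_block >= n (rounds up without overflow). */
unsigned blocks_needed(unsigned n, unsigned threads_per_block)
{
    assert(threads_per_block > 0);
    return n / threads_per_block + (n % threads_per_block != 0);
}

/* Hypothetical helper: flat element index handled by one thread, the
   host-side analogue of blockIdx.x * blockDim.x + threadIdx.x. */
unsigned global_index(unsigned block, unsigned threads_per_block,
                      unsigned thread)
{
    assert(thread < threads_per_block);
    return block * threads_per_block + thread;
}
```

For 1000 elements and 256 threads per block, `blocks_needed` returns 4. The last block then contains threads whose index is 1000 or more, which is why kernels usually guard with a bounds check.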

CUDA Pointer Dereferencing Issue

I am developing a program using the CUDA SDK and a 9600 1 GB NVIDIA card. In this program: 0) A kernel takes a pointer to a 2D int array of size 3000x6 in its input arguments. 1) The kernel has to sort it up to 3 levels (1st, 2nd & 3rd column). 2) For this purpose, the kernel declares an array of int pointers of size 3000. 3) The kernel then ...

c

cuda

CUDA x64 + OpenCV 2.1

Previous tutorials have not shown anybody else having this problem: compiling OpenCV and CUDA projects in VS2008 on Windows 7 x64. But I have been stuck on it for over a week. I have zero problems building the OpenCV samples, my own code, and CUDA within their own projects. I cannot get them to build in a single project together no matter ...

opencv

cuda

CUDA different results on different platforms

I've written a small CUDA program on my MacBook Pro, and when I tried it out on my Linux box I got different results. To ensure correctness, I wrote unit tests: an array of floats, which contains the values to check, is copied to the device and then back. The worst thing is that it sometimes returns different values on Linux (and ver...

cuda

OpenCL Events and Command Queues

I'm working on translating a CUDA application (this if you must know) to OpenCL. The original application uses the C-style CUDA API, with a single stream just to avoid the automatic busy-wait when reading the results. Now I notice that OpenCL command queues look a lot like CUDA streams. But in the device read command, and likewise in ...

How to measure the execution time of every block when using CUDA?

clock() is not accurate enough. ...

cuda

gpu

parallel-programming

Max number of streams in CUDA?

Is there a maximum number of streams that can be created in CUDA? To clarify, I mean CUDA streams, as in the streams that allow you to execute kernels and memory operations. ...

c

streams

cuda

Is there an effective implementation of a solver for sparse matrix linear equations using CUDA?

Is there an effective implementation of a solver for sparse matrix linear equations using CUDA? ...

cuda

sparse-matrix

CUDA - Bus Error

I have been working with CUDA for a while now and have started to get bus errors reported on the first attempt to malloc any data on the GPU after working for a short period of time. The only way that I have found to fix this is to restart the machine. The memory should be cleared up automatically, but that does not seem to happen if the ap...

opencv

cuda

cudamalloc

CUDA C Research / Project Ideas

Over the summer, I started to learn CUDA C because the NVIDIA performance claims were simply unbelievable. This past week, I started another semester of my undergrad studies. My major is computer science. One of the classes I am taking this semester is undergrad research, and I want to get further practice with CUDA C. Does anyone have an...