cuda

nvcc -Xptxas –v compiler flag has no effect

I have a CUDA project. It consists of several .cpp files that contain my application logic and one .cu file that contains multiple kernels plus a __host__ function that invokes them. Now I would like to determine the number of registers used by my kernel(s). My normal compiler call looks like this: nvcc -arch compute_20 -link src/kerne...

Difference with CUDA Hardware Quadro 4000 Vs. GeForce 480

I'm building a workstation and want to get into some heavy CUDA programming. I don't want to go all out on Tesla cards and have pretty much narrowed it down to either the Quadro 4000 or the GeForce 480, but I don't really understand the difference. On paper it looks like the 480 has more cores (480 vs. 256 for the 4000), but the ...

C++ 2.5 bytes (20-bit) integer

I know it's ridiculous, but I need it for storage optimization. Is there a good way to implement it in C++? It has to be flexible enough that I can use it as a normal data type, e.g. Vector< int20 >, operator overloading, etc. ...
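One possible direction, sketched under stated assumptions: since a single 2.5-byte object cannot be addressed on its own, the packing is usually done at the container level, storing two 20-bit values in every 5 bytes. The class name `PackedUint20Array` and its methods are hypothetical, the values are assumed to be unsigned, and this is not a drop-in element type for a vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: stores unsigned 20-bit values, two per 5 bytes.
// Assumption: values are unsigned; bits above bit 19 are discarded on set().
class PackedUint20Array {
public:
    explicit PackedUint20Array(std::size_t count)
        : count_(count), bytes_((count * 5 + 1) / 2, 0) {}  // ceil(count * 2.5)

    std::size_t size() const { return count_; }
    std::size_t size_bytes() const { return bytes_.size(); }

    // Read element i (caller must ensure i < size()).
    std::uint32_t get(std::size_t i) const {
        const std::size_t bit = i * 20;
        const std::size_t b = bit / 8;      // first byte touched by element i
        const unsigned shift = bit % 8;     // 0 for even i, 4 for odd i
        const std::uint32_t word = load24(b);
        return (word >> shift) & 0xFFFFFu;
    }

    // Write element i (caller must ensure i < size()).
    void set(std::size_t i, std::uint32_t value) {
        const std::size_t bit = i * 20;
        const std::size_t b = bit / 8;
        const unsigned shift = bit % 8;
        std::uint32_t word = load24(b);
        word &= ~(0xFFFFFu << shift);            // clear the 20-bit slot
        word |= (value & 0xFFFFFu) << shift;     // insert the new value
        bytes_[b]     = static_cast<std::uint8_t>(word & 0xFFu);
        bytes_[b + 1] = static_cast<std::uint8_t>((word >> 8) & 0xFFu);
        bytes_[b + 2] = static_cast<std::uint8_t>((word >> 16) & 0xFFu);
    }

private:
    // Assemble the three bytes that contain an element into one 24-bit word.
    std::uint32_t load24(std::size_t b) const {
        return static_cast<std::uint32_t>(bytes_[b]) |
               (static_cast<std::uint32_t>(bytes_[b + 1]) << 8) |
               (static_cast<std::uint32_t>(bytes_[b + 2]) << 16);
    }

    std::size_t count_;
    std::vector<std::uint8_t> bytes_;
};
```

Each access costs a few shifts and masks, so this trades speed for storage. Operator overloading, as asked for in the question, would need a proxy reference type layered on top of `get` and `set`.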

How to copy a 64-bit integer from host to device in CUDA?

I need to copy 64-bit integer data from host to device memory. Both are declared as "unsigned __int64" and I used cudaMemcpyToSymbol(). When checked with Parallel Nsight, the copied data is shown as a negative integer. I guess the most significant bit of the lower 4 bytes is being treated as a sign bit, which it is not supposed to be. can a...

CUDA: Stop all other threads

Hi, I have a problem that seemingly can be solved only by enumerating all possible solutions and then finding the best. To do so, I devised a backtracking algorithm that enumerates the solutions and stores the best one found. It works fine so far. Now I want to port this algorithm to CUDA. Therefore, I created a procedure that gene...

Parallelizable JPEG-like compression using only DCT and run-length encoding stages: what sort of compression/performance is possible?

We have to compress a ton of (monochrome) image data and move it quickly. If one were to use just the parallelizable stages of JPEG compression (DCT and run-length encoding of the quantized results) and run them on a GPU so that each block is compressed in parallel, I am hoping that would be very fast and still yield a very significant compress...

CUDA without a CUDA-enabled GPU

I want to set up a CUDA emulator on my Ubuntu 10.04, since I don't have the hardware. Can someone provide some instructions? I think Nvidia does provide an emulator; how can I set it up? For now I don't care about performance, even if it's slow. Thanks. ...

How to include cutil.h in Linux

I don't know how to include cutil.h in Linux. I know where it is, but I don't know how to include it. Ideas, please. ...

CUDA plugin dlopen

Hi, I've written a CUDA plugin (dynamic library), and I have a program written in C which uses dlopen() to load this plugin. I am using dlsym() to get the functions from this plugin. For my application it is very important that every time the plugin is loaded, the program gets a new handle from the dlopen() call (the library file may be modified s...

GTX 295 vs. other Nvidia cards for CUDA development

What is the best Nvidia video card for CUDA development? A single GTX 295 has 2 GPUs; is it possible to have 2 GTX 295 cards and use all 4 GPUs in my CUDA code? Is it better to get two 480 cards rather than two 295s? Would a Fermi be better than both cards? ...

Cumulative array summation using OpenCL

I'm calculating the Euclidean distance between n-dimensional points using OpenCL. I get two lists of n-dimensional points and I should return an array that contains just the distances from every point in the first table to every point in the second table. My approach is to do the regular double loop (for every point in Table1{ for every ...
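For reference, the double loop described in the excerpt can be sketched on the host in plain C++. This is only an illustration under assumptions: the function name `pairwise_distances`, the row-major point layout, and the flat output layout are not from the question. In an OpenCL or CUDA port, each (i, j) pair would typically map to one work-item or thread.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: each table stores its points row-major, `dim` floats
// per point. Returns a flat array where out[i * n2 + j] is the Euclidean
// distance from point i of table1 to point j of table2.
std::vector<float> pairwise_distances(const std::vector<float>& table1,
                                      const std::vector<float>& table2,
                                      std::size_t dim) {
    if (dim == 0) return {};
    const std::size_t n1 = table1.size() / dim;
    const std::size_t n2 = table2.size() / dim;
    std::vector<float> out(n1 * n2);
    for (std::size_t i = 0; i < n1; ++i) {          // every point in table1
        for (std::size_t j = 0; j < n2; ++j) {      // every point in table2
            float sum = 0.0f;
            for (std::size_t k = 0; k < dim; ++k) { // accumulate squared differences
                const float d = table1[i * dim + k] - table2[j * dim + k];
                sum += d * d;
            }
            out[i * n2 + j] = std::sqrt(sum);
        }
    }
    return out;
}
```

The innermost loop over `k` is the per-pair summation; the two outer loops are independent across (i, j) and are the natural candidates for parallel execution.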

CUDA Project Structure

The template and cppIntegration examples in the CUDA SDK (version 3.1) use externs to link function calls from the host code to the device code. However, Tom's comment at http://stackoverflow.com/questions/2090974/how-to-separate-cuda-code-into-multiple-files#comment-2024913 indicates that the usage of extern is deprecated. If this is the...

How do I start a CUDA app in Visual Studio 2010?

Direct Question: How do I create a simple hello world CUDA project within visual studio 2010? Background: I've written CUDA kernels. I'm intimately familiar with the .vcproj files from Visual Studio 2005 -- tweaked several by hand. In VS 2005, if I want to build a CUDA kernel, I add a custom build rule and then explicitly define the...

Converting a simple C code into a CUDA code

Hello, I'm trying to convert a simple numerical analysis code (trapezium rule numerical integration) into something that will run on my CUDA-enabled GPU. There is a lot of literature out there, but it all seems far more complex than what is required here! My current code is: #include <stdio.h> #include <math.h> #include <stdlib.h> #defi...

GPGPU, OpenCL, CUDA, ATI Stream

Hello, everyone! Please tell me which GPGPU technologies already exist and which hardware vendors implement GPGPU. I've been reading articles on various sites since this morning and I've become confused. ...

Can't build a simple CUDA program using Xcode

I'm using Xcode 3.2 on Mac OS 10.6 to build a very simple HelloWorld program for CUDA, but it fails to build. Any ideas? This is the code: #include <iostream> #include <stdio.h> #include <stdlib.h> #include <assert.h> #include <CUDA/CUDA.h> __device__ char napis_device[14]; __global__ void helloWorldOnDevice(void){ napis_d...

Nvidia CUDA with Bitmap

Hi! I need help with CUDA C. I am trying to program image processing tools, and I can't understand how to use Bitmap (C++) with CUDA. Please help me. P.S. Sorry for my bad English. ...

Can I call CUDA functions in C++?

Is there any way I can call CUDA functions such as cudaMemcpy(...); in a .cpp file, or call them in a class method? ...

Configuring CUDA and OpenCV with Visual Studio on 64 bit machine

Hi. I have been trying to configure OpenCV 2.1 and CUDA 3.1 on Visual Studio 2008 on a 64-bit Windows XP machine for the past week, but all in vain. OpenCV alone is working fine. CUDA 3.1 alone is working fine as well. I am using CUDA 3.1 for 64 bit ... But for OpenCV, I am using the 32-bit installation (as provided on SourceForge) - Possible ...

Motherboard for Nvidia GTX 480 with CUDA

I am going to use CUDA to develop programs on GPUs. My plan is to place 3 Nvidia GTX 480 cards on a single motherboard. Is this possible? If yes, what motherboard do you recommend? ...