cuda

CUDA: accumulate data into a large histogram of floats

I'm trying to think of a way to implement the following algorithm using CUDA: Working on a large volume of voxels, for each voxel I calculate an index i and a value c. After the calculation I need to perform histogram[i] += c. Here c is a float value and the histogram can have up to 15,000 bins. I'm looking for a way to implement this effici...

Can I program Nvidia's CUDA using only Python or do I have to learn C?

I guess the question speaks for itself. I'm interested in doing some serious computations but am not a programmer by trade. I can string enough Python together to get done what I want. But can I write a program in Python and have the GPU execute it using CUDA? Or do I have to use some mix of Python and C? The examples on Klockner's ...

CUDA libraries for search operations

Is there any CUDA library that performs comparison/search operations? ...

Not able to kill bad kernel running on NVIDIA GPU

Hi, I am in a real fix. Please help. It's urgent. I have a host process that spawns multiple host (CPU) threads (pthreads). These threads in turn call the CUDA kernel. These CUDA kernels are written by external users, so some may be bad kernels that enter an infinite loop. To overcome this I have put in a time-out of 2 minutes that will ...

What kind of data processing problems would CUDA help with?

Hi, I've worked on many data matching problems, and very often they boil down to running many implementations of CPU-intensive algorithms, such as Hamming / edit distance, quickly and in parallel. Is this the kind of thing that CUDA would be useful for? What kinds of data processing problems have you solved with it? Is there really an upl...

Performance differences between different CUDA SDKs?

If I want to rewrite my application so that it leverages the power of NVIDIA's CUDA SDK, are there any differences at all in runtime performance between the different SDK offerings: C++, Java, Python? Is there any difference at all between these three SDKs, besides the obvious language being used? ...

Porting a project to OpenGL3

Hi everyone, I'm working on a C++ cross-platform OpenGL application (Windows, Linux and MacOS) and I am wondering if some of you could share some advice on porting a large application to OpenGL 3. The reason I am looking into OpenGL 3 is that I think we could benefit a lot from using the new "Sync objects". Nvidia has supported such...

CUDA linking error - Visual Express 2008 - nvcc fatal due to (null) configuration file

Hi, I've been searching extensively for a possible solution to my error for the past 2 weeks. I have successfully installed the CUDA 64-bit compiler (tools) and SDK as well as the 64-bit version of Visual Studio Express 2008 and Windows 7 SDK with Framework 3.5. I'm using Windows XP 64-bit. I have confirmed that VSE is able to compile i...

Questions for my CUDA assignment on imageDenoising

I posted this question a few days ago, but I lost that account and registered another one. I am trying to modify the imageDenoising class in the CUDA SDK. I need to repeat the filter many times in order to capture the time, but my code doesn't work properly. //start __global__ void F1D(TColor *image,int imageW,int imageH, TColor *buffer) { con...

How to make a CUDA histogram kernel?

Hi all, I am writing a CUDA kernel to compute the histogram of a picture, but I have no idea how to return an array from the kernel, and the array will change while other threads read it. Is there any possible solution? __global__ void Hist( TColor *dst, //input image int imageW, int imageH, int*data ){ const int ix = blockDim.x * blo...

CUDA: injecting my own PTX function?

I would like to be able to use a feature in PTX 1.3 which is not yet implemented in the C interface. Is there a way to write my own function in PTX and inject it into an existing binary? The feature I'm looking for is getting the value of %smid ...

How to install Nvidia Parallel NSight (Nexus) for VS2010 without having installed VS2008?

Is there a way to install Parallel NSight and use it with Visual Studio 2010 without having VS2008 SP1 installed? The setup checks whether VS2008 is installed and won't continue if it is not. I know there is no official support for VS2010, but I found a small application on a forum that can integrate Nexus into VS2010, and it seems to work. Thanks i...

My kernel only works in block (0,0)

I am trying to write a simple matrixMultiplication application that multiplies two square matrices using CUDA. I am having a problem where my kernel is only computing correctly in block (0,0) of the grid. This is my invocation code: dim3 dimBlock(4,4,1); dim3 dimGrid(4,4,1); //Launch the kernel; MatrixMulKernel<<<dimGrid,dimBlock>>>(Md...

How can I improve this function under CUDA?

Hi, how can I improve this function under CUDA? What this function does is: given a min and max, ELM1 and ELM, check if any three numbers of array ans[6] are found in any row, from min to max, in arrays D1, D2, D3, D4, D5, D6; if found, return 1. I tried many other ways, like looping, or-ing, and-ing, replacing goto with a flag, etc., but this se...

Turn off Visual Assist for *.cl, *.cu and *.cuh

How can I define which file types Visual Assist should process in Visual Studio 2010? I don't like how this tool handles OpenCL and CUDA files, so I would like to turn it off for these file types (otherwise it highlights 1000 errors). Thanks. ...

How to activate nVidia cards programmatically on new MacBookPros for CUDA programming?

The new MacBookPros come with two graphics adapters, the Intel HD Graphics and the NVIDIA GeForce GT 330M. OS X switches back and forth between them, depending on the workload, detection of an external monitor, or activation of Rosetta. I want to get my feet wet with CUDA programming, and unfortunately the CUDA SDK doesn't seem to take...

Simultaneous launch of Multiple Kernels using CUDA for a GPU

Is it possible to launch two kernels that do independent tasks simultaneously? For example, if I have this CUDA code // host and device initialization ....... ....... // launch kernel1 myMethod1 <<<.... >>> (params); // launch kernel2 myMethod2 <<<.....>>> (params); Assuming that these kernels are independent, is there a facility to...

How to convert a C++ program that uses CUDA into MEX

For work, I am converting the Image Denoising program that comes with the CUDA SDK into a MATLAB program. As far as I know, I have made all the necessary changes required by MATLAB, but when I try to call mex on it, MATLAB returns a bunch of linkage errors that I have no idea how to fix. If anyone has any suggestions on what I might be d...

NVIDIA CUDA SDK Examples Compilation Unsupported Architecture 'compute_20'

On compilation of the CUDA SDK, I'm getting the error "nvcc fatal : Unsupported gpu architecture 'compute_20'". My toolkit is 2.3 and is on a shared system (i.e. I can't really upgrade), and the driver version is also 2.3, running on 4 Tesla C1060s. If it helps, the problem is being called in radixsort. It appears that a few people online have had this ...

How to treat 64-bit words on a CUDA device?

Hi, I'd like to handle 64-bit words directly on the CUDA platform (e.g. uint64_t vars). I understand, however, that addressing space, registers and the SP architecture are all 32-bit based. I actually found this to work correctly (on my CUDA cc1.1 card): __global__ void test64Kernel( uint64_t *word ) { (*word) <<= 56; } but I don'...