cuda

error in CUDA compilation

I'm getting this error while trying to run the sample code in the CUDA SDK. I have CUDA 2.3 and Visual Studio 2008: LINK : fatal error LNK1181: cannot open input file 'cutil32D.lib'. Any pointers on how to solve this? ...

How to: Parallel Reduction of many unequally sized arrays in CUDA?

Hi there, I am wondering if anyone could suggest the best approach to computing the mean / standard deviation of a large number of relatively small but differently sized arrays in CUDA? The parallel reduction example in the SDK works on a single very large array, and it seems the size is conveniently a multiple of the number of threads p...
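
One layout that might fit this question is to pack all the small arrays into a single flat buffer and keep a separate offsets array marking where each one starts, so that segment `s` covers `[offsets[s], offsets[s + 1])`. The sketch below is host-only C++ that shows just that layout and the sum / sum-of-squares arithmetic. The names `SegmentStats` and `segment_stats` are made up for illustration, the standard deviation is the population one, and nothing here comes from the SDK reduction example. On the GPU the outer loop would presumably map to one block per segment, with the inner loop replaced by a reduction.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: one mean / population standard deviation per array.
struct SegmentStats {
    double mean;
    double stddev;
};

// values:  all arrays concatenated back to back.
// offsets: offsets[s] is where array s starts; the last entry is values.size().
std::vector<SegmentStats> segment_stats(const std::vector<float>& values,
                                        const std::vector<std::size_t>& offsets) {
    std::vector<SegmentStats> result;
    for (std::size_t s = 0; s + 1 < offsets.size(); ++s) {
        const std::size_t begin = offsets[s];
        const std::size_t end = offsets[s + 1];
        const std::size_t n = end - begin;

        // Accumulate in double to limit rounding error on the squares.
        double sum = 0.0;
        double sum_sq = 0.0;
        for (std::size_t i = begin; i < end; ++i) {
            const double x = values[i];
            sum += x;
            sum_sq += x * x;
        }

        const double mean = (n > 0) ? sum / static_cast<double>(n) : 0.0;
        double variance = (n > 0) ? sum_sq / static_cast<double>(n) - mean * mean : 0.0;
        if (variance < 0.0) variance = 0.0;  // guard against tiny negative rounding
        result.push_back({mean, std::sqrt(variance)});
    }
    return result;
}
```

Because each segment carries its own length, no array has to be padded to a multiple of the thread count.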

Switching threads for MFC application cleanup

I'm trying to clean up specific memory objects created by a specific thread (hence only accessible to that thread). The only way for me to achieve that is to switch to that particular thread when freeing that memory block. This is how I allocated the specific memory context: This is what I attempted to do: I have originally added t...

Are loops with and without parenthesis handled differently in C?

I was stepping through some C/CUDA code in the debugger, something like: for(uint i = threadIdx.x; i < 8379; i+=256) sum += d_PartialHistograms[blockIdx.x + i * HISTOGRAM64_BIN_COUNT]; And I was utterly confused because the debugger was passing by it in one step, although the output was correct. I realised that when I put curly ...
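
For reference, a C `for` statement takes exactly one statement as its body, and braces only group several statements into one compound statement, so the two forms should compute the same thing. The host-only C++ sketch below mirrors the strided loop from the excerpt. `start` and `stride` are stand-ins for `threadIdx.x` and the literal 256, and the function names are made up for illustration. Any difference in how a debugger steps through the two is more likely down to the line information the compiler emits than to the language.

```cpp
#include <cstddef>
#include <vector>

// Strided sum with a single-statement loop body (no braces).
unsigned sum_without_braces(const std::vector<unsigned>& v,
                            std::size_t start, std::size_t stride) {
    unsigned sum = 0;
    for (std::size_t i = start; i < v.size(); i += stride)
        sum += v[i];
    return sum;
}

// The same loop with the body wrapped in a compound statement.
unsigned sum_with_braces(const std::vector<unsigned>& v,
                         std::size_t start, std::size_t stride) {
    unsigned sum = 0;
    for (std::size_t i = start; i < v.size(); i += stride) {
        sum += v[i];
    }
    return sum;
}
```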

Generate all combinations of a char array inside of a CUDA __device__ kernel

Hi, I need help please. I started to program a common brute forcer / password guesser with CUDA (2.3 / 3.0beta). I tried different ways to generate all possible plain text "candidates" of a defined ASCII char set. In this sample code I want to generate all 74^4 possible combinations (and just output the result back to host/stdout). $...
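
One common way to enumerate candidates like this is to treat a running index as a number written in base N, where N is the charset size, and let each digit pick one character. The host-only C++ sketch below shows just that index-to-candidate mapping. `candidate_from_index` is a made-up name, the charset must be non-empty, and `std::string` is used for brevity where a kernel would need a fixed-size char buffer. In a kernel the index would typically be the global thread index, `blockIdx.x * blockDim.x + threadIdx.x`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decode `index` as a `length`-digit number in base charset.size().
// The least significant digit ends up in the last character.
// Precondition: charset is not empty.
std::string candidate_from_index(std::uint64_t index,
                                 const std::string& charset,
                                 std::size_t length) {
    std::string out(length, charset[0]);
    const std::uint64_t base = charset.size();
    for (std::size_t pos = length; pos-- > 0; ) {
        out[pos] = charset[index % base];
        index /= base;
    }
    return out;
}
```

With a 74-character set and length 4, indices 0 through 74^4 - 1 (29,986,575) cover every combination exactly once, so each thread can build its own candidate without coordinating with the others.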

CUDA - Better Occupancy vs Less Global Memory Access?

Hey, my CUDA code must work with (reduce to mean/std, calculate histogram) 4 arrays, each 2048 floats long and already stored in the device memory from previous kernels. It is generally advised to launch at least as many blocks as I have multiprocessors. In this case however, I can load each of these arrays into the shared memory of a ...

CUDA allocating array of arrays

Hi, I have some trouble allocating an array of arrays in CUDA. void ** data; cudaMalloc(&data, sizeof(void**)*N); // allocates without problems for(int i = 0; i < N; i++) { cudaMalloc(data + i, getSize(i) * sizeof(void*)); // seg fault is thrown } What did I do wrong? ...

Why won't OpenCV compile in NVCC?

Hi there, I am trying to integrate CUDA and OpenCV in a project. The problem is that OpenCV won't compile when NVCC is used, while a normal C++ project compiles just fine. This seems odd to me, as I thought NVCC passed all host code to the C/C++ compiler, in this case the Visual Studio compiler. The errors I get are: c:\opencv2.0\include\open...

Why do books on concurrent programming always ignore data parallelism?

There has been a significant shift towards data-parallel programming via systems like OpenCL and CUDA over the last few years, and yet books published even within the last six months never even mention the topic of data-parallel programming. It's not suitable for every problem, but it seems that there is a significant gap here that isn'...

How do I run MATLAB code on the GPU using CUDA?

I want to run MATLAB code on the GPU using NVIDIA's CUDA. I found a couple of 3rd-party engines: Jacket GPUMat Would anyone recommend these or are there better ones out there? Any tips or suggestions? ...

CUDA : error C2491: 'log1p' : definition of dllimport function not allowed

I am trying to integrate CUDA in an existing project, in which several libs (DLLs) are created. I started with a very simple kernel that computes a dot product : // dotProd_kernel.cu __global__ void dotProd( double* result, double* vec1, double* vec2) { int i = threadIdx.x; result[i] = vec1[i] * vec2[i]; } This kernel is called ...

CUDA & Visual C++ & Windows Forms Applications

I'm using Microsoft Visual C++ 2008 Express Edition and I have to work with CUDA technology. I understand how to use it in console applications, but I have no idea how to make it work in Win32 applications with forms (dialogs, buttons, labels, etc.). Any ideas? ...

CUDA bounds checker?

Is there a tool equivalent to a bounds checker, Purify, or Valgrind for CUDA? I'm basically looking for something that might tell me if I'm reading or writing outside of allocated memory. ...

CUDA: Getting linking error only in device emulation mode

I am compiling a dll which goes just fine unless I use the -deviceemu mode. In this case I get several of the following linking errors: CUDAKernel_ColourHist.obj : error LNK2019: unresolved external symbol ___cudaMutexOperation@4 referenced in function ___uAtomicAdd 1>CUDAKernel_1.obj : error LNK2001: unresolved ext...

Cannot compile CUDA app in VS 2008

I'm trying to work with CUDA in Visual Studio 2008 Professional. I'm using Windows 7 64-bit and I've done the following steps: - Downloaded and installed the CUDA Driver, Toolkit and SDK. I can run any example from the SDK. - Downloaded and installed CUDA VS Wizard. When I try to create a CUDA Win App I get the following compile error: Er...

Which IDE should I use for this art project?

I have an art project that will require processing a live video feed to use as the basis of a particle system, which will be rendered using OpenGL and projected on a stage. I have a CUDA enabled graphics card, and I was thinking it would be nice to be able to use that for the image and particle system processing. This project only needs...

Why does CUDA.rules have two identical command lines

The command line for the CUDA.rules file is: echo [CompilerPath] [Keep] [CInterleavedPTX] [ExtraNvccOptions] [Arch] -ccbin "$(VCInstallDir)bin" [Emulation] [FastMath] [Defines] -Xcompiler "/EHsc [Warning] /nologo [Optimization] /Zi [RuntimeChecks] [Runtime] [TypeInfo] [ExtraCppOptions]" [Include] [MaxRegCount] [PtxAsOption...

CUDA fallback to CPU?

I have a CUDA application that works fine on one computer (with a GTX 275) and runs about 100 times slower on another, with a GeForce 8400. My suspicion is that there is some kind of fallback that makes the code actually run on the CPU rather than on the GPU. Is there a way to actually make sure that the code is running on the GPU? I...

3d convolution in c++

Hello, I'm looking for some source code implementing 3d convolution. Ideally, I need C++ code or CUDA code. I'd appreciate if anybody can point me to a nice and fast implementation :-) Cheers ...

Interview questions on CUDA Programming?

Hi! I have an interview coming up in a week's time for an entry-level position that involves programming in CUDA (hopefully with C). I was wondering if anybody can suggest some interview questions that I can expect during the interview. I have gone through the official programming guide but I'm not all that comfortable with it right now. Tha...