cuda

Unable to compile CUDA code

I'm getting errors when I try to compile and build CUDA code: Error 1 error C2065: 'threadIdx' : undeclared identifier; Error 2 error C2228: left of '.x' must have class/struct/union ...
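
These two errors are typical when a file containing kernel code is compiled by the host compiler (cl.exe) instead of nvcc, for example because it has a .cpp extension or the CUDA build rule is not applied to it. That cause is an assumption, since the excerpt is cut off. A minimal sketch of a file that nvcc accepts when saved with a .cu extension; the names addOneKernel and addOne are hypothetical:

```cuda
// add_one.cu: must be compiled by nvcc (e.g. `nvcc -c add_one.cu`), not by the
// host C++ compiler, so that threadIdx/blockIdx/blockDim are defined.
#include <cuda_runtime.h>

// Each thread increments one element.
__global__ void addOneKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // built-ins known only to nvcc
    if (i < n)
        data[i] += 1;
}

// Hypothetical host helper: copies h to the GPU, runs the kernel, copies back.
// Returns cudaSuccess if the allocation and the final copy back succeeded.
cudaError_t addOne(int *h, int n)
{
    int *d = nullptr;
    cudaError_t err = cudaMalloc(&d, n * sizeof(int));
    if (err != cudaSuccess) return err;
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);
    addOneKernel<<<(n + 255) / 256, 256>>>(d, n);
    err = cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err;
}
```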

CUDA: cudaMemcpy returns cudaErrorInvalidValue for __device__ array

When I define an array on the device (that is initialized with a "Hello" string in this example) and try to copy this to the host, I get the error code cudaErrorInvalidValue. However, from inside a kernel, the d_helloStr[] can be accessed. Referring to the CUDA programming guide chapter B.2.1, such a variable should also be accessible th...
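
Assuming the copy passed d_helloStr to cudaMemcpy as if it were a device pointer: on the host, the name of a __device__ variable is a symbol rather than a device address, which would explain cudaErrorInvalidValue. A sketch of the two usual ways to copy it back; the helper names readHello and readHelloViaAddress are hypothetical:

```cuda
#include <cuda_runtime.h>

// Lives in device memory; on the host this name is only a symbol handle.
__device__ char d_helloStr[] = "Hello";

// Option 1: copy by symbol. h_buf must hold sizeof(d_helloStr) bytes.
cudaError_t readHello(char *h_buf)
{
    return cudaMemcpyFromSymbol(h_buf, d_helloStr, sizeof(d_helloStr));
}

// Option 2: look up the symbol's real device address, then use plain cudaMemcpy.
cudaError_t readHelloViaAddress(char *h_buf)
{
    void *d_ptr = nullptr;
    cudaError_t err = cudaGetSymbolAddress(&d_ptr, d_helloStr);
    if (err != cudaSuccess) return err;
    return cudaMemcpy(h_buf, d_ptr, sizeof(d_helloStr), cudaMemcpyDeviceToHost);
}
```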

CUDA: VFW - CODEC DLL

Hello there, my English is not very good and I am sorry for that. I have a video codec project for Windows (C++), based on the VFW interface. It compiles into a DLL, installs and runs successfully. What I want is to add CUDA kernel functions to speed up/improve some algorithm steps. I have installed the SDK, Toolkit and Wizard. Applied CUDA rule to p...

GPU Emulator for CUDA programming without the hardware

Question: Is there an emulator for a Geforce card that would allow me to program and test CUDA without having the actual hardware? Info: I'm looking to speed up a few simulations of mine in CUDA, but my problem is that I'm not always around my desktop for doing this development. I would like to do some work on my netbook instead, but...

Sending char ** data types to device

Hi, here's something I've been battling for 4 days now. I have an array of character pointers which I want to send to the device. Can somebody tell me how? Here's what I've tried so far: char **a; char **b; *a[0]="Foo1"; *a[1]=="Foo2"; cudaMalloc(void**)?,sizeof(?); cudamemcpy(b,a,sizeof(?),cudaMemcpyHostToDevice); How do I pass in the param...
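
A host array of char pointers cannot be copied as-is, because the pointers it holds are host addresses. One common approach, sketched here as an illustration rather than the only option, is to pack all strings into one contiguous buffer plus an offset table and copy those two flat arrays. The kernel and helper names are hypothetical:

```cuda
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

// Example kernel: each thread uppercases the first character of one string.
__global__ void capitalizeFirst(char *chars, const int *offsets, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        char c = chars[offsets[i]];
        if (c >= 'a' && c <= 'z') chars[offsets[i]] = c - 'a' + 'A';
    }
}

// Packs `count` strings into one buffer (keeping each '\0') plus an offset table,
// sends both to the GPU, runs the kernel and copies the packed chars back.
cudaError_t processStrings(const char **strs, int count, std::vector<char> &packed)
{
    std::vector<int> offsets(count);
    packed.clear();
    for (int i = 0; i < count; ++i) {
        offsets[i] = (int)packed.size();
        packed.insert(packed.end(), strs[i], strs[i] + strlen(strs[i]) + 1);
    }
    char *d_chars = nullptr;
    int *d_offsets = nullptr;
    cudaMalloc(&d_chars, packed.size());
    cudaMalloc(&d_offsets, count * sizeof(int));
    cudaMemcpy(d_chars, packed.data(), packed.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(d_offsets, offsets.data(), count * sizeof(int), cudaMemcpyHostToDevice);
    capitalizeFirst<<<(count + 127) / 128, 128>>>(d_chars, d_offsets, count);
    cudaError_t err = cudaMemcpy(packed.data(), d_chars, packed.size(), cudaMemcpyDeviceToHost);
    cudaFree(d_chars);
    cudaFree(d_offsets);
    return err;
}
```

The offset table plays the role of the original char ** array, but with indices that are valid on both sides.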

cuda optimization techniques

I have written CUDA code to solve an NP-complete problem, but the performance was not what I expected. I know about "some" optimization techniques (shared memory, textures, zero-copy...). What are the most important optimization techniques CUDA programmers should know about? ...
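
As an illustration of two of the techniques named above, here is a sketch of a block-level sum that combines coalesced global loads (consecutive threads read consecutive addresses) with a shared-memory tree reduction. It is a generic example, not tuned for any particular GPU; the names blockSum and sumOnGpu are hypothetical:

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK 256  // threads per block; must be a power of two for this reduction

// Each block loads BLOCK consecutive elements (coalesced), reduces them in
// shared memory, and writes one partial sum.
__global__ void blockSum(const float *in, float *partial, int n)
{
    __shared__ float s[BLOCK];
    int i = blockIdx.x * BLOCK + threadIdx.x;
    s[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            s[threadIdx.x] += s[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        partial[blockIdx.x] = s[0];
}

// The GPU produces one partial sum per block; the host adds the few partials.
float sumOnGpu(const std::vector<float> &h)
{
    int n = (int)h.size();
    int blocks = (n + BLOCK - 1) / BLOCK;
    float *d_in = nullptr, *d_partial = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMemcpy(d_in, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    blockSum<<<blocks, BLOCK>>>(d_in, d_partial, n);
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), d_partial, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_partial);
    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```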

cudaMemcpy - THE CHECK

Hi there, can somebody give me advice on the following? I am copying some data from the CPU to the GPU and I need to know whether it was copied correctly. I can check the return code of cudaMemcpy, but it would be much better if I could print the array on the GPU. int doCopyMemory(char * Input, int InputBytes) { /* Copying needed data on GPU */ ...
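
A sketch of two checks, assuming a device of compute capability 2.0 or later (needed for printf inside kernels): dump a few bytes from the GPU side, and copy the whole buffer back and compare it on the host. copyAndVerify and dumpBytes are hypothetical names, not the doCopyMemory function from the question:

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstring>
#include <vector>

// Device-side printf (compute capability 2.0+): prints bytes as the GPU sees them.
__global__ void dumpBytes(const char *d, int count)
{
    for (int i = 0; i < count; ++i)
        printf("d[%d] = %d\n", i, d[i]);
}

// Copies `input` to the GPU, optionally prints its first bytes from a kernel,
// then copies it back and compares byte for byte. Returns true only if every
// CUDA call succeeded and the data round-tripped intact.
bool copyAndVerify(const char *input, int inputBytes, bool dump)
{
    char *d = nullptr;
    if (cudaMalloc(&d, inputBytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(d, input, inputBytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok && dump) {
        dumpBytes<<<1, 1>>>(d, inputBytes < 8 ? inputBytes : 8);
        ok = cudaDeviceSynchronize() == cudaSuccess;  // also flushes the printf buffer
    }
    std::vector<char> back(inputBytes);
    ok = ok && cudaMemcpy(back.data(), d, inputBytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && memcmp(back.data(), input, inputBytes) == 0;
    cudaFree(d);
    return ok;
}
```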

Massively Parallel algorithm to propagate pixels

I'm designing a CUDA app to process some video. The algorithm I'm using calls for filling in blank pixels in a way that's not unlike Conway's Game of Life: if the pixels around a given pixel are all filled and all of similar values, that pixel gets filled in with the surrounding value. This iterates until all the number of pixel...
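
A minimal sketch of one way to run such a rule on the GPU, under several assumptions not in the question: blank pixels are marked with the hypothetical sentinel EMPTY, only the 4 direct neighbours are examined, "similar" means max minus min is at most tol, and the fill value is the neighbours' mean. Each step reads one buffer and writes another (double buffering), so every thread sees the same previous state, and the host repeats steps until nothing changes:

```cuda
#include <cuda_runtime.h>
#include <vector>

#define EMPTY (-1.0f)  // hypothetical marker for a blank pixel

// One step: a blank interior pixel whose 4 neighbours are all filled and lie
// within `tol` of each other gets their mean; everything else is copied as-is.
__global__ void fillStep(const float *in, float *out, int w, int h, float tol, int *changed)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    int i = y * w + x;
    float v = in[i];
    out[i] = v;
    if (v != EMPTY || x == 0 || y == 0 || x == w - 1 || y == h - 1) return;
    float n[4] = { in[i - 1], in[i + 1], in[i - w], in[i + w] };
    float lo = n[0], hi = n[0], sum = 0.0f;
    for (int k = 0; k < 4; ++k) {
        if (n[k] == EMPTY) return;               // a neighbour is still blank
        lo = fminf(lo, n[k]);
        hi = fmaxf(hi, n[k]);
        sum += n[k];
    }
    if (hi - lo <= tol) {
        out[i] = sum / 4.0f;
        *changed = 1;                            // benign race: every writer stores 1
    }
}

// Host loop: repeat steps, swapping buffers, until a step fills nothing.
std::vector<float> propagate(std::vector<float> img, int w, int h, float tol)
{
    size_t bytes = img.size() * sizeof(float);
    float *a = nullptr, *b = nullptr;
    int *d_changed = nullptr;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMalloc(&d_changed, sizeof(int));
    cudaMemcpy(a, img.data(), bytes, cudaMemcpyHostToDevice);
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    int changed = 1;
    while (changed) {
        cudaMemset(d_changed, 0, sizeof(int));
        fillStep<<<grid, block>>>(a, b, w, h, tol, d_changed);
        cudaMemcpy(&changed, d_changed, sizeof(int), cudaMemcpyDeviceToHost);
        float *t = a; a = b; b = t;              // newest state is now in `a`
    }
    cudaMemcpy(img.data(), a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(a); cudaFree(b); cudaFree(d_changed);
    return img;
}
```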

Using random numbers with GPUs

I'm investigating using nvidia GPUs for Monte-Carlo simulations. However, I would like to use the gsl random number generators and also a parallel random number generator such as SPRNG. Does anyone know if this is possible? Update I've played about with RNG using GPUs. At present there isn't a nice solution. The Mersenne Twister that c...
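
GSL and SPRNG are host libraries; a GPU-side alternative is the cuRAND device API that ships with the CUDA toolkit, which is a different library from the ones the question asks about. A sketch of a Monte Carlo estimate of pi in which each thread owns an independent stream (same seed, different subsequence); the seed 1234 and the function names are arbitrary choices:

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Each thread draws points in the unit square and counts those inside the quarter circle.
__global__ void countInside(unsigned long long seed, int samplesPerThread,
                            unsigned long long *hits)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    curandState state;
    curand_init(seed, id, 0, &state);  // (seed, subsequence, offset, state)
    unsigned long long local = 0;
    for (int s = 0; s < samplesPerThread; ++s) {
        float x = curand_uniform(&state);  // uniform in (0, 1]
        float y = curand_uniform(&state);
        if (x * x + y * y <= 1.0f) ++local;
    }
    atomicAdd(hits, local);
}

// Monte Carlo estimate of pi from blocks * threads * samplesPerThread points.
double estimatePi(int blocks, int threads, int samplesPerThread)
{
    unsigned long long *d_hits = nullptr, hits = 0;
    cudaMalloc(&d_hits, sizeof(hits));
    cudaMemset(d_hits, 0, sizeof(hits));
    countInside<<<blocks, threads>>>(1234ULL, samplesPerThread, d_hits);
    cudaMemcpy(&hits, d_hits, sizeof(hits), cudaMemcpyDeviceToHost);
    cudaFree(d_hits);
    double total = (double)blocks * threads * samplesPerThread;
    return 4.0 * hits / total;
}
```

Giving each thread its own subsequence keeps the streams statistically independent without any coordination between threads.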

Are atomic operations on global memory in CUDA performed in parallel across a warp?

Hi, I need to do an atomic FP add operation on global memory on a CC 2.0 device. If the global data referenced in a warp fits into an aligned 128-byte sector, will these operations be done in parallel, or will they be executed one at a time? My guess would be that they are parallel, but I am not sure of this. Regards, Gautham Ganapathy ...
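
The sketch below does not settle how the hardware schedules same-warp atomics; it shows float atomicAdd on global memory (available from compute capability 2.0) and a common way to sidestep the question: accumulate per block in shared memory first, so only one global atomic is issued per block. The kernel and helper names are hypothetical:

```cuda
#include <cuda_runtime.h>
#include <vector>

// Variant 1: every thread issues one float atomicAdd to the same global address.
__global__ void sumNaive(const float *in, int n, float *total)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(total, in[i]);
}

// Variant 2: shared-memory atomics inside the block, then one global atomic per
// block, cutting global atomics by a factor of blockDim.x.
__global__ void sumPerBlock(const float *in, int n, float *total)
{
    __shared__ float blockTotal;
    if (threadIdx.x == 0) blockTotal = 0.0f;
    __syncthreads();
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&blockTotal, in[i]);
    __syncthreads();
    if (threadIdx.x == 0) atomicAdd(total, blockTotal);
}

// Sums n copies of `value` on the GPU with the chosen variant.
float gpuSum(int n, float value, bool perBlock)
{
    std::vector<float> h(n, value);
    float *d_in = nullptr, *d_total = nullptr, total = 0.0f;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_total, sizeof(float));
    cudaMemcpy(d_in, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_total, 0, sizeof(float));  // all-zero bytes == 0.0f
    int threads = 256, blocks = (n + threads - 1) / threads;
    if (perBlock) sumPerBlock<<<blocks, threads>>>(d_in, n, d_total);
    else          sumNaive<<<blocks, threads>>>(d_in, n, d_total);
    cudaMemcpy(&total, d_total, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_total);
    return total;
}
```

Note that float atomics make the summation order nondeterministic, so results can differ in the last bits between runs for general inputs.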

Trouble with building Cuda programme in VS2008: LNK2019

Hi, I am having some trouble with building my programme. I am working on Windows 7 professional 32-bit with Visual Studio 2008. I have the Cuda SDK and my project is set up with all links to cudart.lib etc. My problem is when I try to build my project it returns the following errors: 1>crowdSim.obj : error LNK2019: unresolved externa...
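
The linker message is truncated, so the cause is unknown. One frequent source of LNK2019 in mixed .cu/.cpp projects is a linkage mismatch: nvcc compiles .cu files as C++, so if the .cpp side declares a function extern "C" but the .cu side does not (or vice versa), the mangled names differ. A sketch of a launcher exported with C linkage; launchScale and scaleKernel are hypothetical names, and crowdSim.cpp would need the matching declaration shown in the last comment:

```cuda
// kernels.cu: compiled by nvcc. The launcher uses C linkage so that the name
// referenced from the .cpp file matches exactly what this .obj exports.
#include <cuda_runtime.h>

__global__ void scaleKernel(float *v, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= s;
}

extern "C" cudaError_t launchScale(float *d_v, float s, int n)
{
    scaleKernel<<<(n + 255) / 256, 256>>>(d_v, s, n);
    return cudaGetLastError();  // reports launch-configuration errors
}

// In the .cpp file (compiled by cl.exe, linked against cudart.lib):
//   extern "C" cudaError_t launchScale(float *d_v, float s, int n);
```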

Real-time video encoding in DirectShow

Hello, I have developed a Windows application that captures video from an external device using DirectShow. The image resolution is 640x480 and the videos saved without compression are very large (approx. 27MB per second). My goal is to reduce this size as much as possible, so I am looking for an encoder which will allow me to co...
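
A quick check of the quoted figure, assuming uncompressed 24-bit RGB frames at 30 frames per second (neither the pixel format nor the frame rate is stated in the question):

$$
640 \times 480\ \text{px} \times 3\ \tfrac{\text{B}}{\text{px}} \times 30\ \tfrac{\text{frames}}{\text{s}} = 27\,648\,000\ \tfrac{\text{B}}{\text{s}} \approx 27.6\ \text{MB/s}
$$

This agrees with the roughly 27 MB per second reported, so under those assumptions the raw size is simply what uncompressed capture at this resolution costs.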

Strange CUDA behavior in vector multiplication program

Hi, I'm having some trouble with a very basic CUDA program. I have a program that multiplies two vectors on the host and on the device and then compares the results. This works without a problem. What goes wrong is when I try different numbers of threads and blocks for learning purposes. I have the following kernel: __global__ void m...
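
The kernel in the question is cut off, so this is a hypothetical element-wise version rather than a fix for it. A grid-stride loop makes the result independent of the launch configuration: any number of blocks and threads covers all n elements exactly once. Threads per block above the device limit (512 before Fermi, 1024 on compute capability 2.x) make the launch itself fail:

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical element-wise multiply with a grid-stride loop.
__global__ void vecMulKernel(const float *a, const float *b, float *c, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x)
        c[i] = a[i] * b[i];
}

// Runs the kernel with an arbitrary <<<blocks, threads>>> configuration.
std::vector<float> vecMulGpu(const std::vector<float> &a, const std::vector<float> &b,
                             int blocks, int threads)
{
    int n = (int)a.size();
    size_t bytes = n * sizeof(float);
    float *da = nullptr, *db = nullptr, *dc = nullptr;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice);
    vecMulKernel<<<blocks, threads>>>(da, db, dc, n);
    std::vector<float> c(n);
    cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return c;
}
```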

NVCC refuses to link my object files.

I am trying to compile a project by compiling object files and then linking them together, nothing fancy: hello.o : hello.h hello.cu nvcc hello.cu -c -o hello.o #... main.o : $(objs) nvcc *.o -o exec When I get to the link phase, just about every method is shown to be missing and undeclared, despite the fact that nm shows tha...

Are there built-in cross and dot products in CUDA?

Are there built-in cross and dot products in CUDA, like in OpenCL, that CUDA kernels can use? I haven't found anything in the specification so far. ...
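
As far as I know, the core CUDA headers define float3 and make_float3 but no dot() or cross(); the SDK samples have shipped a helper header with such operators (helper_math.h in newer versions), which is not part of the toolkit proper. Writing them by hand is short; marking them __host__ __device__ makes them usable from both kernels and host code. This is a sketch, with dotCross as a hypothetical example kernel:

```cuda
#include <cuda_runtime.h>  // float3, make_float3

__host__ __device__ inline float dot(float3 a, float3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

__host__ __device__ inline float3 cross(float3 a, float3 b)
{
    return make_float3(a.y * b.z - a.z * b.y,
                       a.z * b.x - a.x * b.z,
                       a.x * b.y - a.y * b.x);
}

// Example kernel: d[i] = dot(a[i], b[i]), c[i] = cross(a[i], b[i]).
__global__ void dotCross(const float3 *a, const float3 *b, float *d, float3 *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        d[i] = dot(a[i], b[i]);
        c[i] = cross(a[i], b[i]);
    }
}
```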

GPGPU before CUDA and OpenCL

I've been reading about CUDA and OpenCL and have learned that before these frameworks, developers could only use low-level APIs like OpenGL and D3D. Unfortunately, I haven't been able to find much information about it. Was it a widespread or commercial practice, or was it just something they used in research and military labs? I'm sure so...

Using OpenMP in CUDA host code?

Is it possible to use OpenMP pragmas in CUDA files (not in the kernel code)? I want to combine GPU and CPU computation. But the nvcc compiler fails with "cannot find Unknown option 'openmp'" if I link the program with an OpenMP option (under Linux). A workaround is to use OpenMP statements only in C/C++ files. ...
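
nvcc does not itself understand OpenMP flags, but it forwards host code to the host compiler, so the flag can be passed through with -Xcompiler. A sketch assuming gcc as the host compiler; the command line in the first comment is an assumed invocation, and squareHybrid is a hypothetical function that squares the first half of a vector on the GPU while OpenMP threads square the second half:

```cuda
// Assumed build on Linux with gcc as host compiler:
//   nvcc -Xcompiler -fopenmp omp_cuda.cu -o omp_cuda -lgomp
#include <cuda_runtime.h>
#include <vector>

__global__ void squareKernel(float *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= v[i];
}

// GPU handles v[0, half), CPU threads handle v[half, n) at the same time.
void squareHybrid(std::vector<float> &v)
{
    int n = (int)v.size(), half = n / 2;
    float *d = nullptr;
    cudaMalloc(&d, half * sizeof(float));
    cudaMemcpy(d, v.data(), half * sizeof(float), cudaMemcpyHostToDevice);
    squareKernel<<<(half + 255) / 256, 256>>>(d, half);  // launch returns immediately

    #pragma omp parallel for  // host code in a .cu file, compiled by the host compiler
    for (int i = half; i < n; ++i)
        v[i] *= v[i];

    // Blocking copy: waits for the kernel to finish before reading its results.
    cudaMemcpy(v.data(), d, half * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
}
```

Without -fopenmp the pragma is ignored and the CPU loop simply runs on one thread, so the file still builds either way.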

How can I programmatically determine a GPU's memory bus width and clock rate?

How can I programmatically determine a GPU's memory bus width and memory clock rate? I want to use these numbers to compute the maximum theoretical memory bandwidth. I'm mostly interested in NVIDIA GPUs. ...
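
The runtime exposes both values as device attributes: the peak memory clock in kHz and the global memory bus width in bits (the same numbers also appear as memoryClockRate and memoryBusWidth in cudaDeviceProp, though newer toolkits have deprecated some cudaDeviceProp clock fields). The factor of 2 below assumes double-data-rate memory, which is the usual convention for GDDR but an assumption nonetheless; theoreticalBandwidthGBs is a hypothetical name:

```cuda
#include <cuda_runtime.h>

// Theoretical peak memory bandwidth of `device` in GB/s (1 GB = 1e9 bytes),
// or a negative value if the attributes cannot be queried.
double theoreticalBandwidthGBs(int device)
{
    int clockKHz = 0, busBits = 0;
    if (cudaDeviceGetAttribute(&clockKHz, cudaDevAttrMemoryClockRate, device) != cudaSuccess ||
        cudaDeviceGetAttribute(&busBits, cudaDevAttrGlobalMemoryBusWidth, device) != cudaSuccess)
        return -1.0;
    // 2 transfers per clock (DDR assumption) * clock in Hz * bytes per transfer.
    return 2.0 * clockKHz * 1e3 * (busBits / 8.0) / 1e9;
}
```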

Define a variable-size array in local memory using CUDA

Is it somehow possible to make a list, an array, or something similar in a device function, with the size of the list/array being a parameter of the call… or a global variable that's initialized at call time? I would like something like one of these lists to work: unsigned int size1; __device__ void function(int size2) { int list1[size1]; ...
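
Variable-length arrays are not accepted in device code. One workaround, sketched here, is dynamic shared memory, whose size in bytes is the third launch-configuration parameter and can therefore be chosen at run time; each thread takes its own slice of it. Other options are a template parameter (when the size is known at compile time) or in-kernel malloc on compute capability 2.0+. perThreadScratch and runScratch are hypothetical names:

```cuda
#include <cuda_runtime.h>

// Each thread uses its own `size`-element slice of a block-wide dynamic array.
__global__ void perThreadScratch(int size, int *out)
{
    extern __shared__ int scratch[];           // size fixed at launch time
    int *list = scratch + threadIdx.x * size;  // this thread's slice
    for (int k = 0; k < size; ++k)
        list[k] = k * k;
    int sum = 0;
    for (int k = 0; k < size; ++k)
        sum += list[k];
    out[blockIdx.x * blockDim.x + threadIdx.x] = sum;
}

// Launches one block with `size` ints of scratch per thread; returns out[0].
int runScratch(int size, int threads)
{
    int *d_out = nullptr, first = 0;
    cudaMalloc(&d_out, threads * sizeof(int));
    // Must fit the per-block shared memory limit (48 KB by default on many GPUs).
    size_t shmem = (size_t)threads * size * sizeof(int);
    perThreadScratch<<<1, threads, shmem>>>(size, d_out);
    cudaMemcpy(&first, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return first;
}
```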

Timeout in CUDA? / fermi / gtx465

I am using CUDA SDK 3.1 on MS VS2005 with a GTX465 1 GB GPU. I have the following kernel function: __global__ void CRT_GPU_2(float *A, float *X, float *Y, float *Z, float *pIntensity, float *firstTime, float *pointsNumber) { int holo_x = blockIdx.x*20 + threadIdx.x; int holo_y = blockIdx.y*20 + threadIdx.y; float k=2.0f*3.14f/0.00000005...
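
If this GPU also drives a display under Windows, a watchdog (TDR, about two seconds by default) aborts kernels that run too long; whether that is what happens here is an assumption. A sketch of checking for a run-time limit and splitting the work into shorter launches. workChunk is a hypothetical stand-in for heavy per-element work, not the CRT_GPU_2 kernel above:

```cuda
#include <cuda_runtime.h>

// Returns 1 if kernels on `device` are subject to a run-time limit.
int hasKernelTimeout(int device)
{
    int enabled = 0;
    cudaDeviceGetAttribute(&enabled, cudaDevAttrKernelExecTimeout, device);
    return enabled;
}

// Hypothetical per-element work; `offset` lets one launch handle only a slice.
__global__ void workChunk(float *data, int offset, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        float v = data[offset + i];
        for (int k = 0; k < 1000; ++k) v = v * 0.5f + 1.0f;  // stand-in for heavy work
        data[offset + i] = v;
    }
}

// Processes n elements in slices of `chunk`, synchronizing after each launch so
// that no single kernel runs long enough to trip a watchdog.
cudaError_t processInChunks(float *d_data, int n, int chunk)
{
    for (int offset = 0; offset < n; offset += chunk) {
        int count = (n - offset < chunk) ? n - offset : chunk;
        workChunk<<<(count + 255) / 256, 256>>>(d_data, offset, count);
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}
```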