cuda

CUDA: cudaMemcpy only works in emulation mode.

I am just starting to learn how to use CUDA. I am trying to run some simple example code: float *ah, *bh, *ad, *bd; ah = (float *)malloc(sizeof(float)*4); bh = (float *)malloc(sizeof(float)*4); cudaMalloc((void **) cudaMalloc((void **) ... initialize ah ... /* copy array on device */ cudaMemcpy(ad,ah,sizeof(float)*N,cudaMemcpyHostTo...

How can I use GLUT with CUDA on Mac OS X?

Hi, I'm having problems compiling a CUDA program that uses GLUT on Mac OS X. Here is the command line I use to compile the source: nvcc main.c -o main -Xlinker "-L/System/Library/Frameworks/OpenGL.framework/Libraries -lGL -lGLU" "-L/System/Library/Frameworks/GLUT.framework" And here are the errors I get: Undefined symbols: "_glutInitW...

Can I use C++ header files in kernel part of a CUDA code?

I want to compare two strings in a kernel function. Can I use strcomp in file? Generally, can I use C++ libraries in my CUDA code? ...

CUDA: Memory copy to GPU 1 is slower in multi-GPU

My company has a setup of two GTX 295 cards, so a total of 4 GPUs in a server, and we have several servers. We found that GPU 1 specifically was slow in comparison to GPUs 0, 2 and 3, so I wrote a little speed test to help find the cause of the problem. //#include <stdio.h> //#include <stdlib.h> //#include <cuda_runtime.h> #include <iostream> #include <f...

Upper bound for custom rand48

Hi everyone, I'm using a custom random number function rand48 in CUDA. The function does not allow an upper bound to be set, but I require the output to be between 0 and 1. I guess I'm missing something, but how would I convert the output to be between 0 and 1? The length of the number can change, e.g. 697135872 would need to be divided ...
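One possible way to do this scaling, as a minimal sketch only: it assumes the custom generator returns a non-negative integer strictly below 2^31 (the excerpt does not state the actual range, so this bound is an assumption). Under that assumption, dividing by the size of the range maps every output into [0, 1), regardless of how many digits the number has. The names `rand48_to_unit` and `RAND48_RANGE` are hypothetical and not part of the original code.

```c
#include <stdint.h>

/* Assumption: the generator yields integers in [0, 2^31).
   Adjust this constant if the real generator has a different range. */
#define RAND48_RANGE 2147483648.0 /* 2^31 as a double */

/* Hypothetical helper: map a raw generator output into [0, 1). */
double rand48_to_unit(uint32_t raw)
{
    return (double)raw / RAND48_RANGE;
}
```

Dividing by the fixed range rather than by a power of ten chosen from the number's length keeps the mapping uniform: small raw values stay small instead of being rescaled toward 1.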

Why aren't we programming on the GPU?

So I finally took the time to learn CUDA and get it installed and configured on my computer and I have to say, I'm quite impressed! Here's how it does rendering the Mandelbrot set at 1280 x 678 pixels on my home PC with a Q6600 and a GeForce 8800GTS (max of 1000 iterations): Maxing out all 4 CPU cores with OpenMP: 2.23 fps Running the...

Transform OpenCV image data type to DevIL image format and vice versa

I want to use a CUDA-enabled SIFT library, but I am using the OpenCV driver to get images from the webcam. The CUDA library uses the DevIL library for image data types. Should I transform the images from OpenCV data types to DevIL? Or should I use another method for getting images from the webcam (DevIL-compatible data types)? Thanks f...

cmake, gcc, cuda and -m32 wtf

Hi all, I figured out that CUDA does not work in 64-bit mode on my Mac (or I couldn't get it running so far). Therefore I decided to compile everything for 32-bit. I use cmake 2.8 and added the following options: add_definitions(-Wall -m32) set(CUDA_64_BIT_DEVICE_CODE OFF) set(CMAKE_MODULE_LINKER_FLAGS -m32) However when it tries to link...

How to return a single variable from a CUDA kernel function?

I have a CUDA search function which calculates a single variable. How can I return it? __global__ void G_SearchByNameID(node* Node, long nodeCount, long start,char* dest, long answer){ answer = 2; } cudaMemcpy(h_answer, d_answer, sizeof(long), cudaMemcpyDeviceToHost); cudaFree(d_answer); For both of these lines I get this error...

CUDA: How to reuse kernels in multiple files (for unit testing)

How can I go about reusing the same kernel without getting fatal linking errors due to defining the symbol multiple times? In Visual Studio I get "fatal error LNK1169: one or more multiply defined symbols found". My current structure is as follows: Interface.h has an extern interface to a C function: myCfunction() (ala the C++ integrat...

CUDA compare arrays

Hello. I'm trying to make an app that will compare one bitmap against multiple bitmaps. There is one reference bitmap and multiple other bitmaps. The result of each comparison should be a new bitmap with the diffs. Maybe comparing bitmaps as textures rather than arrays? My biggest problem is making the kernel accept more than one input pointer, and how to compare the da...

Best approach for GPGPU/CUDA/OpenCL in Java?

General-purpose computing on graphics processing units (GPGPU) is a very attractive concept to harness the power of the GPU for any kind of computing. I'd love to use GPGPU for image processing, particles, and fast geometric operations. Right now, it seems the two contenders in this space are CUDA and OpenCL. I'd like to know: Is Op...

Can CUDA results be stored in an OpenGL accessible texture?

Can CUDA be used to generate OpenGL textures? I know it can be done by reading the CUDA results back into system memory, and then loading that into a texture... But I'd like to find a way to save this copy... Can CUDA be used to generate textures? ...

Bind texture with pinned mapped memory in CUDA

I was trying to bind host memory that was mapped for zero-copy to a texture, but it looks like it isn't possible. Here is a code sample: float* a; float* d_a; cudaSetDeviceFlags(cudaDeviceMapHost); cudaHostAlloc( (void **)&a, bytes, cudaHostAllocMapped); cudaHostGetDevicePointer((void **)&d_a, (void *)a, 0); texture<float, 2, cudaR...

OpenCL or CUDA Which way to go?

I'm investigating ways of using the GPU to process streaming data. I have two choices but can't decide which way to go. My criteria are as follows: ease of use (good API), community and documentation, performance, and future. I'll code in C and C++ under Linux. ...

cuda device selection with multiple cpu threads.

Hello. Can you tell me how the CUDA runtime chooses a GPU device if 2 or more host threads use the CUDA runtime? Does the runtime choose separate GPU devices for each thread? Does the GPU device need to be set explicitly? Thanks ...

How can I modify pointers passed as part of a variable argument list?

I have a function which takes a variable number of pointers, which I would like to modify. It looks something like: void myPointerModifyingFunction (int num_args, ... ) { void *gpu_pointer; char mem_type; va_list vl; va_start(vl,num_args); for (int i=0;i<num_args;i++) { gpu_pointer=va_arg(vl,void*); ...
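Variadic arguments are passed by value, so assigning to a local such as `gpu_pointer` changes only the copy retrieved by `va_arg`. A minimal sketch of one workaround, assuming the caller is able to pass the address of each pointer (a `void **`) instead of the pointer itself: the function then writes through that address. The function name `reset_pointers` and its behaviour (setting each pointer to NULL) are hypothetical and chosen only for illustration.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: every variadic argument must be a void **,
   i.e. the address of a void * variable owned by the caller. */
void reset_pointers(int num_args, ...)
{
    va_list vl;
    va_start(vl, num_args);
    for (int i = 0; i < num_args; i++) {
        void **slot = va_arg(vl, void **); /* address of the caller's pointer */
        *slot = NULL;                      /* modifies the caller's variable */
    }
    va_end(vl);
}
```

A call would look like `reset_pointers(2, &a, &b);` where `a` and `b` are `void *` variables. The same pattern would apply if the body stored a newly allocated device pointer instead of NULL.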

How do CUDA devices handle immediate operands?

When compiling CUDA code with immediate (integer) operands, are they held in the instruction stream, or are they placed into memory? Specifically I'm thinking about 24- or 32-bit unsigned integer operands. I haven't been able to find information about this in any of the CUDA documentation I've examined so far. So references to any documents o...

CUDA 3.0 and cmake and emulation mode

I'm trying to use CUDA with cmake (v 2.8) on my Mac (OS X 10.6). So far it works fine; I created a small sample just to try it out (see below). However, when I switch on emulation mode, it cannot invoke the CUDA kernel anymore and I get the following error message: Cuda error: kernel invocation: invalid device function. I also tried to ...

Exclusive compute mode with OpenCL+NVidia

Hi, I have a question about exclusive compute mode with NVIDIA and OpenCL. I can set up exclusive compute mode (page 74 of the CUDA Programming Guide 3.0) with nvidia-smi on an NVIDIA GPU. That means only one program can compute on the GPU. The CUDA runtime then schedules the app automatically. But I have a problem with OpenCL programs in this case: if o...