cuda

What is with the Disable Profiling option button on the Compute Profiler?

Why is there a button for enabling and disabling profiling on the Compute Profiler? If I disable profiling, then I can't launch my application for profiling. So why does profiling need to be disabled at all? ...

Is it possible to compare more than two kernel executions at a time in Compute Prof (OpenCL/CUDA)?

Is it possible to compare more than two kernel executions at a time in Compute Prof? ...

CUDA and web development

It seems apparent that each core of the GPU could allow for handling of a request, rather than one main processor (the system's CPU) handling all requests. On the surface, it seems like it is possible, perhaps with Templates in GPU + Redis database in GPU GDDR5? Is it possible and worthwhile? ...

Question about Compute Visual Profiler and number of blocks for profiling

On page 51 of the Compute Visual Profiler User Guide it states that: "Note that in case the number blocks in a kernel is less than or not a multiple of the number of multiprocessors the counters values across multiple runs will not be consistent." Is that an inclusive or exc...

How to create a CUDA makefile so the code executes on the CPU to test CPU FLOPS

Hi, I'm trying to measure GPU and CPU FLOPS, and I got the source from http://norma.mbg.duth.gr/index.php?id=about:benchmarks:cuda_flops. I renamed it to cudaflops.cu and compiled it with this makefile: ################################################################################ # # Build script for project # ####################...

Compiling CUDA with Visual Studio 2010

Hello all, I have used Visual Studio 2008 to compile and run CUDA applications before. I have switched to Visual Studio 2010 and Windows 7. I've been trying to get integration set up all morning, but haven't had complete success. I've downloaded the toolkit, installed Nsight, made sure the libraries/include/bin paths are set, checked ...

What is the best way to learn CUDA?

I have some knowledge of C/C++ programming and want to learn CUDA. I'm also on a Mac. So what is the best way to learn CUDA? ...

How to use CUDA constant memory in a programmer pleasant way?

I'm working on a number crunching app using the CUDA framework. I have some static data that should be accessible to all threads, so I've put it in constant memory like this: __device__ __constant__ CaseParams deviceCaseParams; I use the call cudaMemcpyToSymbol to transfer these params from the host to the device: void copyMetaData(C...

how to dynamically create methods to operate on class objects initialized at runtime

I have a class, say class AddElement{ int a,b,c; } With methods to set/get a,b,c... My question is definitely a logic question - say I implement AddElement as follows: int Value=1; Value+=AddElement.get_a()+AddElement.get_b()+AddElement.get_b(); Now imagine I want to do the above except 'a,b,c' are now arrays, and instead of '...

How do I debug a CUDA library with only 1 graphics card running X11

I'm running a CUDA library that I need to debug for memory problems and other issues. But when I attach cuda-gdb to the process I get the error error: All CUDA devices are used for X11 and cannot be used while debugging. I understand the error, but there has to be a way that I can debug the issues. Since I only have 1 GPU, it real...

CUDA timer question

Say I want to time a memory fetch from device global memory: cudaMemcpy(...cudaMemcpyHostToDevice); cudaThreadSynchronize(); time1 ... kernel_call(); cudaThreadSynchronize(); time2 ... cudaMemcpy(...cudaMemcpyDeviceToHost); cudaThreadSynchronize(); time3 ... I don't understand why my time3 and time2 always give the same results. My ke...

CUDA: Debug with -deviceemu and gdb

Hello, I wrote a CUDA application that has some hardcoded parameters in it (via #defines). Everything seemed to work right, so I tried some other parameters. Now, the program doesn't work correctly anymore. So, I want to debug it. I compile the application with -deviceemu -g -O0 options, because I read that I can then use gdb to debug...

What type should I use for an index variable?

This is a best practices question. I am making an array type * x = malloc(size*sizeof(type)); AFAIK sizeof gives a return value of size_t. Does that mean that I should use a size_t to declare, or pass around size? Also when indexing the array should I also use a size_t for the index variable? What is the best practice for these th...
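One way to picture the size_t option from the question above is the minimal C sketch below. It is not a definitive answer: the helper names make_doubled and sum_ints are hypothetical and do not come from the original post. It only shows size_t being used consistently for the element count passed to malloc, for the count passed between functions, and for the loop index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocates `count` ints and stores i * 2 in element i.
   Returns NULL if the allocation fails; the caller releases it with free(). */
int *make_doubled(size_t count)
{
    assert(count > 0);
    int *x = malloc(count * sizeof *x);   /* size_t arithmetic throughout */
    if (x == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; ++i) {  /* size_t index matches the count type */
        x[i] = (int)i * 2;
    }
    return x;
}

/* Hypothetical helper: sums `count` elements, again indexing with size_t. */
long long sum_ints(const int *x, size_t count)
{
    long long total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += x[i];
    }
    return total;
}
```

Because size_t is unsigned, a loop that counts down with `i >= 0` never terminates, so reverse loops need a different condition when this type is used for the index.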

How to debug into CUDA kernel code using visual studio 2008?

Hey, I am using Visual Studio 2008, with CUDA 3.2. I am trying to debug into a function with this signature: MatrixMultiplication_Kernel<<<dimGrid, dimBlock>>>(Md, Nd, Pd, Width); I can step into the function, however when I get into the function it doesn't let me step over any of the code and tells me that no source is available. An...

Is there any possibility to write GPU-applications using CUDA under F sharp?

Hello, I am interested in using F# for numerical computation. How can I access the GPU using NVIDIA's CUDA standard under F#? ...

How can I run more than one kernel on one GPU with CUDA?

kernel1 <<< blocks1, threads1, 0, stream1 >>> ( args ... ); ... kernel2 <<< blocks2, threads2, 0, stream2 >>> ( args ... ); ... I have two kernels to run concurrently, and the device is a GTX 460, so it's the Fermi architecture. The CUDA toolkit and SDK are 3.2 RC. As in the code above, the two kernels are coded to run concurrently, but there ar...

Is there a way to document CUDA ".cu" files using Doxygen?

Since CUDA's ".cu" files are basically C, is there a way to use Doxygen to generate documentation for them? I noticed that NVIDIA uses Doxygen to generate CUDA's documentation. However, when I run Doxygen, the ".cu" files are ignored. ...

MySQL implementation with CUDA

Hi, I am a senior undergrad majoring in CS. At the moment I am taking a Computer Architecture class. We need to do a project. I want to do something related to CUDA, where the performance of the computation will have a moderate increase compared to a serial implementation. I am really interested in databases so I decided to do something...

Two nearly identical calls: one works, one fails

I have these template functions for use inline on the device with CUDA: template <class T> __device__ inline T& cmin(T&a,T&b){return (a<b)?(a):(b);}; template <class T> __device__ inline T& cmax(T&a,T&b){return (a>b)?(a):(b);}; In the code I have cmin(z[i],y[j])-cmax(x[i],z[j]) for int arrays x, y, and z. I get the error: error: no ...

Emulation mode in CUDA 3.2 with VS2008

Hey guys, I am trying to debug into my kernel code, using the device emulation mode. However, I set break points in my kernel and it doesn't break. MatrixMultiplication_Kernel<<<dimGrid, dimBlock>>>(Md, Nd, Pd, Width); Can anyone assist me with this? ...