cuda

Debugging CUDA kernels called from .NET code in VS2008, emulation mode

CUDA has an option to compile code in emulation mode, which is supported in the .rules file they provide. I have C# .NET 3.5 SP1 code that calls a native DLL using DllImport. The native DLL is compiled in VS2008 using nvcc, and its job is to transfer memory to and from CUDA and to invoke CUDA kernels. When the CUDA kernels are co...

CUDA driver installation on a laptop with nVidia NVS140M card

I'm trying to first figure out if my computer contains a CUDA-enabled card. It has an nVidia NVS 140M card, but I can't seem to figure out if it is the 128 MB version or 256 MB version. On the laptop purchase receipt, I found out that I ordered the 128 MB version, but the control panel description of the card said otherwise as shown belo...

CUBLAS or supported libraries, and emphasis for reading for a beginner

I'm trying to harness the power of the GPU (nVidia Quadro NVS140M) to speed up some matrix computations in my project. I'm reading through some documentation (programming guide, best practices guide, and reference manual), but I'm not sure which section(s) I should focus on. It would be great if I could get some advice on this. Also, I'm...

Using CUDA in an existing MFC project

I have an existing MFC application that performs matrix computations with CPU-optimized BLAS libraries. I'm interested in adding CuBLAS functionality to my project, but I have the following two questions: 1) I'm not sure whether I need to specify my own CUDA kernel, thread, and block configurations at this poin...

Visual Studio, Intel Visual Fortran, and Visual C/C++ mixed-language compile

Working with Visual Studio 2008 Pro, with Intel Fortran compiler v11, on Windows 7 x64. I have an Intel Visual Fortran project set up with all the Fortran source files. I wish to gradually replace all these subroutines with C/C++ (actually CUDA -- bonus points). Simply right-clicking on source files in the solution explorer and "add exi...

CUDA: What is scattered write?

Various CUDA demos in the CUDA SDK refer to "scattered write". What is this scattered write and why is it so great? What does it stand in contrast to? ...
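For readers unfamiliar with the term, here is a minimal sketch of the idea, not taken from the CUDA SDK. A scattered write (scatter) is a store whose destination address is computed per element, so output can land anywhere in memory. Its usual counterpart is a gather, where the read address is computed and the write location is fixed. The sketch is plain host-side C rather than a CUDA kernel, and the names scatter_write and gather_read are hypothetical.

```c
#include <stddef.h>

/* Scatter: element i is WRITTEN to a data-dependent slot dst[idx[i]].
   In a CUDA kernel, each thread would typically handle one value of i.
   Assumes every idx[i] is a valid index into dst. */
void scatter_write(float *dst, const float *src, const size_t *idx, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[idx[i]] = src[i];
}

/* Gather: element i is READ from a data-dependent slot src[idx[i]];
   the write location is fixed at i.
   Assumes every idx[i] is a valid index into src. */
void gather_read(float *dst, const float *src, const size_t *idx, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[idx[i]];
}
```

If two entries of idx are equal, the scatter above writes the same slot twice and the last write wins. With one GPU thread per element there is no defined order, so such collisions need separate handling (for example atomics).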

Xcode and CUDA integration

Hi, I was just wondering if anyone has any experience working with CUDA and Xcode? I'm having a nightmare setting it all up... Dawson ...

CUBLAS memory allocation error

I tried to allocate 17338896 floating-point elements as follows (which is roughly 70 MB): state = cublasAlloc(theSim->Ndim*theSim->Ndim, sizeof(*(theSim->K0)), (void**)&K0cuda); if(state != CUBLAS_STATUS_SUCCESS) { printf("Error allocation video memory.\n"); return -1; ...

CUDA: What reasons could there be for nvcc taking several minutes to compile?

I have some CUDA code that nvcc (well, technically ptxas) takes upwards of 10 minutes to compile. While it isn't small, it certainly isn't huge (~5000 lines). The delay seems to come and go between CUDA version updates, but previously it only took a minute or so instead of 10. When I used the -v option, it seemed to get st...

What should a very simple Makefile look like for CUDA compilation under Linux

Hi, I want to compile a very basic hello-world-level CUDA program under Linux. I have three files: the kernel (helloWorld.cu), the main method (helloWorld.cpp), and a common header (helloWorld.h). Could you write me a simple Makefile to compile this with nvcc and g++? Thanks, Gabor ...

convert u_int64_t to u_char on CUDA 2.3 nvopencc

CUDA 2.3 V0.2.1221 / 32bit linux Hi, I have a problem with the following code: __device__ void put_u64(void *vp, u_int64_t v) { u_char *p = (u_char *) vp; p[0] = (u_char) (v >> 56) & 0xff; p[1] = (u_char) (v >> 48) & 0xff; p[2] = (u_char) (v >> 40) & 0xff; p[3] = (u_char) (v >> 32) & 0xff; p[4] = (u_char) (v >> 24) & 0xff; p[5] = (u_c...
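The excerpt above is cut off before the actual problem is stated, so the following is not a diagnosis. It is only a hedged host-side C sketch of what the visible lines appear to do: store a 64-bit value one byte at a time, most significant byte first. The name put_u64_be is hypothetical, and uint64_t / unsigned char stand in for the u_int64_t / u_char types used in the post.

```c
#include <stdint.h>

/* Store v at vp one byte at a time, most significant byte first
   (big-endian order). Only byte stores are used, so vp does not
   need to be 8-byte aligned. */
void put_u64_be(void *vp, uint64_t v)
{
    unsigned char *p = (unsigned char *)vp;
    for (int i = 0; i < 8; ++i) {
        /* Shift the wanted byte down, mask it, then narrow. */
        p[i] = (unsigned char)((v >> (56 - 8 * i)) & 0xff);
    }
}
```

Calling put_u64_be(buf, 0x0102030405060708ULL) on an 8-byte buffer leaves buf[0] == 0x01 through buf[7] == 0x08.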

Passing pointers between C and Java through JNI

At the moment, I'm trying to create a Java application which uses CUDA functionality. The connection between CUDA and Java works fine, but I've got another problem and wanted to ask whether my thoughts about it are correct. When I call a native function from Java, I pass some data to it, the function calculates something and returns a resu...

CUDA: synchronizing threads

Almost anywhere I read about programming with CUDA there is a mention of the importance that all of the threads in a warp do the same thing. In my code I have a situation where I can't avoid a certain condition. It looks like this: // some math code, calculating d1, d2 if (d1 < 0.5) { buffer[x1] += 1; // buffer is in the global mem...

Using CUDA Kernels

I'm interested in using the CUSP library for CUDA (available here). However, I'm having trouble getting this library to work with my application, which links with the CUDA and/or CUBLAS static libraries. I'm assuming from glancing through the header and source files that I either use the kernels by building the related files as a static librar...

CUDA: documentation of kernel CRT?

I'm trying to find the documentation for all of the functions available for the CUDA kernels. The CUDA Reference manual seems to include only the host functions and the CUDA programming guide only includes some details such as the accuracy of these functions but not their documentation. Am I missing something or does this piece of docu...

CUDA vs. CuBlas memory management

I have noticed that I can call cublas functions on matrices in memory blocks allocated with either the cudaMalloc() or the cublasAlloc() function. The matrix transfer rates and computations are slower for arrays allocated using cudaMalloc() rather than cublasAlloc(), although there are other advantages to using arrays allocated with cudaMalloc(). ...

CUDA host to device (or device to host) memcpy operations with application rendering graphics using OpenGL on the same graphics card

I have posted my problem in the CUDA forums, but I'm not sure whether it's appropriate to post a link here for more ideas, in case the two forums have significantly different audiences. The link is here. I apologize for any inconvenience and appreciate any comments on this question, as I haven't heard back yet on some specific...

CUDA, find out the number of registers in the kernel at runtime

Hello, how can I find out the number of registers a CUDA kernel is using at run time? I know how to find this information during compilation, but I do not want to hardcode the numbers. Thanks ...

CUDA Memory Allocation accessible for both host and device

I'm trying to figure out a way to allocate a block of memory that is accessible by both the host (CPU) and device (GPU). Other than using the cudaHostAlloc() function to allocate page-locked memory that is accessible to both the CPU and GPU, are there any other ways of allocating such blocks of memory? Thanks in advance for your comments. ...

CUDA on non-nVidia card hardware

My laptop doesn't have an nVidia graphics card. I want to work on CUDA. The website says that CUDA can be run in emulation mode on non-CUDA hardware too. But when I tried installing the CUDA drivers downloaded from their website, it gave the error "The nvidia setup couldn't locate any drivers that are compatible with your current hardwa...