cuda

Strange "No such file or directory" error in cuda-gdb

Hello, I already asked this question in the NVIDIA forum but never got an answer. Every time I try to step into a kernel, I get an error message similar to this: __device_stub__Z10bitreversePj (__par0=0x110000) at /tmp/tmpxft_00005d4b_00000000-1_bitreverse.cudafe1.stub.c:10 10 /tmp/tmpxft_00005d4b_00000000-1_bitreverse.cudafe1....

cudaThreadSynchronize() requirement

I have a CUDA program like this: for (int i=0;i<100000;i++) { if (i%2 == 0) { bind_x(x) // bind x to texture kernel_code<<<A,B>>>(M,x,y) // calculate y = M*x } else { bind_x(y) kernel_code<<<A,B>>>(M,y,x) // calculate x = M*y } cudaThreadSynchronize(); if (i%2 == 0) unbind_x(x) else unbind_x(y) // u...
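Kernels launched into the same stream (here the default stream) run in launch order, so the second launch in an iteration cannot start before the first has finished writing its output. A host-side synchronize is therefore only needed before the host reads results, and a blocking cudaMemcpy already provides that. A minimal sketch, assuming a dense row-major M and plain global-memory reads in place of the question's bind_x/unbind_x texture helpers (whose behavior is not shown); matvec and run_pingpong are hypothetical names. Note that cudaThreadSynchronize() has since been deprecated in favor of cudaDeviceSynchronize().

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Error checking omitted for brevity.
// y = M * x for a dense row-major n x n matrix; one thread per output row.
__global__ void matvec(const float* M, const float* x, float* y, int n) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {
        float acc = 0.0f;
        for (int j = 0; j < n; ++j) acc += M[row * n + j] * x[j];
        y[row] = acc;
    }
}

// Runs `iters` ping-pong iterations (x -> y -> x ...) and returns the final vector.
std::vector<float> run_pingpong(const std::vector<float>& hM, std::vector<float> hx,
                                int n, int iters) {
    float *dM, *dx, *dy;
    cudaMalloc(&dM, n * n * sizeof(float));
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dM, hM.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    for (int i = 0; i < iters; ++i) {
        // Both launches go to the default stream, so each starts only after
        // the previous one has finished: no per-iteration sync is needed.
        if (i % 2 == 0) matvec<<<blocks, threads>>>(dM, dx, dy, n);
        else            matvec<<<blocks, threads>>>(dM, dy, dx, n);
    }
    // An odd number of iterations leaves the result in dy, an even number in dx.
    float* result = (iters % 2 == 1) ? dy : dx;
    // This blocking copy waits for all earlier work in the default stream.
    cudaMemcpy(hx.data(), result, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dM); cudaFree(dx); cudaFree(dy);
    return hx;
}

int main() {
    const int n = 4;
    // M = 2 * identity, so each iteration doubles every element.
    std::vector<float> M(n * n, 0.0f);
    for (int i = 0; i < n; ++i) M[i * n + i] = 2.0f;
    std::vector<float> out = run_pingpong(M, std::vector<float>(n, 1.0f), n, 10);
    printf("%f\n", out[0]);  // 2^10 = 1024
    return 0;
}
```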

cuda model - what is warp size?

What's the relationship between max work group size and warp size? Let's say my device has 240 CUDA streaming processors (SPs) and returns the following info: CL_DEVICE_MAX_COMPUTE_UNITS: 30 CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 / 64 CL_DEVICE_MAX_WORK_GROUP_SIZE: 512 CL_NV_DEVICE_WARP_SIZE: 32. This means it has 8 SPs per Streaming...
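Using the numbers in the question: 240 SPs across 30 compute units is 240 / 30 = 8 SPs per multiprocessor, and a maximum-size work group of 512 work-items is executed as 512 / 32 = 16 warps. The two limits are independent: the warp is the hardware scheduling unit, and a group is split into as many warps as it needs. A minimal sketch using the CUDA runtime's cudaGetDeviceProperties rather than the OpenCL queries from the question; warps_per_block is a hypothetical helper.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Number of warps needed to hold a block (work group) of `threads` threads.
// A partially filled last warp still occupies a whole warp.
int warps_per_block(int threads, int warp_size) {
    return (threads + warp_size - 1) / warp_size;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no CUDA device\n");
        return 1;
    }
    printf("SMs: %d, warp size: %d, max threads/block: %d\n",
           prop.multiProcessorCount, prop.warpSize, prop.maxThreadsPerBlock);
    printf("warps in a max-size block: %d\n",
           warps_per_block(prop.maxThreadsPerBlock, prop.warpSize));
    return 0;
}
```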

Unable to open cpp1.ii file while compiling CUDA project

I've been using CUDA for the past couple of months on a 64-bit Windows 7 installation along with Visual Studio 2008. Recently I switched to a 32-bit Windows 7 installation and also upgraded my graphics card from an 8600GTX to a GTX465. I've installed the relevant driver and the CUDA 3.1 toolkit, and am still using VS20...

Numerical Error in simple CUDA code

I just started experimenting with CUDA using the following code: #include "macro.hpp" #include <algorithm> #include <iostream> #include <cstdlib> //#define double float //#define double int int RandomNumber(){return static_cast<double>(rand() % 1000);} __global__ void sum3(double const* a, double const* b, double c...

Does CUDA support recursion?

Does CUDA support recursion? ...
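Device-side recursion is supported on GPUs of compute capability 2.0 (Fermi) and later; older devices cannot recurse in device code. A minimal sketch with a hypothetical recursive factorial, marked __host__ __device__ so the same function also runs on the CPU. Each recursion level consumes per-thread stack, which can be enlarged with cudaDeviceSetLimit(cudaLimitStackSize, bytes) if deep recursion overflows it.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Recursive function callable from host and device code.
// Device-side recursion requires compute capability 2.0 or later.
__host__ __device__ unsigned long long factorial(unsigned int n) {
    return n <= 1 ? 1ULL : n * factorial(n - 1);
}

// Thread t computes (n + t)!.
__global__ void factorial_kernel(unsigned long long* out, unsigned int n) {
    out[threadIdx.x] = factorial(n + threadIdx.x);
}

int main() {
    unsigned long long* d_out;
    unsigned long long h_out[4];
    cudaMalloc(&d_out, sizeof(h_out));
    factorial_kernel<<<1, 4>>>(d_out, 5);  // computes 5!, 6!, 7!, 8!
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    for (int i = 0; i < 4; ++i) printf("%llu\n", h_out[i]);
    return 0;
}
```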

Different threads on different multiprocessors

Is it possible to run different threads on different multiprocessors, similar to CPU cores? Suppose I have two large arrays a and b, and I want to compute both their sum and their difference. Let's say I have 2 multiprocessors on my device. Is it possible to run both kernel functions (which compute sum and difference) concurrently on 2 different multiproc...
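On devices of compute capability 2.0 and later, kernels launched into different non-default streams can run concurrently when resources allow; on older devices they run one after another. Either way, the hardware scheduler, not the programmer, decides which multiprocessor runs which block, so a kernel cannot be pinned to a particular multiprocessor the way a thread can be pinned to a CPU core. A minimal sketch of the sum/difference case from the question; add_kernel, sub_kernel, and sum_and_diff are hypothetical names.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void add_kernel(const float* a, const float* b, float* s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) s[i] = a[i] + b[i];
}

__global__ void sub_kernel(const float* a, const float* b, float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = a[i] - b[i];
}

// Launches the two kernels in separate streams so the hardware *may* overlap
// them; block placement on multiprocessors is up to the scheduler.
void sum_and_diff(const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& sum, std::vector<float>& diff) {
    int n = (int)a.size();
    size_t bytes = n * sizeof(float);
    float *da, *db, *ds, *dd;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes);
    cudaMalloc(&ds, bytes); cudaMalloc(&dd, bytes);
    cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice);

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    int threads = 256, blocks = (n + threads - 1) / threads;
    add_kernel<<<blocks, threads, 0, s1>>>(da, db, ds, n);
    sub_kernel<<<blocks, threads, 0, s2>>>(da, db, dd, n);
    cudaDeviceSynchronize();  // wait for both streams to finish

    sum.resize(n);
    diff.resize(n);
    cudaMemcpy(sum.data(), ds, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(diff.data(), dd, bytes, cudaMemcpyDeviceToHost);
    cudaStreamDestroy(s1); cudaStreamDestroy(s2);
    cudaFree(da); cudaFree(db); cudaFree(ds); cudaFree(dd);
}

int main() {
    std::vector<float> a(1000, 3.0f), b(1000, 1.0f), s, d;
    sum_and_diff(a, b, s, d);
    printf("%f %f\n", s[0], d[0]);
    return 0;
}
```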

Is it possible to put instructions into CUDA code?

I want to use assembly code in CUDA C code to reduce expensive operations, as we do with asm in C programming. I've searched Google but found nothing. Is it possible? ...
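CUDA accepts inline PTX assembly through asm() statements with GCC-style operand constraints. PTX is a virtual instruction set that ptxas still translates and optimizes, so this does not give direct control over the final machine code. A minimal sketch; ptx_add, lane_id, demo, and gpu_demo are hypothetical names, while add.s32, mov.u32, and the %laneid special register are standard PTX.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Adds two ints with one inline PTX instruction. "=r" marks a 32-bit register
// output, "r" a 32-bit register input; %0, %1, %2 refer to them in order.
__device__ int ptx_add(int a, int b) {
    int result;
    asm("add.s32 %0, %1, %2;" : "=r"(result) : "r"(a), "r"(b));
    return result;
}

// Reads the calling thread's lane index (0..31) from a PTX special register.
// "%%" in the asm string produces a literal "%".
__device__ unsigned int lane_id() {
    unsigned int lane;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
    return lane;
}

__global__ void demo(int a, int b, int* sum, unsigned int* lanes) {
    if (threadIdx.x == 0) *sum = ptx_add(a, b);
    lanes[threadIdx.x] = lane_id();
}

// Launches one 64-thread block; returns a + b computed in PTX and fills
// lanes[0..63] with each thread's lane index.
int gpu_demo(int a, int b, unsigned int lanes[64]) {
    int* d_sum;
    unsigned int* d_lanes;
    int h_sum = 0;
    cudaMalloc(&d_sum, sizeof(int));
    cudaMalloc(&d_lanes, 64 * sizeof(unsigned int));
    demo<<<1, 64>>>(a, b, d_sum, d_lanes);
    cudaMemcpy(&h_sum, d_sum, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(lanes, d_lanes, 64 * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_sum);
    cudaFree(d_lanes);
    return h_sum;
}

int main() {
    unsigned int lanes[64];
    int s = gpu_demo(2, 3, lanes);
    printf("2 + 3 = %d, lane of thread 33 = %u\n", s, lanes[33]);
    return 0;
}
```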

Makefile variable substitution sometimes ignored

I'm compiling a CUDA-enabled version of aircrack-ng that hasn't been bug-fixed in a while, so it needed a bit of patching to get most of the way there. Basically, make cannot find the relevant compiler (nvcc) for this one section of code. Relevant Makefile section: ifeq ($(CUDA), true) CFLAGS += -DCUDA_ENABLED NVCC := $(CUDA_BIN)/nvcc IN...

Why do I have an "unaligned memory accesses not supported" error?

I got an "unaligned memory accesses not supported" error and searched Google for it, but found no clear explanations. The whole error message is: /c:\cuda\include\math_functions_dbl_ptx1.h(1308): Error: Unaligned memory accesses not supported The following code caused the error: for (j = low; j <= high; j++) The variables...

CMake + Link error + Whitespace in path

I'm trying to compile my CUDA project with CMake 2.8.2. My SDK is located in "/Developed/GPU Computing/" (OSX). The problem is the whitespace in the path, so CMake doesn't find the libs. I tried: link_libraries("-L${CUDA_SDK_ROOT_DIR}/lib -lcutil") Result: i686-apple-darwin10-g++-4.2.1: Computing/C/lib: No such file or directory Doe...

VS2010 compiler and cuda error: linkage specification is incompatible with previous "hypot"

When I try to build my project on 64-bit Windows 7 using VS 2010 in the Debug 64-bit configuration, I get this error along with two other errors: error: linkage specification is incompatible with previous "hypot" in math.h line 161 error: linkage specification is incompatible with previous "hypotf" in math.h line 161 error: function "abs(l...

C# P/Invoke on CUDA DLL eventually causes AccessViolationException

This is driving me crazy. I've looked all over, but I'm not sure I understand exactly what's causing this error. I'm making a call to a DLL (that I've coded as a separate project) which runs a CUDA kernel on some data I'm using. However, I suspect the issue isn't being caused by CUDA, since the code has been tested and works at least ...

C++ volatile and operator overloading

I have a class A for which I overload operator=. However, I need to do something like this: volatile A x; A y; x = y; which raises this error when compiling: error: no operator "=" matches these operands operand types are: volatile A = A. If I remove volatile, it compiles. Is there any way to have this com...
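For x = y to compile when x is a volatile A, the class needs an assignment operator that is itself volatile-qualified, because a non-volatile operator= cannot be called on a volatile object. A minimal sketch with a hypothetical single-int member; the __host__ __device__ qualifiers let the same class be used in device code, for example on volatile shared memory.

```cuda
#include <cassert>
#include <cstdio>

struct A {
    int value;

    __host__ __device__ A() : value(0) {}
    __host__ __device__ explicit A(int v) : value(v) {}

    // Ordinary copy assignment, chosen for non-volatile targets.
    __host__ __device__ A& operator=(const A& rhs) {
        value = rhs.value;
        return *this;
    }

    // Volatile-qualified overload, the only one viable for a volatile target.
    // Returns void for simplicity, so chained assignment to volatile
    // objects is not supported.
    __host__ __device__ void operator=(const A& rhs) volatile {
        value = rhs.value;
    }

    __host__ __device__ int get() const volatile { return value; }
};

int main() {
    volatile A x;
    A y(42);
    x = y;   // uses the volatile overload
    A z;
    z = y;   // uses the ordinary overload
    printf("%d %d\n", x.get(), z.get());
    return 0;
}
```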

CUDA app on only some of the cards

I've got an NVIDIA Tesla S2050 and a host with an NVIDIA Quadro card, running CentOS 5.5 with CUDA 3.1. When I run a CUDA app, I want to use the four Tesla C2050s but not the Quadro in the host, so that it doesn't drag down overall performance, as it would if the job were split equally five ways. Is there any way to implement this? ...
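One approach is to enumerate the devices with the runtime API, keep only the Tesla boards, and give each worker one of them via cudaSetDevice, so the job is split four ways instead of five. A minimal sketch, assuming the Tesla GPUs can be recognized by the substring "Tesla" in the reported device name (a hypothetical heuristic; check what your cards actually report); is_tesla and tesla_devices are hypothetical names. Current CUDA toolkits also honor the CUDA_VISIBLE_DEVICES environment variable, which hides devices from an application without code changes.

```cuda
#include <cassert>
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// True if a device name looks like a Tesla board. Name matching is an
// assumption; adjust it to whatever your cards actually report.
bool is_tesla(const char* name) {
    return std::strstr(name, "Tesla") != nullptr;
}

// Returns the ordinals of all visible devices whose name passes is_tesla().
std::vector<int> tesla_devices() {
    std::vector<int> ids;
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return ids;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) == cudaSuccess && is_tesla(prop.name))
            ids.push_back(dev);
    }
    return ids;
}

int main() {
    std::vector<int> ids = tesla_devices();
    printf("found %zu Tesla device(s)\n", ids.size());
    // Split the work ids.size() ways; each worker (host thread or process)
    // calls cudaSetDevice(ids[k]) before allocating memory or launching kernels.
    for (size_t k = 0; k < ids.size(); ++k)
        printf("worker %zu -> device %d\n", k, ids[k]);
    return 0;
}
```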

best way of using cuda

There are several ways of using CUDA: auto-parallelizing tools such as PGI Workstation; wrappers such as Thrust (in STL style); the NVIDIA GPU SDK (runtime/driver API). Which one is better in terms of performance, learning curve, or other factors? Any suggestions? ...
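For comparison with hand-written runtime-API code, this is roughly what the Thrust style looks like: containers and algorithms replace explicit kernels, launch configurations, and cudaMemcpy calls. It only illustrates the learning-curve difference and says nothing about relative performance for a given workload. gpu_sum_1_to_n is a hypothetical name; thrust::device_vector, thrust::sequence, and thrust::reduce are part of Thrust, which ships with current CUDA toolkits.

```cuda
#include <cassert>
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>

// Sums 1..n on the GPU: Thrust handles allocation, transfers, and the
// reduction kernel internally.
long long gpu_sum_1_to_n(int n) {
    thrust::device_vector<long long> v(n);
    thrust::sequence(v.begin(), v.end(), 1LL);  // fills 1, 2, ..., n
    return thrust::reduce(v.begin(), v.end(), 0LL, thrust::plus<long long>());
}

int main() {
    printf("%lld\n", gpu_sum_1_to_n(100));  // 5050
    return 0;
}
```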

breakpoints in cuda do not work!

Hi. With very simple code (hello world), the breakpoint is not working. I can't quote the exact message since it isn't in English, but it says something like 'the symbols of this document are not loaded'. There's no CUDA code, just a single printf line in the main function. The working environment is Windows 7 64-bit, VC++ 2008 SP1...

Concurrency, 4 CUDA Applications competing to get GPU resources

What would happen if four concurrent CUDA applications compete for resources on a single GPU so they can offload their work to the graphics card? The CUDA Programming Guide 3.1 mentions that certain methods are asynchronous: kernel launches; device↔device memory copies; host↔device memory copies of a memory...
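Within a single application, asynchronous calls such as kernel launches and cudaMemcpyAsync return control to the host before the GPU work finishes, while work queued in one stream still executes in order. A minimal sketch of that pattern with a hypothetical scale kernel and scale_async helper; pinned host memory (cudaMallocHost) is used because asynchronous copies from pageable memory may not overlap with host work. The sketch does not show how separate applications share the GPU: each application has its own context, and the driver arbitrates between contexts.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Queues copy-in, kernel, and copy-out on one stream, then waits once.
float scale_async(int n, float factor) {
    float* h = nullptr;
    float* d = nullptr;
    cudaMallocHost(&h, n * sizeof(float));  // pinned (page-locked) host buffer
    cudaMalloc(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t s;
    cudaStreamCreate(&s);
    cudaMemcpyAsync(d, h, n * sizeof(float), cudaMemcpyHostToDevice, s);
    scale<<<(n + 255) / 256, 256, 0, s>>>(d, factor, n);
    cudaMemcpyAsync(h, d, n * sizeof(float), cudaMemcpyDeviceToHost, s);
    // The three calls above do not wait for the GPU; host work could go here.
    cudaStreamSynchronize(s);  // block until everything in the stream is done

    float result = h[n - 1];
    cudaStreamDestroy(s);
    cudaFree(d);
    cudaFreeHost(h);
    return result;
}

int main() {
    printf("%f\n", scale_async(1 << 20, 3.0f));
    return 0;
}
```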

Coding a CUDA Kernel that has many threads writing to the same index?

I'm writing some code for activating neural networks on CUDA, and I'm running into an issue. I'm not getting the correct summation of the weights going into a given neuron. So here is the kernel code, and I'll try to explain it more clearly using the variables. __global__ void kernelSumWeights(float* sumArray, float* weightArray, int...
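When several threads do sumArray[j] += w on the same j, the read-modify-write sequences race and some updates are lost. One common fix is atomicAdd, which supports float on compute capability 2.0 and later. The question's kernel signature is truncated, so this minimal sketch uses hypothetical names: one thread per connection, with target[i] giving the neuron that connection i feeds.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Connection i adds weights[i] into sums[target[i]]. A plain `+=` would race
// when connections share a target; atomicAdd makes each update indivisible.
__global__ void sum_weights_atomic(float* sums, const float* weights,
                                   const int* target, int num_connections) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < num_connections) atomicAdd(&sums[target[i]], weights[i]);
}

// Host wrapper: returns the per-neuron sums for the given connections.
std::vector<float> sum_weights(const std::vector<float>& w,
                               const std::vector<int>& tgt, int num_neurons) {
    int n = (int)w.size();
    float *d_sums, *d_w;
    int* d_t;
    cudaMalloc(&d_sums, num_neurons * sizeof(float));
    cudaMalloc(&d_w, n * sizeof(float));
    cudaMalloc(&d_t, n * sizeof(int));
    cudaMemset(d_sums, 0, num_neurons * sizeof(float));  // all-zero bits == 0.0f
    cudaMemcpy(d_w, w.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_t, tgt.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    sum_weights_atomic<<<(n + 255) / 256, 256>>>(d_sums, d_w, d_t, n);
    std::vector<float> sums(num_neurons);
    cudaMemcpy(sums.data(), d_sums, num_neurons * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_sums); cudaFree(d_w); cudaFree(d_t);
    return sums;
}

int main() {
    // 1000 connections of weight 0.5 into neuron 0, plus one into neuron 1.
    std::vector<float> w(1001, 0.5f);
    std::vector<int> tgt(1001, 0);
    tgt[1000] = 1;
    std::vector<float> s = sum_weights(w, tgt, 2);
    printf("%f %f\n", s[0], s[1]);
    return 0;
}
```

Because floating-point addition is not associative, the order in which atomic updates land can change the last bits of a sum for general inputs; the values here are multiples of 0.5, so the result is exact regardless of order. An alternative design avoids atomics by assigning one thread per neuron and looping over its incoming connections.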