Hello,
I already asked this question on the NVIDIA forum but never got an answer (link).
Every time I try to step into a kernel I get a similar error message to this:
__device_stub__Z10bitreversePj (__par0=0x110000) at
/tmp/tmpxft_00005d4b_00000000-1_bitreverse.cudafe1.stub.c:10
10 /tmp/tmpxft_00005d4b_00000000-1_bitreverse.cudafe1....
I have a CUDA program like this:
for (int i = 0; i < 100000; i++) {
    if (i % 2 == 0) {
        bind_x(x);                      // bind x to texture
        kernel_code<<<A, B>>>(M, x, y); // calculate y = M*x
    }
    else {
        bind_x(y);
        kernel_code<<<A, B>>>(M, y, x); // calculate x = M*y
    }
    cudaThreadSynchronize();
    if (i % 2 == 0)
        unbind_x(x);
    else
        unbind_x(y) // u...
What's the relationship between max work group size and warp size? Let's say my device has 240 CUDA streaming processors (SPs) and returns the following info:
CL_DEVICE_MAX_COMPUTE_UNITS: 30
CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 512
CL_NV_DEVICE_WARP_SIZE: 32
This means it has 8 SPs per Streaming...
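The arithmetic behind the figures quoted in this question can be sketched as below. This is only an illustration under stated assumptions: the constant and function names are made up for the example, the 240-SP count is taken from the question text rather than from any query, and the warp count assumes a work group is split into whole warps of CL_NV_DEVICE_WARP_SIZE work-items.

```cpp
// Figures quoted in the question above (names are hypothetical).
constexpr int kStreamingProcessors = 240;  // total CUDA SPs on the device
constexpr int kComputeUnits        = 30;   // CL_DEVICE_MAX_COMPUTE_UNITS
constexpr int kMaxWorkGroupSize    = 512;  // CL_DEVICE_MAX_WORK_GROUP_SIZE
constexpr int kWarpSize            = 32;   // CL_NV_DEVICE_WARP_SIZE

// SPs available on each compute unit (streaming multiprocessor).
constexpr int sps_per_compute_unit(int total_sps, int compute_units) {
    return total_sps / compute_units;
}

// Number of warps needed to cover one work group, rounding up.
constexpr int warps_per_work_group(int work_group_size, int warp_size) {
    return (work_group_size + warp_size - 1) / warp_size;
}

// Compile-time checks of the numbers from the question.
static_assert(sps_per_compute_unit(kStreamingProcessors, kComputeUnits) == 8,
              "240 SPs / 30 compute units = 8 SPs per unit");
static_assert(warps_per_work_group(kMaxWorkGroupSize, kWarpSize) == 16,
              "512 work-items / 32 per warp = 16 warps");
```

So a maximum-size work group of 512 work-items would correspond to 16 warps, all scheduled on a single compute unit with 8 SPs.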
I've been using CUDA for the past couple of months on a 64-bit Windows 7 installation along with Visual Studio 2008. Recently I shifted to a 32-bit Windows 7 installation and also updated my graphics card, which earlier was an 8600GTX and now is a GTX465. I've installed the relevant driver and the CUDA 3.1 toolkit, and am still using VS20...
I just started experimenting with CUDA using the following code:
#include "macro.hpp"
#include <algorithm>
#include <iostream>
#include <cstdlib>
//#define double float
//#define double int
int RandomNumber(){return static_cast<double>(rand() % 1000);}
__global__ void sum3(double const* a,
double const* b,
double c...
Does CUDA support recursion?
...
Is it possible to run different threads on different multiprocessors, similar to CPU cores?
Suppose I have 2 large arrays a, b and I want to compute both their sum and difference. Let's say I have 2 multiprocessors on my device. Is it possible to run both kernel functions (which compute sum and difference) concurrently on 2 different multiproc...
I want to use assembly code in CUDA C code
in order to reduce expensive operations,
as we do with asm in C programming.
I've googled for it but found nothing.
Is it possible?
...
I'm compiling a CUDA-enabled version of aircrack-ng that hasn't been bug-fixed in a while, so it needed a bit of patching to get most of the way there.
Basically, make cannot find the relevant compiler (nvcc) for this one section of code.
Relevant Makefile section:
ifeq ($(CUDA), true)
CFLAGS += -DCUDA_ENABLED
NVCC := $(CUDA_BIN)/nvcc
IN...
I got an "Unaligned memory accesses not supported" error and did a Google search for it,
but there were no clear explanations.
The whole error message is:
/c:\cuda\include\math_functions_dbl_ptx1.h(1308): Error: Unaligned memory accesses not supported
The following code caused the error:
for (j = low; j <= high; j++)
The variables...
I'm trying to compile my CUDA project with CMake 2.8.2.
My SDK is located in "/Developed/GPU Computing/" (OSX). The problem is the whitespace in the path, thus CMake doesn't find the libs.
I tried:
link_libraries("-L${CUDA_SDK_ROOT_DIR}/lib -lcutil")
Result:
i686-apple-darwin10-g++-4.2.1: Computing/C/lib: No such file or directory
Doe...
When I try to build my project on 64-bit Windows 7 using VS 2010 in the Debug 64-bit configuration, I get this error along with two other errors:
error: linkage specification is incompatible with previous "hypot" in math.h line 161
error: linkage specification is incompatible with previous "hypotf" in math.h line 161
error: function "abs(l...
This is driving me crazy. I've looked all over, but I'm not sure I understand exactly what's causing this error.
I'm making a call to a DLL (that I've coded as a separate project) which runs a CUDA kernel on some data I'm using. Although, I suspect the issue isn't being caused by CUDA, since the code has been tested and works at least ...
I have a class A whose operator= I overload. However, I need to do something like this:
volatile A x;
A y;
x = y;
which raises an error while compiling:
error: no operator "=" matches these operands
operand types are: volatile A = A
If I remove volatile, it compiles. Is there any way to have this com...
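For reference, a minimal sketch of how such an assignment can be made to compile in standard C++: the error arises because a non-volatile member function cannot be called on a volatile object, so a volatile-qualified overload of operator= is needed. The struct below is a hypothetical stand-in for the class A in the question (its real members are not shown), and the helper function name is made up for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for the class A from the question.
struct A {
    int value = 0;

    // Ordinary copy assignment, used when the left-hand side is not volatile.
    A& operator=(const A& rhs) {
        value = rhs.value;
        return *this;
    }

    // Volatile-qualified overload, selected when the left-hand side is volatile.
    // It returns void so that no volatile reference is discarded at the call site.
    void operator=(const A& rhs) volatile {
        value = rhs.value;
    }
};

// Mirrors the snippet from the question and returns the value stored in x.
inline int assign_to_volatile(int v) {
    volatile A x;
    A y;
    y.value = v;
    x = y;           // resolves to the volatile-qualified operator=
    return x.value;
}

// Small self-check of the behaviour described above.
inline void check_volatile_assignment() {
    assert(assign_to_volatile(42) == 42);
}
```

Whether a given CUDA compiler version accepts this inside device code is not something this sketch establishes; it only shows the overload-resolution rule in host C++.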
I've got an NVIDIA Tesla S2050 and a host with an NVIDIA Quadro card, running CentOS 5.5 with CUDA 3.1.
When I run a CUDA app, I want to use the 4 Tesla C2050 GPUs but not the Quadro on the host, so that splitting the job equally five ways doesn't drag down overall performance. Is there any way to implement this?
...
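One possible approach to the device-selection question above, sketched under assumptions: filter devices by name and split the work only across the matches. The helper below is hypothetical and works on plain strings so that it stays self-contained; in a real program the names would presumably be collected from the name field filled in by cudaGetDeviceProperties for each index below cudaGetDeviceCount, and each worker would call cudaSetDevice with one of the selected indices. The device name strings in the example are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the indices of all devices whose name contains `wanted`.
// names[i] is assumed to be the name reported for device index i.
inline std::vector<int> select_devices(const std::vector<std::string>& names,
                                       const std::string& wanted) {
    std::vector<int> selected;
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i].find(wanted) != std::string::npos) {
            selected.push_back(static_cast<int>(i));
        }
    }
    return selected;
}

// Example with made-up names: one display card plus four compute cards.
inline void example_selection() {
    const std::vector<std::string> names = {
        "Quadro (hypothetical name)", "Tesla C2050", "Tesla C2050",
        "Tesla C2050", "Tesla C2050"};
    const std::vector<int> picked = select_devices(names, "Tesla");
    assert(picked.size() == 4);  // the Quadro at index 0 is skipped
    assert(picked.front() == 1);
}
```

With the selection in hand, the job would be divided by the number of selected devices (four here) rather than by the total device count.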
There are several ways of using CUDA:
auto-parallelizing tools such as PGI Workstation;
wrappers such as Thrust (in STL style);
NVIDIA GPU SDK (runtime/driver API).
Which one is better in terms of performance, learning curve, or other factors?
Any suggestions?
...
Hi,
With a very simple hello-world program, breakpoints are not working.
I can't quote the exact message since it's not written in English,
but it's something like 'the symbols of this document are not loaded'.
There's no CUDA code, just a single printf line in the main function.
The working environment is Windows 7 64-bit, VC++ 2008 SP1...
What would happen if four concurrent CUDA applications were competing for the resources of a single GPU
so that they can offload work to the graphics card? The CUDA Programming Guide 3.1 mentions that
certain operations are asynchronous:
Kernel launches
Device ↔ device memory copies
Host ↔ device memory copies of a memory...
I'm writing some code for activating neural networks on CUDA, and I'm running into an issue. I'm not getting the correct summation of the weights going into a given neuron.
So here is the kernel code, and I'll try to explain it a bit more clearly with the variables.
__global__ void kernelSumWeights(float* sumArray, float* weightArray, int...