I'm working on getting a CUDA application to also monitor the GPU's core temp. That information is accessible via NVAPI.
A problem is that I want to make sure I'm monitoring the same GPU as I'm running my code on.
However, there seems to be information suggesting that the device IDs I get from NvAPI_EnumPhysicalGPUs do not correspond...
I am currently writing a matrix multiplication on a GPU and would like to debug my code, but since I cannot use printf inside a device function, is there something else I can do to see what is going on inside that function? This is my current function:
__global__ void MatrixMulKernel(Matrix Ad, Matrix Bd, Matrix Xd){
int tx = threadI...
I've noticed that the running times of my CUDA kernels are almost tripled the moment the screensaver kicks in. This happens even if it's the blank screensaver.
Oddly enough, this appears to have nothing to do with the power settings. When I disable the screen saver and let the screen power off, the performance stays the same. When I set...
I am attempting to write a simple particle system that leverages CUDA to update the particle positions. Right now I am defining a particle as an object with a position defined by three float values and a velocity also defined by three float values. When updating the particles, I am adding a constant value to the Y com...
I need to compute the null space of several thousand small matrices (8x9, not 4x3 as I wrote previously) in parallel (CUDA). All references point to SVD, but the algorithm in Numerical Recipes seems very expensive and gives me lots of things other than the null space that I don't really need. Is Gaussian elimination really not an option...
Hi, I just wanted to know whether it is possible to do the following inside an NVIDIA CUDA kernel:
__global__ void compute(long *c1, long size, ...)
{
...
long d[1000];
...
}
or the following
__global__ void compute(long *c1, long size, ...)
{
...
long d[size];
...
}
...
I am a C++ programmer who develops image and video algorithms. Should I learn NVIDIA CUDA, or is it one of those technologies that will disappear?
...
I am attempting to build a particle system utilizing CUDA to do the heavy lifting. I want to randomize some of the particles' initial values, like velocity and life span. The random numbers don't have to be super random since it's just for visual effect. I found this post that addresses the same subject
http://stackoverflow.com/questions/8...
I am somewhat familiar with the CUDA visual profiler and the occupancy spreadsheet, although I am probably not leveraging them as well as I could. Profiling & optimizing CUDA code is not like profiling & optimizing code that runs on a CPU. So I am hoping to learn from your experiences about how to get the most out of my code.
There was...
I am having some trouble understanding threads in the NVIDIA GPU architecture with CUDA.
Could anybody please clarify the following:
An 8800 GPU has 16 SMs with 8 SPs each, so we have 128 SPs.
I was watching Stanford's video presentation, and it said that every SP is capable of running 96 threads concurrently. Does this mean that it (SP)...
I would like to map a thread_id. This is in C/CUDA, but it is more of an algebraic problem that I am trying to solve.
So the mapping I am trying to achieve is along the lines:
Threads 0-15: read value array[0]
Threads 16-31: read value array[3]
Threads 32-47: read value array[0]
Threads 48-63: read value array[3]
Threads 64-79: read value array[6]
Thread...
Hi, I was running a CUDA program on a machine that has a CPU with four cores. How can I change a CUDA C program to use all four cores and all available GPUs? I mean, my program also does things on the host side before computing on the GPUs...
Thanks!
...
I wrote a CUDA program and I am testing it in emulation mode since I don't have a CUDA-capable NVIDIA card.
So my question is: do you know of any server that I can ssh or telnet into that has a CUDA compiler on it?
...
I wrote a CUDA program and I am testing it on Ubuntu running as a virtual machine. The reason is that I have Windows 7, I don't want to install Ubuntu as a secondary operating system, and I need to use a Linux operating system for testing.
My question is: will the virtual machine limit the GPU resources? So will my CUDA code be faster if I r...
I have a kernel that uses 17 registers; reducing it to 16 would give me 100% occupancy. My question is: are there methods that can be used to reduce the number of registers used, short of completely rewriting my algorithms in a different manner? I have always kind of assumed the compiler is a lot smarter than I am, so for example I o...
As far as my understanding goes, shared memory is divided into banks and accesses by multiple threads to a single data element within the same bank will cause a conflict (or broadcast).
At the moment I allocate a fairly large array which conceptually represents several pairs of two matrices:
__shared__ float A[34*N]
Where N is the nu...
We have some nightly build machines that have the CUDA libraries installed but do not have a CUDA-capable GPU installed. These machines can build CUDA-enabled programs, but they cannot run them.
In our automated nightly build process, our CMake scripts use the CMake command
find_package(C...
I have a particle system where the positions and various properties are stored in a vertex buffer object. The values are continuously updated by a CUDA kernel. Presently I am just rendering them using GL_POINTS as flat circles. What I am interested in is rendering these particles as more involved things like 3D animated bird models f...
I am currently doing my BS in computer science and I am interested in graduate studies. I realize that most universities ask for student research experience and publications. I am very interested in CUDA programming, so my question is: how can I write papers about CUDA? I searched a lot on Google and did not find a lot of research papers...
Hi,
I wrote a MATLAB program (using CUDA) to generate a key.
How can I optimize the CUDA program to get better performance?
...