nvidia

Total number of threads on NVIDIA Tesla

What is the total number of threads that can run concurrently on an NVIDIA Tesla, say an S1070? ...

Performance differences between different CUDA SDKs?

If I want to rewrite my application so that it leverages the power of NVIDIA's CUDA SDK, are there any differences at all in runtime performance between the different SDK offerings: C++, Java, Python? Is there any difference at all between these three SDKs, besides the obvious language being used? ...

How to install NVIDIA Parallel Nsight (Nexus) for VS2010 without having installed VS2008?

Is there a way to install Parallel Nsight and use it with Visual Studio 2010 without having VS2008 SP1 installed? The setup checks whether VS2008 is installed and won't continue if it isn't. I know there is no official support for VS2010, but I found a small application on a forum that can integrate Nexus into VS2010, and it seems to work. Thanks i...

NVIDIA CUDA SDK Examples Compilation Unsupported Architecture 'compute_20'

On compilation of the CUDA SDK, I'm getting: nvcc fatal : Unsupported gpu architecture 'compute_20'. My toolkit is 2.3 on a shared system (i.e., I can't really upgrade), and the driver version is also 2.3, running on 4 Tesla C1060s. If it helps, the problem occurs in radixsort. It appears that a few people online have had this ...

Java 3D performance poor after latest NVIDIA driver update to 257.21

I have a GeForce 9088GTX+ 512MB card and updated my driver from 191.07 to 257.21. Now when I run a Java3D application I made, it runs really slowly (it seems not to be using hardware acceleration, since my CPU usage goes up for the System process). This same application worked flawlessly on the old driver version. Now it's sometimes n...

Are atomic operations on global memory in CUDA performed in parallel across a warp?

Hi, I need to do an atomic FP add operation on global memory on a CC 2.0 device. If the global data referenced in a warp fits into an aligned 128-byte sector, will these operations be done in parallel, or will they be executed one at a time? My guess would be that they are parallel, but I am not sure of this. Regards, Gautham Ganapathy ...

glFramebufferTexture2D fails with GL_INVALID_VALUE

I'm seeing a certain call to glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, tex, 0) regularly (though not always) fail with GL_INVALID_VALUE on NVIDIA hardware (a Quadro FX 4600 here, but it also happens on other cards) with the newest (258.49) drivers (this also happened with older drivers). GPU m...

Timeout in CUDA? / Fermi / GTX465

I am using CUDA SDK 3.1 on MS VS2005 with a GTX465 1 GB GPU. I have a kernel function like this: __global__ void CRT_GPU_2(float *A, float *X, float *Y, float *Z, float *pIntensity, float *firstTime, float *pointsNumber) { int holo_x = blockIdx.x*20 + threadIdx.x; int holo_y = blockIdx.y*20 + threadIdx.y; float k=2.0f*3.14f/0.00000005...

CUDA Basic Matrix Addition - Large Matrices

Hi all, I'm trying to add two 4800x9600 matrices, but am running into difficulties... It's a simple C=A+B operation... Here is the kernel: __global__ void matAdd_kernel(float* result,float* A,float* B,int size) { int x=blockIdx.x*blockDim.x+threadIdx.x; int y=blockIdx.y*blockDim.y+threadIdx.y; int idx=x*y+x; ...

CUDA Add Rows of a Matrix

Hi, I'm trying to add the rows of a 4800x9600 matrix together, resulting in a matrix 1x9600. What I've done is split the 4800x9600 into 9,600 matrices of length 4800 each. I then perform a reduction on the 4800 elements. The trouble is, this is really slow... Anyone got any suggestions? Basically, I'm trying to implement MATLAB's su...

glBufferData fails silently with overly large sizes

Hi everybody, I just noticed that glBufferData fails silently when I try to call it with size: 1085859108 and data: NULL. Subsequent calls to glBufferSubData fail with an OUT_OF_MEMORY 'Exception'. This is on Windows XP 32-bit, an NVIDIA GeForce 9500 GT (1024MB) and 195.62 drivers. Is there any way to determine if a buffer was created suc...

OpenGL without X.org in Linux

I'd like to open an OpenGL context without X in Linux. Is there any way at all to do it? I know it's possible for integrated Intel graphics hardware, though most people have NVIDIA cards in their systems. I'd like to get a solution that works with NVIDIA cards. If there's no other way than through integrated Intel hardware, I guess...

How to investigate client-side WSAECONNABORTED happening very often only on machines with NVIDIA Quadro?

We have a C++ client/server application in which the client retrieves and renders 3D content from a server. Our client disconnects from the server very often (more than 50% of runs after less than 1 minute) with recv failing and WSAGetLastError returning WSAECONNABORTED. But the strange thing is that this happens only when: the client...

Streaming multiprocessors, Blocks and Threads (CUDA)

What is the relationship between a CUDA core, a streaming multiprocessor and the CUDA model of blocks and threads? What gets mapped to what, what is parallelized, and how? And which is more efficient: maximizing the number of blocks or the number of threads? Thanks, ExtremeCoder. My current understanding is that there are 8 CUDA core...

What is the right process to get compatibility or at least a workaround for the 'Threaded optimization' feature of NVIDIA?

It's peculiar that this issue is not well understood on the NVIDIA forums and project forums. For example, the well-known ioquake3 project, based on id Tech 3, requires forcing 'Threaded optimization' off in the NVIDIA settings; otherwise there are severe FPS drops. Do you know what a programmer has to do to achieve compatibility with the feature or at ...

Running a CUDA app on only some of the cards

I have an NVIDIA Tesla S2050 and a host with an NVIDIA Quadro card, running CentOS 5.5 with CUDA 3.1. When I run a CUDA app, I want to use the 4 Tesla C2050s but exclude the host's Quadro, so that splitting the job five ways equally does not drag down overall performance. Is there any way to implement this? ...

Concurrency, 4 CUDA Applications competing to get GPU resources

What would happen if four concurrent CUDA applications were competing for resources on a single GPU so they could offload work to the graphics card? The CUDA Programming Guide 3.1 mentions that certain methods are asynchronous: kernel launches; device-device memory copies; host-device memory copies of a memory...

C++ Nvidia Cg question

Hello! I started using Nvidia Cg shaders recently and everything looks and works fine if I'm doing it on the Nvidia GPU (GTS250 in my case). I tried launching the same (my own test application) on ATI HD4650 and saw no output. Right after that I started experimenting with test examples (provided with Nvidia Cg 3.0) and 6/7 work, but th...

GPGPU, OpenCL, CUDA, ATI Stream

Hello, everyone! Please tell me which GPGPU technologies already exist and which hardware vendors implement GPGPU. I've been reading articles on various sites since this morning and I've become confused. ...

Motherboard for NVIDIA GTX 480 with CUDA

I am going to use CUDA to develop programs on GPUs. My plan is to place 3 NVIDIA GTX 480 cards on a single motherboard. Is this possible? If so, what motherboard do you recommend? ...