CUDA different results on different platforms | ansaurus

tags: cuda

Q:

I've written a small CUDA program on my MacBook Pro, and when I run it on my Linux box I get different results.

To check correctness, I wrote unit tests: an array of floats containing the values to check is copied to the device and then back. The worst part is that on Linux it sometimes returns different (and very strange) values, while on my Mac it runs correctly every time.

I use CUDA 3.1 on both platforms. On the Mac, however, I have to compile for 32-bit, because 64-bit CUDA is not yet supported there. The Linux machine is an x64 box running Ubuntu 10.04 with gcc 4.3.4; on the Mac the gcc version is i686-apple-darwin10-gcc-4.2.1.

The GPUs are a GeForce 9600M GT (compute capability 1.1) on the Mac, and a GeForce GTX 285 or a Tesla C1060 (compute capability 1.3) on the PC.

I've done a few more checks and made sure the data is read in completely, but so far I haven't been able to identify the problem. Any ideas on how to figure out what is causing the trouble?
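One way to narrow a problem like this down is to check the status of every CUDA runtime call and of every kernel launch. Without those checks, a failed copy or a kernel that crashes goes unnoticed and simply leaves stale data in the host buffer. The following is a minimal sketch of that idea, not code from this thread: report, fill_kernel and run_fill_check are hypothetical names, and it assumes CUDA 4.0 or later for cudaDeviceSynchronize (on CUDA 3.1 the equivalent call is cudaThreadSynchronize).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print a readable message if a CUDA call failed,
// then pass the error code through unchanged.
static cudaError_t report(cudaError_t err, const char *what)
{
    if (err != cudaSuccess)
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return err;
}

// Set every element to `value`; the bounds check handles an n that is
// not a multiple of the block size.
__global__ void fill_kernel(float *data, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = value;
}

// Hypothetical round-trip test (assumes n > 0): upload `host`, overwrite it
// on the device, download it again. Every step is checked, so a failed
// launch or a fault inside the kernel is reported instead of silently
// leaving `host` unchanged.
cudaError_t run_fill_check(float *host, int n, float value)
{
    float *dev = nullptr;
    size_t bytes = sizeof(float) * n;

    cudaError_t err = report(cudaMalloc((void **)&dev, bytes), "cudaMalloc");
    if (err != cudaSuccess)
        return err;

    err = report(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice),
                 "cudaMemcpy (host to device)");
    if (err == cudaSuccess) {
        fill_kernel<<<(n + 255) / 256, 256>>>(dev, n, value);
        // Catches errors in the launch itself, e.g. a bad configuration.
        err = report(cudaGetLastError(), "kernel launch");
    }
    if (err == cudaSuccess)
        // Launches are asynchronous: faults that happen while the kernel
        // runs (such as an invalid memory access) surface here.
        err = report(cudaDeviceSynchronize(), "kernel execution");
    if (err == cudaSuccess)
        err = report(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost),
                     "cudaMemcpy (device to host)");

    cudaFree(dev);
    return err;
}
```

Running the program under cuda-memcheck (compute-sanitizer in current toolkits) may also help, since it reports invalid device memory accesses together with the offending kernel.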

Update: I couldn't reproduce everything, but this example sometimes prints just zeros and sometimes the correct results. Why?

#include <stdio.h>

__device__ void testFunc(float *ptr)
{
    *ptr = 3.4;
}

__global__ void testkernel(float* validation_data, int n)
{
    for(int i=0; i<100; i++)
        validation_data[i] = 666;

    float *ptr;
    testFunc(ptr);
    validation_data[0] = *ptr;
}

int main()
{  
    int n = 100;
    float *validation_data = (float*)malloc(sizeof(float)*100);
    float *validation_data_d;

    cudaMalloc((void**)&validation_data_d, sizeof(float)*n);

    testkernel <<<1,1>>> (validation_data_d, n);

    // Copy the array back again.
    cudaMemcpy(validation_data, validation_data_d, sizeof(float)*n,
        cudaMemcpyDeviceToHost);

    for(int i=0; i<n; i++)
        printf("%f ", validation_data[i]);
    printf("\n");
}

A:

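This is undefined behavior: you're dereferencing an uninitialized pointer. ptr is never assigned an address, so testFunc writes through whatever garbage value it happens to hold, which is why the result can change from run to run and from one machine to another.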

float *ptr;
testFunc(ptr);

You could do the following instead:

__device__ void testFunc(float &val)
{
    val = 3.4f;
}

...
        float val;
        testFunc(val);
        validation_data[0] = val;
...
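For completeness, here is a self-contained sketch of the corrected test. It is an illustration rather than code from the original thread: instead of switching to a reference, it keeps the original pointer-based testFunc and passes it the address of a local variable, which fixes the bug in the same way. It also bounds the fill loop by n instead of a hard-coded 100, and run_testkernel is a hypothetical host helper that returns the first CUDA error it sees instead of ignoring it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Pointer-based variant: the caller must pass a valid address.
__device__ void testFunc(float *ptr)
{
    *ptr = 3.4f;
}

__global__ void testkernel(float *validation_data, int n)
{
    // Bound the loop by n so the kernel never writes past the allocation.
    for (int i = 0; i < n; i++)
        validation_data[i] = 666;

    float val;            // storage that actually exists
    testFunc(&val);       // the pointer now refers to a valid local variable
    validation_data[0] = val;
}

// Hypothetical host helper: run the kernel and copy n floats back into
// `out`. Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t run_testkernel(float *out, int n)
{
    float *dev = nullptr;
    size_t bytes = sizeof(float) * n;

    cudaError_t err = cudaMalloc((void **)&dev, bytes);
    if (err != cudaSuccess)
        return err;

    testkernel<<<1, 1>>>(dev, n);
    err = cudaGetLastError();  // launch errors
    if (err == cudaSuccess)
        // cudaMemcpy waits for the kernel and also reports faults from it.
        err = cudaMemcpy(out, dev, bytes, cudaMemcpyDeviceToHost);

    if (err != cudaSuccess)
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    cudaFree(dev);
    return err;
}
```

Called as run_testkernel(buf, 100) on a buffer of 100 floats, it should leave 3.4 in buf[0] and 666 in every other element whenever it returns cudaSuccess, independent of the GPU it runs on.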

sharth 2010-08-24 12:54:39

Missed that, thanks!

Nils 2010-08-24 13:37:25