I've written a small CUDA program on my MacBook Pro, and now that I've tried it on my Linux box I get different results.
To ensure correctness, I wrote unit tests: an array of floats containing the values to check is copied to the device and then back. The worst thing is that on Linux it sometimes returns different (and very strange) values, while on my Mac it runs correctly every time.
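A round-trip test of the kind described above could look roughly like the following sketch. It assumes the CUDA runtime API, and `roundtrip_matches` is a hypothetical helper name, not part of any library. The destination buffer is deliberately filled with a NaN bit pattern first, so a copy that silently never happened cannot pass as success, and every runtime call's return code is checked instead of ignored.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cuda_runtime.h>

/* Hypothetical helper: copies n floats to the device and back, then
   compares the result bit-for-bit with the input. Returns 1 if every
   value survived the round trip, 0 otherwise (including CUDA errors). */
int roundtrip_matches(const float *src, int n)
{
    size_t bytes = sizeof(float) * (size_t)n;
    float *back = (float *)malloc(bytes);
    float *dev = NULL;
    int ok = 0;
    cudaError_t err;

    if (back == NULL)
        return 0;
    /* 0xFF bytes form a NaN pattern: a copy that never happened
       cannot accidentally look like a successful one. */
    memset(back, 0xFF, bytes);

    err = cudaMalloc((void **)&dev, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        free(back);
        return 0;
    }

    err = cudaMemcpy(dev, src, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(back, dev, bytes, cudaMemcpyDeviceToHost);

    if (err != cudaSuccess)
        fprintf(stderr, "cudaMemcpy: %s\n", cudaGetErrorString(err));
    else
        ok = (memcmp(src, back, bytes) == 0);

    cudaFree(dev);
    free(back);
    return ok;
}
```

Called from a test as `roundtrip_matches(values, n)`, a result of 0 accompanied by a message on stderr points at the API call rather than at the data itself.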
I use CUDA 3.1 on both platforms. On the Mac, however, I have to compile as 32-bit, because 64-bit CUDA is not yet supported there. The Linux machine is an x64 box running Ubuntu 10.04 (gcc 4.3.4); on the Mac the gcc version is i686-apple-darwin10-gcc-4.2.1.
The GPUs are a GeForce 9600M GT (compute capability 1.1) on the Mac and a GeForce GTX 285 or a Tesla C1060 (compute capability 1.3) on the PC.
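Since the Linux box has more than one card, it may help to confirm which one actually runs the kernel and how the binary was built. The sketch below uses `cudaGetDeviceCount`, `cudaGetDeviceProperties` and `cudaGetDevice`; `print_device_info` is a hypothetical name. Unless `cudaSetDevice` is called, the runtime normally uses device 0. The printed host pointer size should be 4 for the 32-bit Mac build and 8 for the x64 Linux build.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical helper: lists every CUDA device with its compute
   capability, the device currently selected, and the host pointer
   width of this build. Returns the device count, or -1 on error. */
int print_device_info(void)
{
    int count = 0;
    int current = -1;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return -1;
    }

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties: %s\n",
                    cudaGetErrorString(err));
            return -1;
        }
        printf("device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }

    if (cudaGetDevice(&current) == cudaSuccess)
        printf("current device: %d\n", current);
    printf("host pointer size: %u bytes\n", (unsigned)sizeof(void *));
    return count;
}
```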
I've done a few more checks and made sure that the data is read in completely, but so far I have not been able to identify the problem. Any ideas how to figure out what is causing the trouble?
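One common way to narrow this down is to check the return value of every runtime call and, after each kernel launch, to check both `cudaGetLastError()` (launch errors) and the result of synchronizing (errors raised while the kernel ran). A minimal sketch assuming a current CUDA toolkit; `CHECK`, `fill_kernel` and `fill_on_device` are hypothetical names. `cudaDeviceSynchronize` was introduced in CUDA 4.0; with CUDA 3.1 the equivalent call is `cudaThreadSynchronize`.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical macro: evaluates a CUDA runtime call and, on failure,
   prints the call, file, line and error string, then returns the error
   from the enclosing function (which must return cudaError_t). */
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,     \
                    __LINE__, #call, cudaGetErrorString(err_));     \
            return err_;                                            \
        }                                                           \
    } while (0)

/* Writes value into data[0..n). */
__global__ void fill_kernel(float *data, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = value;
}

/* Fills host[0..n) with value on the GPU, checking every step. */
cudaError_t fill_on_device(float *host, int n, float value)
{
    float *dev = NULL;
    size_t bytes = sizeof(float) * (size_t)n;

    CHECK(cudaMalloc((void **)&dev, bytes));
    fill_kernel<<<(n + 255) / 256, 256>>>(dev, n, value);
    CHECK(cudaGetLastError());      /* bad launch configuration etc. */
    CHECK(cudaDeviceSynchronize()); /* faults while the kernel ran */
    CHECK(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaFree(dev));
    return cudaSuccess;
}
```

On error the macro returns early without freeing `dev`. That keeps the sketch short and is usually acceptable in a test harness, since a kernel fault tends to leave the CUDA context unusable anyway.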
Update: I couldn't reproduce everything, but the following example sometimes prints just zeros and sometimes the correct results. Why?
#include <stdio.h>
#include <stdlib.h>

__device__ void testFunc(float *ptr)
{
    *ptr = 3.4;
}

__global__ void testkernel(float* validation_data, int n)
{
    for(int i=0; i<100; i++)
        validation_data[i] = 666;
    float *ptr;
    testFunc(ptr);
    validation_data[0] = *ptr;
}

int main()
{
    int n = 100;
    float *validation_data = (float*)malloc(sizeof(float)*100);
    float *validation_data_d;
    cudaMalloc((void**)&validation_data_d, sizeof(float)*n);
    testkernel <<<1,1>>> (validation_data_d, n);
    // Copy the array back again.
    cudaMemcpy(validation_data, validation_data_d, sizeof(float)*n,
               cudaMemcpyDeviceToHost);
    for(int i=0; i<n; i++)
        printf("%f ", validation_data[i]);
    printf("\n");
    cudaFree(validation_data_d);
    free(validation_data);
    return 0;
}
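One thing worth ruling out in this example: `ptr` in `testkernel` is never assigned before `testFunc` writes through it, so the kernel stores 3.4 to an undefined device address. If that write faults, the kernel aborts, the following `cudaMemcpy` returns an error that the code never checks, and the host buffer keeps whatever `malloc` left in it, which may well be zeros. Below is a sketch of the same test with `testFunc` writing into a valid local variable, the host buffer pre-filled with a sentinel (-1) so that a missing copy becomes visible, and every error checked. `testkernel_fixed` and `run_repro` are hypothetical names.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__device__ void testFunc(float *ptr)
{
    *ptr = 3.4f;
}

/* Hypothetical variant of testkernel: testFunc now writes into a local
   variable, so the pointer it receives is valid. */
__global__ void testkernel_fixed(float *validation_data, int n)
{
    for (int i = 0; i < n; i++)
        validation_data[i] = 666.0f;
    float value;
    testFunc(&value);
    validation_data[0] = value;
}

/* Hypothetical driver: fills out[0..n) with a sentinel, runs the kernel,
   copies the result back and checks every CUDA call.
   Returns 0 on success, 1 on any CUDA error. */
int run_repro(float *out, int n)
{
    size_t bytes = sizeof(float) * (size_t)n;
    float *dev = NULL;
    cudaError_t err;

    for (int i = 0; i < n; i++)
        out[i] = -1.0f; /* still -1 afterwards: the copy never happened */

    err = cudaMalloc((void **)&dev, bytes);
    if (err == cudaSuccess) {
        testkernel_fixed<<<1, 1>>>(dev, n);
        err = cudaGetLastError(); /* launch errors */
        if (err == cudaSuccess) {
            /* cudaMemcpy waits for the kernel and also reports
               errors raised while it ran. */
            err = cudaMemcpy(out, dev, bytes, cudaMemcpyDeviceToHost);
        }
        cudaFree(dev);
    }

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Calling `run_repro(h, 100)` and printing `h` should then give 3.4 followed by 99 values of 666; a nonzero return with a message on stderr would point at the kernel or the copy rather than at the data.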