cuda timer question | ansaurus

tags:

cuda

+1 Q:

cuda timer question

Say I want to time a memory fetch from device global memory:

cudaMemcpy(...cudaMemcpyHostToDevice);
cudaThreadSynchronize();
time1 ...

kernel_call();
cudaThreadSynchronize();
time2 ...

cudaMemcpy(...cudaMemcpyDeviceToHost);
cudaThreadSynchronize();
time3 ...

I don't understand why time2 and time3 always give the same result. My kernel does take a long time to get the result ready for fetching, but shouldn't cudaThreadSynchronize() block until kernel_call has finished? Fetching from device memory to host memory should also take a while, or at least a noticeable amount of time. Thanks.
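One way to cross-check host-side timestamps like time1, time2 and time3 is to time the same three stages with CUDA events, which are recorded on the GPU's own timeline. The following is only a minimal sketch under stated assumptions: `busy_kernel`, `time_stages`, `StageTimes` and the `CHECK` macro are hypothetical names that do not come from the question, the kernel body is a stand-in for the real work, and everything runs in the default stream.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(err_));                      \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Hypothetical stand-in for the question's kernel_call().
__global__ void busy_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < 1000; ++k)
            x = x * 1.0001f + 0.5f;
        data[i] = x;
    }
}

// Elapsed time of each stage in milliseconds.
struct StageTimes { float h2d_ms, kernel_ms, d2h_ms; };

StageTimes time_stages(int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *h = (float *)malloc(bytes);
    if (h == NULL) { fprintf(stderr, "malloc failed\n"); exit(EXIT_FAILURE); }
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d = NULL;
    CHECK(cudaMalloc((void **)&d, bytes));

    cudaEvent_t ev[4];
    for (int i = 0; i < 4; ++i) CHECK(cudaEventCreate(&ev[i]));

    // Record an event in the default stream before and after each stage.
    CHECK(cudaEventRecord(ev[0], 0));
    CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(ev[1], 0));
    busy_kernel<<<(n + 255) / 256, 256>>>(d, n);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(ev[2], 0));
    CHECK(cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaEventRecord(ev[3], 0));

    // Wait until the last event has actually completed on the GPU.
    CHECK(cudaEventSynchronize(ev[3]));

    StageTimes t;
    CHECK(cudaEventElapsedTime(&t.h2d_ms, ev[0], ev[1]));
    CHECK(cudaEventElapsedTime(&t.kernel_ms, ev[1], ev[2]));
    CHECK(cudaEventElapsedTime(&t.d2h_ms, ev[2], ev[3]));

    for (int i = 0; i < 4; ++i) CHECK(cudaEventDestroy(ev[i]));
    CHECK(cudaFree(d));
    free(h);
    return t;
}

int main(void)
{
    StageTimes t = time_stages(1 << 20);
    printf("host->device: %.3f ms\n", t.h2d_ms);
    printf("kernel:       %.3f ms\n", t.kernel_ms);
    printf("device->host: %.3f ms\n", t.d2h_ms);
    return 0;
}
```

If the three event intervals differ while the host timestamps do not, the host timer itself (its resolution, or where the timestamps are taken) is a likely place to look.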

+1 A:

The best way to monitor execution time is to use the CUDA profiler: enable it with the environment variable CUDA_PROFILE=1, and list the options timestamp, gpustarttimestamp and gpuendtimestamp in the file named by CUDA_PROFILE_CONFIG (CUDA_PROFILE_LOG selects the log file). After you run your CUDA program with those environment variables set, a log file should be created in the working directory, listing the timings of the memcopies and kernel executions to the microsecond level. It is clean and non-invasive.

fabrizioM 2010-10-29 23:42:27
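The profiler setup described in the answer above might look like the following. This is a sketch, assuming the legacy CUDA command-line profiler, where to my understanding CUDA_PROFILE=1 switches profiling on, CUDA_PROFILE_CONFIG names the options file and CUDA_PROFILE_LOG names the output file. The file names `profile.cfg` and `timing.log` and the binary name `my_cuda_app` are hypothetical.

```shell
# profile.cfg: one option per line, using the option names from the answer.
cat > profile.cfg <<'EOF'
timestamp
gpustarttimestamp
gpuendtimestamp
EOF

# Assumed variable roles for the legacy command-line profiler.
export CUDA_PROFILE=1                   # turn profiling on
export CUDA_PROFILE_CONFIG=profile.cfg  # options file written above
export CUDA_PROFILE_LOG=timing.log      # hypothetical output file name

# Then run the program as usual, for example: ./my_cuda_app (hypothetical name)
```

Because the profiler is driven entirely by environment variables, the program itself needs no changes or recompilation.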
