kernel1 <<< blocks1, threads1, 0, stream1 >>> ( args ... );
...
kernel2 <<< blocks2, threads2, 0, stream2 >>> ( args ... );
...
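Here is a minimal, self-contained version of the launch pattern above. It is a sketch under a few assumptions. `kernel1` and `kernel2` are hypothetical stand-ins that each fill only their own buffer, so they are independent. The helper `run_two_streams` is also hypothetical. The code targets a current CUDA toolkit, where `cudaDeviceSynchronize` replaced the older `cudaThreadSynchronize` used in 3.2. It checks `cudaGetLastError()` after the launches and the result of synchronization, because launch or execution errors are otherwise easy to miss. It also prints the device's `concurrentKernels` property.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for kernel1: writes only its own buffer.
__global__ void kernel1(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = 1.0f;
}

// Hypothetical stand-in for kernel2: writes only its own buffer.
__global__ void kernel2(float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] = 2.0f;
}

// Launches both kernels in separate non-default streams, waits for them,
// and copies the results back. Returns the first error encountered.
cudaError_t run_two_streams(float *host_a, float *host_b, int n) {
    float *a = nullptr, *b = nullptr;
    size_t bytes = n * sizeof(float);
    cudaError_t err;
    if ((err = cudaMalloc(&a, bytes)) != cudaSuccess) return err;
    if ((err = cudaMalloc(&b, bytes)) != cudaSuccess) { cudaFree(a); return err; }

    cudaStream_t stream1, stream2;
    cudaStreamCreate(&stream1);
    cudaStreamCreate(&stream2);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    // Same form as in the question: <<<grid, block, dynamicSharedBytes, stream>>>
    kernel1<<<blocks, threads, 0, stream1>>>(a, n);
    kernel2<<<blocks, threads, 0, stream2>>>(b, n);
    err = cudaGetLastError();  // reports invalid launch configurations

    // Blocks the host until all device work is done; a kernel that never
    // finishes will make the program hang here.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(host_a, a, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) err = cudaMemcpy(host_b, b, bytes, cudaMemcpyDeviceToHost);

    cudaStreamDestroy(stream1);
    cudaStreamDestroy(stream2);
    cudaFree(a);
    cudaFree(b);
    return err;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device found\n");
        return 0;
    }
    std::printf("concurrentKernels = %d\n", prop.concurrentKernels);

    const int n = 1000;
    static float ha[n], hb[n];
    cudaError_t err = run_two_streams(ha, hb, n);
    std::printf("%s, a[0]=%g b[0]=%g\n", cudaGetErrorString(err), ha[0], hb[0]);
    return err == cudaSuccess ? 0 : 1;
}
```

Putting the kernels in different streams only permits overlap. Whether they actually run at the same time is up to the hardware and scheduler.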
I have two kernels that I want to run concurrently.
The device is a GTX 460, so it is a Fermi-architecture GPU.
The CUDA Toolkit and SDK are version 3.2 RC.
As in the code above, the two kernels are launched in separate streams so that they can run concurrently, but I get no response from either kernel.
Are there any constraints on what concurrently running kernels can do?
The two kernels share some data, and they have some parts in common.
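Suppose one kernel reads data that the other writes. That is an assumption, since the question only says the kernels share data. In that case the two launches are no longer independent. CUDA does not guarantee that kernels in different streams actually overlap. So a kernel that waits, for example by spinning in a loop, for a value the other kernel is supposed to write may never see it.

The sketch below shows one way to make such a dependency explicit. It records an event in the producer's stream and makes the consumer's stream wait on it with `cudaStreamWaitEvent`. The names `producer`, `consumer` and `run_ordered` are hypothetical. The code targets a current CUDA toolkit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical producer: writes the data both kernels share.
__global__ void producer(int *shared_buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) shared_buf[i] = i;
}

// Hypothetical consumer: reads the shared data and writes its own output.
__global__ void consumer(const int *shared_buf, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = shared_buf[i] * 2;
}

// Runs producer in stream1 and consumer in stream2, with the consumer
// ordered after the producer through an event. Copies the output back.
cudaError_t run_ordered(int *host_out, int n) {
    int *buf = nullptr, *out = nullptr;
    size_t bytes = n * sizeof(int);
    cudaError_t err;
    if ((err = cudaMalloc(&buf, bytes)) != cudaSuccess) return err;
    if ((err = cudaMalloc(&out, bytes)) != cudaSuccess) { cudaFree(buf); return err; }

    cudaStream_t stream1, stream2;
    cudaEvent_t produced;
    cudaStreamCreate(&stream1);
    cudaStreamCreate(&stream2);
    cudaEventCreateWithFlags(&produced, cudaEventDisableTiming);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    producer<<<blocks, threads, 0, stream1>>>(buf, n);
    cudaEventRecord(produced, stream1);         // completes once producer has finished
    cudaStreamWaitEvent(stream2, produced, 0);  // later work in stream2 waits for it
    consumer<<<blocks, threads, 0, stream2>>>(buf, out, n);

    err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaStreamSynchronize(stream2);
    if (err == cudaSuccess) err = cudaMemcpy(host_out, out, bytes, cudaMemcpyDeviceToHost);

    cudaEventDestroy(produced);
    cudaStreamDestroy(stream1);
    cudaStreamDestroy(stream2);
    cudaFree(buf);
    cudaFree(out);
    return err;
}

int main() {
    const int n = 1000;
    static int h[n];
    cudaError_t err = run_ordered(h, n);
    std::printf("%s, out[10]=%d\n", cudaGetErrorString(err), h[10]);
    return err == cudaSuccess ? 0 : 1;
}
```

The wait is enqueued on the device, so the host thread is not blocked. Only the consumer is held back until the producer has finished. Kernels that are truly independent can still be left unordered in their own streams.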
If I comment out most of one of the kernel functions, the program halts.
Any help would be appreciated.