I'm working on a C# library which offloads certain work tasks to the GPU using NVIDIA's CUDA. An example of this is adding two arrays together using extension methods:
float[] a = new float[] { ... };
float[] b = new float[] { ... };
float[] c = a.Add(b);
The work in this code is done on the GPU. However, I would like it to be done asynchronously, so that the code running on the CPU only blocks when the result is actually needed (and only if the GPU hasn't finished producing it yet). To do this I've created an ExecutionResult class that hides the asynchronous execution. In use it looks as follows:
float[] a = new float[] { ... };
float[] b = new float[] { ... };
ExecutionResult<float> res = a.Add(b);
float[] c = res; // Implicit conversion
At the last line the program blocks if the data is not ready yet. I'm not certain of the best way to implement this blocking behavior inside the ExecutionResult class, as I'm not very experienced with thread synchronization and that sort of thing.
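For context, the Add extension method is declared roughly along these lines (simplified sketch; 'AddOnGpuAsync' is just a placeholder name for the method that launches the CUDA kernel, it's not shown here):

public static class GpuExtensions
{
    // Simplified sketch; 'AddOnGpuAsync' is a placeholder for the method that
    // launches the CUDA kernel and invokes the callback when it finishes.
    public static ExecutionResult<float> Add(this float[] a, float[] b)
    {
        return new ExecutionResult<float>(a, b, AddOnGpuAsync);
    }

    private static void AddOnGpuAsync(float[] a, float[] b, Action<float[]> callback)
    {
        // ... dispatch the work to the GPU, then invoke 'callback' with the result ...
    }
}

The ExecutionResult<T> class itself currently looks like this: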
using System;
using System.Threading;

public class ExecutionResult<T>
{
    private T[] result;
    private long computed = 0; // 0 = still running on the GPU, 1 = result available

    internal ExecutionResult(T[] a, T[] b, Action<T[], T[], Action<T[]>> f)
    {
        f(a, b, UpdateData); // Async call - 'UpdateData' is the callback method
    }

    internal void UpdateData(T[] data)
    {
        if (Interlocked.Read(ref computed) == 0)
        {
            result = data;
            Interlocked.Exchange(ref computed, 1); // Signal that the result is ready
        }
    }

    public static implicit operator T[](ExecutionResult<T> r)
    {
        // This is obviously a stupid way to do it
        while (Interlocked.Read(ref r.computed) == 0)
        {
            Thread.Sleep(1);
        }
        return r.result;
    }
}
The Action passed to the constructor is an asynchronous method which performs the actual work on the GPU. The nested Action is the asynchronous callback method.
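To make the calling convention concrete, the 'AddOnGpuAsync' placeholder from the sketch above could be simulated on the CPU like this (purely illustrative - the real version launches a CUDA kernel; the point is only to show when the callback fires and how UpdateData ends up being called):

// Illustrative stand-in only - simulates the GPU work on the ThreadPool.
private static void AddOnGpuAsync(float[] a, float[] b, Action<float[]> callback)
{
    ThreadPool.QueueUserWorkItem(_ =>
    {
        var result = new float[a.Length];
        for (int i = 0; i < a.Length; i++)
        {
            result[i] = a[i] + b[i]; // stands in for the CUDA kernel
        }
        callback(result); // this is what ends up calling ExecutionResult<T>.UpdateData
    });
}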
My main concern is how to best and most elegantly handle the waiting done in the converter, but also whether there are more appropriate ways to attack the problem as a whole. Just leave a comment if there is something I need to elaborate or explain further.
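For illustration, one possible direction (just a sketch, untested, and not what I currently have) would be to swap the polling loop for a ManualResetEvent, so the converter blocks on WaitOne() instead of spinning:

public class ExecutionResult<T>
{
    private T[] result;
    private readonly ManualResetEvent done = new ManualResetEvent(false);

    internal ExecutionResult(T[] a, T[] b, Action<T[], T[], Action<T[]>> f)
    {
        f(a, b, UpdateData);
    }

    internal void UpdateData(T[] data)
    {
        result = data;
        done.Set(); // wake up anyone blocked in the converter
    }

    public static implicit operator T[](ExecutionResult<T> r)
    {
        r.done.WaitOne(); // blocks only until the callback has delivered the result
        return r.result;
    }
}

Would a wait handle be a reasonable tool here, or is there a cleaner pattern for this kind of deferred result?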