I suspect that you have some misconceptions about CUDA and how to use it, especially since you refer to a "process" when, in CUDA terminology, there is no such thing.
For most CUDA applications, two things matter most for good performance: optimizing memory access, and making sure each 'active' CUDA thread in a warp performs the same operation at the same time as the other active threads in the warp. Both of these sound important for your application.
To optimize your memory access, make sure that your reads from and writes to global memory are coalesced. You can read more about this in the CUDA programming guide, but it essentially means that adjacent threads in a warp (a half warp on the oldest, compute capability 1.x devices) should read from or write to adjacent memory locations. Also, each thread should read or write 4, 8 or 16 bytes at a time.
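To make the difference concrete, here is a minimal sketch that contrasts a coalesced read pattern with a strided one. The kernel names, the array size and the stride of 32 are all made up for illustration, and the timings it prints will depend on your GPU, so treat it as an experiment to run rather than a benchmark.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Coalesced: thread i touches element i, so the threads of a warp
// read one contiguous span of memory.
__global__ void copyCoalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: adjacent threads read addresses `stride` elements apart,
// so one warp's reads are scattered over many memory segments.
// The writes to out[i] are still coalesced.
__global__ void copyStrided(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int src = (int)(((long long)i * stride) % n);
        out[i] = in[src];
    }
}

int main() {
    const int n = 1 << 24;      // illustrative size (assumption)
    const int stride = 32;      // illustrative stride (assumption)
    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    const int block = 256;
    const int grid = (n + block - 1) / block;

    // Warm-up launch so one-time startup cost is not timed.
    copyCoalesced<<<grid, block>>>(in, out, n);
    cudaDeviceSynchronize();

    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    cudaEventCreate(&t2);

    cudaEventRecord(t0);
    copyCoalesced<<<grid, block>>>(in, out, n);
    cudaEventRecord(t1);
    copyStrided<<<grid, block>>>(in, out, n, stride);
    cudaEventRecord(t2);
    cudaEventSynchronize(t2);

    float msCoalesced = 0.0f, msStrided = 0.0f;
    cudaEventElapsedTime(&msCoalesced, t0, t1);
    cudaEventElapsedTime(&msStrided, t1, t2);
    printf("coalesced: %.3f ms, strided: %.3f ms\n", msCoalesced, msStrided);

    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaEventDestroy(t2);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Both kernels move the same number of bytes; only the read addresses within each warp differ.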
If your memory access pattern is random, then you might need to consider using texture memory. When you need to refer to memory that has been read by other threads in a block, then you should make use of shared memory.
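As a rough illustration of the shared memory case only (not texture memory), the sketch below averages each element with its two neighbours. Each block first stages its slice of the input in shared memory, so values that several threads need are fetched from global memory once per block. The kernel name `blur3`, the block size and the 3-point average are my own choices for the example, not something from your problem.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 256  // must match the block size used at launch

// Each thread loads one element into shared memory; the first and last
// thread of the block also load the halo elements on either side.
__global__ void blur3(const float* in, float* out, int n) {
    __shared__ float tile[BLOCK + 2];
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + 1;                        // index inside the tile

    tile[l] = (g < n) ? in[g] : 0.0f;
    if (threadIdx.x == 0)
        tile[0] = (g > 0 && g - 1 < n) ? in[g - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)
        tile[BLOCK + 1] = (g + 1 < n) ? in[g + 1] : 0.0f;

    // Every thread must reach the barrier before anyone reads the tile.
    __syncthreads();

    if (g < n)
        out[g] = (tile[l - 1] + tile[l] + tile[l + 1]) / 3.0f;
}

int main() {
    const int n = 1000;  // deliberately not a multiple of BLOCK
    std::vector<float> host(n, 1.0f);

    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemcpy(in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int grid = (n + BLOCK - 1) / BLOCK;
    blur3<<<grid, BLOCK>>>(in, out, n);

    cudaMemcpy(host.data(), out, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0] = %f, out[1] = %f\n", host[0], host[1]);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Out-of-range neighbours are treated as zero here, which is just one possible boundary policy.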
In your case, I'm not sure what your input data is, but you should at least make sure that your writes are coalesced. You will probably have to invest some non-trivial amount of effort to get your reads to work efficiently.
For the second part, I would recommend that each CUDA thread process one pixel in your output image. With this strategy, you should watch out for loops in your kernels that will execute longer or shorter depending on the per-thread data. Each thread in your warps should perform the same number of steps in the same order. The only exception to this is that there is no real performance penalty for having some threads in a warp perform no operation while the remaining threads perform the same operation together.
Thus, I would recommend having each thread check if its pixel is inside a given triangle. If not, it should do nothing. If it is, it should compute the output color for that pixel.
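A minimal sketch of that one-thread-per-pixel strategy might look like the following. I don't know your data layout, so the `Tri` struct, the `uchar4` image format, the edge-function inside test and the flat fill color are all assumptions for illustration. The point is the shape of the kernel: every thread runs the same three edge tests, and threads whose pixel lies outside the triangle simply write nothing.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical triangle type: three 2D vertices in pixel coordinates.
struct Tri { float x0, y0, x1, y1, x2, y2; };

// Signed area test: which side of edge a->b the point p lies on.
__device__ float edge(float ax, float ay, float bx, float by,
                      float px, float py) {
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

// One thread per output pixel. threadIdx.x maps to x, so adjacent
// threads write adjacent 4-byte pixels in the same row.
__global__ void shadeTriangle(uchar4* image, int width, int height,
                              Tri t, uchar4 color) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float px = x + 0.5f, py = y + 0.5f;  // sample at the pixel center
    float e0 = edge(t.x0, t.y0, t.x1, t.y1, px, py);
    float e1 = edge(t.x1, t.y1, t.x2, t.y2, px, py);
    float e2 = edge(t.x2, t.y2, t.x0, t.y0, px, py);

    // Accept either winding order: all three tests must agree in sign.
    bool inside = (e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                  (e0 <= 0 && e1 <= 0 && e2 <= 0);
    if (inside) image[y * width + x] = color;  // otherwise do nothing
}

int main() {
    const int width = 256, height = 256;  // illustrative size (assumption)
    uchar4* image = nullptr;
    cudaMalloc(&image, width * height * sizeof(uchar4));
    cudaMemset(image, 0, width * height * sizeof(uchar4));

    Tri t = {20.0f, 20.0f, 200.0f, 40.0f, 60.0f, 220.0f};
    dim3 block(32, 8);  // one warp covers 32 consecutive pixels of a row
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    shadeTriangle<<<grid, block>>>(image, width, height, t,
                                   make_uchar4(255, 0, 0, 255));

    std::vector<uchar4> host(width * height);
    cudaMemcpy(host.data(), image, host.size() * sizeof(uchar4),
               cudaMemcpyDeviceToHost);

    int covered = 0;
    for (const uchar4& p : host) covered += (p.w != 0);
    printf("pixels inside the triangle: %d\n", covered);

    cudaFree(image);
    return 0;
}
```

With several triangles you would loop over them inside the kernel. The loop count is then the same for every thread, which keeps the warps in step.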
Also, I'd strongly recommend reading more about CUDA, as it seems like you are jumping into the deep end without a good understanding of the fundamentals.