At the theoretical peak of a 3 GHz (x86) CPU, you have a budget of roughly 15 CPU cycles per random byte if you want to hit 200 MB/s (3×10⁹ cycles/s ÷ 200×10⁶ bytes/s), and real-world performance will be considerably worse than that. I'd posit that this is going to be extremely difficult in any language. You're well into the range of speeds that tend to require dedicated hardware accelerators (you're attempting about 1.6 Gbit per second). In networking and video applications, a considerable amount of external hardware is dedicated to sustaining this kind of throughput. An extremely efficient implementation in C or assembly might just meet the constraint, but you're hitting the limits of what is possible on general-purpose hardware alone.
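If you want to sanity-check that budget yourself, a single-core benchmark is quick to write. The sketch below is my own illustration, not anything from the question: it times a plain xorshift64 generator filling a buffer, so you can see how far a simple software PRNG gets toward 200 MB/s on your machine. The seed, buffer size, and pass count are arbitrary choices.

```c
/* Minimal sketch: measure single-core throughput of a simple PRNG. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Marsaglia's xorshift64: a few cycles per 8 random bytes. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

int main(void) {
    enum { BUF_BYTES = 64 * 1024 * 1024 };   /* 64 MB per pass */
    static uint8_t buf[BUF_BYTES];
    uint64_t state = 0x9E3779B97F4A7C15ULL;  /* arbitrary nonzero seed */

    clock_t start = clock();
    size_t total = 0;
    for (int pass = 0; pass < 8; pass++) {   /* 512 MB in total */
        for (size_t i = 0; i < BUF_BYTES; i += 8) {
            uint64_t r = xorshift64(&state);
            memcpy(buf + i, &r, 8);          /* 8 random bytes per call */
        }
        total += BUF_BYTES;
    }
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("%.1f MB/s\n", total / 1e6 / secs);
    return 0;
}
```

Note that xorshift is nowhere near cryptographic quality; if you need CSPRNG output, the per-byte cost rises sharply and the budget gets much tighter.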
To reach anything resembling this throughput, I would consider either pre-generating the data (as has already been suggested; a rough sketch follows) or employing some kind of hardware crypto accelerator. I found this list of crypto-accelerator hardware vendors.
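Pre-generation trades memory for per-byte cost. Here is a hypothetical sketch of that idea, assuming your consumer only needs a stream of bytes and can tolerate the pool eventually repeating; the pool size and the `pool_read` interface are my own inventions for illustration.

```c
/* Sketch: fill a large pool of random bytes once, up front, then serve
 * requests from it with memcpy, which is far cheaper than generating
 * bytes on demand. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POOL_BYTES (256u * 1024 * 1024)  /* 256 MB pool; size is arbitrary */

static uint8_t *pool;
static size_t pos;

void pool_init(void) {
    pool = malloc(POOL_BYTES);
    if (!pool)
        exit(1);
    for (size_t i = 0; i < POOL_BYTES; i++)
        pool[i] = (uint8_t)rand();       /* slow, but paid only once */
}

/* Copy n pseudo-random bytes into dst, wrapping around the pool. */
void pool_read(uint8_t *dst, size_t n) {
    while (n > 0) {
        size_t chunk = POOL_BYTES - pos;
        if (chunk > n)
            chunk = n;
        memcpy(dst, pool + pos, chunk);
        pos = (pos + chunk) % POOL_BYTES;
        dst += chunk;
        n -= chunk;
    }
}
```

Whether a repeating pool is acceptable depends entirely on what the random data is for; it is fine for load testing, and completely unacceptable for anything cryptographic.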
As a final thought: you did in fact mean 200 megabytes per second, right? If you meant megabits, this problem is far more tractable.