There is a single bottleneck here: running a process for each row. Increasing "EngineThreads" would not help at all, as only one thread will be running this particular script transform anyway. The time spent in other transforms probably does not matter. Processes are heavyweight objects, and running thousands of them will never be cheap.
I can think of the following ideas to improve it:
1) The best way to fix it is to convert your custom EXE into an assembly and call it from the script transform - to avoid the overhead of creating processes, parsing the output etc.
2) If you have to use separate processes, you can try to run them in parallel. It will help if each process mostly waits for input/output (i.e. it is I/O bound). If the processes are memory bound or CPU bound, you will not gain much by running them in parallel.
2A) Complex script, simple package.
To run them in parallel, modify the ProcessInput method in your script to start the process asynchronously and not wait for its completion: move on to the next row and create the next process. Subscribe to the process output and to the process's Exited event, so you know when it has finished. Limit the number of processes running in parallel, otherwise you'll run out of memory. Wait until all the processes are done before returning from the ProcessInput call.
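The pattern in 2A (start several processes, cap how many run at once, wait for all of them before returning) can be sketched as follows. This is a language-neutral illustration in Python, not SSIS script code: the real script component would be C# or VB.NET using System.Diagnostics.Process. The names run_for_row, process_input and MAX_PARALLEL are made up for the sketch, and the child process is a stand-in for your custom EXE.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Assumed cap on concurrent child processes; tune it to available memory
# and to how I/O bound the real tool is.
MAX_PARALLEL = 4


def run_for_row(row):
    # Stand-in for the custom EXE: a child Python process that prints
    # its argument in upper case. Replace with the real command line.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())", row]
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout.strip()


def process_input(rows):
    # The pool never has more than MAX_PARALLEL children alive at once.
    # map() returns results in input order, and the with-block does not
    # exit until every child has finished.
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        return list(pool.map(run_for_row, rows))


if __name__ == "__main__":
    print(process_input(["a", "b", "c"]))
```

The worker threads spend their time blocked on the child processes, which is why this only pays off when the EXE itself is mostly waiting on I/O.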
2B) Simple script, complex package.
Keep the current sequential script, but partition the data using SSIS. Add a Conditional Split transform and split the input into multiple streams based on some hash expression, something that makes each output receive approximately the same amount of data. The number of streams equals the number of process instances you want to run in parallel. Add your script transform to each output of the Conditional Split. Now you should also increase the "EngineThreads" property :) and these transforms will run in parallel. (Note: based on the tag, I assume you use SSIS 2008. You'll need to insert additional Union All transforms to make it work in SSIS 2005.)
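To show what "some hash expression" in 2B is doing, here is a small Python sketch of the partitioning logic only. It is not SSIS configuration; STREAMS, stream_for and partition are hypothetical names. In the package itself, for an integer key, the Conditional Split conditions would be modulo tests along the lines of [Id] % 4 == 0, [Id] % 4 == 1 and so on, where [Id] is an assumed column name.

```python
import zlib

# Assumed number of parallel script transforms (one per split output).
STREAMS = 4


def stream_for(key):
    # Stable hash of the key reduced to a stream index in 0..STREAMS-1.
    # crc32 is used because Python's built-in hash() of strings changes
    # between runs.
    return zlib.crc32(str(key).encode("utf-8")) % STREAMS


def partition(rows, key_of):
    # Every row goes to exactly one bucket, chosen only by its key.
    buckets = [[] for _ in range(STREAMS)]
    for row in rows:
        buckets[stream_for(key_of(row))].append(row)
    return buckets


if __name__ == "__main__":
    rows = [{"id": i} for i in range(20)]
    for index, bucket in enumerate(partition(rows, lambda r: r["id"])):
        print(index, len(bucket))
```

How evenly the rows spread depends on the key: a sequential integer key with a plain modulo gives near-equal streams, while a skewed key will leave some script transforms with more work than others.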
This should make it perform better, but millions of processes is a lot. You'll hardly get really good performance here.