Right now, I use a combination of Python and R for all of my data processing needs. However, some of my datasets are incredibly large and would benefit strongly from multithreaded processing.
For example, if there are two steps that each have to performed on a set of several millions of data points, I would like to be able to start the second step while the first step is still being run, using the part of the data that has already been processed through the first step.
From my understanding, neither Python nor R is the ideal language for this type of work (at least, I don't know how to implement it in either language). What would be the best language/implementation for this type of data processing?