You have two fundamental approaches.
1) Determine the data-flow dependencies between chunks of code. Find small chunks that are apparently sequential in the source but have no data dependencies between them. Insert parallelism calls around those chunks, if the cost of managing the parallelism is estimated to be less than the work the chunks do.
2) When 1) doesn't provide enough parallelism, replace the algorithm in use with one that does have identifiable parallelism.
To do 1), you need full C/C++ front ends that can compute data-flow dependencies. You need to be able to estimate the cost of the potentially parallel chunks. And you need to be able to transform the code to insert the parallelism primitives.
For 2), you need to be able to recognize algorithms (e.g., the classic triply-nested loop of a sequential matrix multiply), and apply transforms to replace them with other code.
Our DMS Software Reengineering Toolkit is a program transformation engine capable of being configured to do these tasks. It has full C and C++ front ends; the C front end has complete data-flow analysis and points-to analyzers and thus can compute data dependencies directly. The C++ front end presently doesn't have all the flow analysis machinery connected, but that machinery is generic and can be connected with some (considerable) effort.
The program transformation aspect of DMS can be used to modify the compiler data structures constructed by the front ends, and those structures can then be regenerated back into compilable text.
So, the bits and pieces are available. It's still a fair bit of work to implement a specific parallelizer, and there is considerable question as to whether existing C/C++ codes parallelize well even if you have one.