I have read many of the good questions and answers around multi-core programming how-tos, etc. I am familiar with concurrency, IPC, MPI, etc., but what I need is advice on how to measure speed-up, which will help in making a business case for spending the time to write such code. Please don't answer with "well, run it with single-core code, then multi-core code, and figure out the difference". This is neither a scientific nor a reliable way to measure performance improvement. If you know of tools that will do some of the heavy lifting, please mention them. Answers pertaining to methodology will be more fitting, but listing tools is OK as well.
In Visual Studio 2010 Ultimate, there's a Concurrency Visualizer that will show you how many cores your app is using (and how much of the CPU), and how much of that time is wasted on synchronization. The rest is beneficial work. I believe that Intel offers a very similar tool, but I'm not entirely sure how it works.
It's pretty hard to measure an improvement before you've implemented something. There's going to be a certain amount of educated guesswork involved.
I'm presuming the business has already established that the app/website is slow and costing money. Without knowing anything about your app, I'm also assuming you've already ruled out other obvious performance improvements (database round-trips, caching, web front-end payload, etc.).
My first step would be to add a few lines of stopwatch code around the slow code in question, and log the response times over a few thousand operations in a live environment. Compare the average figures you see to the response times that you want to achieve.
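As a rough illustration of the stopwatch step, here is a minimal sketch in Python (used only for brevity; the same pattern maps onto a Stopwatch in .NET). The names `time_operation`, `summarize` and `slow_operation` are hypothetical and not from any library, and `slow_operation` is just a stand-in for the real slow code. In a live environment you would log each duration instead of keeping them in a list.

```python
import statistics
import time


def time_operation(operation, runs=1000):
    """Call `operation` `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return count, mean, median and an approximate 95th percentile."""
    ordered = sorted(durations)
    # Nearest-rank style index for the 95th percentile (an approximation).
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
    }


def slow_operation():
    # Hypothetical stand-in for the code you suspect is slow.
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    stats = summarize(time_operation(slow_operation, runs=200))
    for name, value in stats.items():
        print(f"{name}: {value}")
```

Looking at the median and a high percentile alongside the mean helps show whether a handful of outliers is skewing the average you compare against your target.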
Then run a code profiling tool on that same code (e.g. dotTrace for .NET) to see where your code is spending most of its time. Apply the percentage of time spent in parallelizable code to the average times from the stopwatch, and you'll get a good idea of whether it can be made faster. Obviously it's not a case of simply dividing that figure by the number of cores, because there is synchronization overhead and there will be other tasks running in the real world. But this should give you a close enough estimate of whether it will be feasible.
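The estimate described above is essentially Amdahl's law: only the parallelizable share of the time shrinks with more cores, while the serial share stays fixed. A minimal sketch, assuming you already have an average time from the stopwatch and a parallelizable fraction from the profiler; the function names and the flat `overhead_seconds` term are my own assumptions, and real synchronization cost will not be a fixed constant.

```python
def estimated_time(avg_seconds, parallel_fraction, cores, overhead_seconds=0.0):
    """Estimate the average time after parallelizing, Amdahl-style.

    avg_seconds:       measured average time of the current code
    parallel_fraction: share of that time spent in parallelizable code (0..1)
    cores:             number of cores you expect to use
    overhead_seconds:  assumed flat cost for synchronization (a guess, not measured)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_part = avg_seconds * (1.0 - parallel_fraction)
    parallel_part = avg_seconds * parallel_fraction / cores
    return serial_part + parallel_part + overhead_seconds


def estimated_speedup(avg_seconds, parallel_fraction, cores, overhead_seconds=0.0):
    """Ratio of the current average time to the estimated parallel time."""
    return avg_seconds / estimated_time(
        avg_seconds, parallel_fraction, cores, overhead_seconds
    )


if __name__ == "__main__":
    # Illustrative numbers only: 2.0 s average, 75% parallelizable, 4 cores.
    print(estimated_time(2.0, 0.75, 4))
    print(estimated_speedup(2.0, 0.75, 4))
```

With those illustrative numbers, the 0.5 s of serial work stays put and the 1.5 s of parallelizable work drops to about 0.375 s, so the best case is roughly 0.875 s rather than the 0.5 s you would get by naively dividing 2.0 s by 4. Compare that figure to the response time you want before committing to the rewrite.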