I have recently been asked to produce the MIPS (millions of instructions per second) figure for an algorithm we have developed. The algorithm is exposed through a set of C-style functions. We have exercised the code on a Dell Axim to benchmark its performance under different inputs.

This question came from our hardware vendor, but I am mostly a high-level (HL) software developer, so I am not sure how to respond to the request. Maybe someone with a similar HW/SW background can help...

  1. Since our algorithm is not real-time, I don't think we need to quantify it in MIPS. Is it possible to simply quote the total number of assembly instructions executed?

  2. If 1 is true, how do you do this (i.e., how do you measure the number of assembly instructions), either in general or specifically for ARM/XScale?

  3. Can 2 be performed on a Windows Mobile (WM) device or via the Device Emulator provided in VS2005?

  4. Can 3 be automated?

Thanks a lot for your help. Charles


Thanks for all your help. I think S.Lott hit the nail on the head. As a follow-up, I now have more questions.

  5. Any suggestions on how to go about measuring MIPS? I heard someone suggest running our algorithm and comparing it against the Dhrystone/Whetstone benchmarks to calculate MIPS.

  6. Since the algorithm does not need to run in real time, is MIPS really a useful measure (e.g., factorial(N))? What are other ways to quantify the processing requirements? (I have already measured the runtime performance, but that was not a satisfactory answer.)

  7. Finally, I assume MIPS is a crude estimate and would depend on the compiler, optimization settings, etc.?

A:

MIPS is a measure of CPU speed, not algorithm performance. I can only assume that somewhere along the line, someone is slightly confused. What are they trying to find out? The only likely scenario I can think of is that they're trying to help you determine how fast a processor they need to give you to run your program satisfactorily.

You can measure an algorithm in number of instructions (which will no doubt depend on the input data, so this is non-trivial), but to get MIPS you then need some measure of time, for instance, "I need to invoke it 1000 times per second". If your algorithm is 1000 instructions for that particular case, you'll end up with:

1000 instructions / (1/1000) seconds = 1000000 instructions per second = 1 MIPS.
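The same arithmetic as a minimal C sketch, assuming you already have an instruction count per call (for example from an instruction-set simulator) and a required call rate; `required_mips` is a hypothetical helper, not part of any library:

```c
#include <assert.h>

/* Hypothetical helper: MIPS needed to execute `instructions_per_call`
 * instructions, `calls_per_second` times every second.
 * MIPS = instructions per call * calls per second / 1e6 */
double required_mips(double instructions_per_call, double calls_per_second)
{
    assert(instructions_per_call >= 0.0);
    assert(calls_per_second > 0.0);
    return instructions_per_call * calls_per_second / 1e6;
}
```

With the numbers above, `required_mips(1000.0, 1000.0)` returns 1.0, i.e. 1 MIPS.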

I still think that's a really odd way to try to do things, so you may want to ask for clarification. As for your specific questions, I'll leave that to someone more familiar with Visual Studio.

rmeador
A: 
e.James
Thanks e.James, I have already done what you suggested (on the Dell Axim), but the average_time was not a satisfactory answer. Hence the request for MIPS.
Charles
Hmm, fair enough. Gotta love vague requests from management!
e.James
A:

I'll bet that your hardware vendor is asking how many MIPS you need.

As in "Do you need a 1,000 MIPS processor or a 2,000 MIPS processor?"

Which gets translated by management into "How many MIPS?"

Hardware offers MIPS. Software consumes MIPS.

You have two degrees of freedom.

  • The processor's inherent MIPS offering.

  • The number of seconds during which you consume that many MIPS.

If the processor doesn't have enough MIPS, your algorithm will be "slow".

If the processor has enough MIPS, your algorithm will be "fast".

I put "fast" and "slow" in quotes because you need to have a performance requirement to determine "fast enough to meet the performance requirement" or "too slow to meet the performance requirement."

On a 2,000 MIPS processor, you might take an acceptable 2 seconds. But on a 1,000 MIPS processor this explodes to an unacceptable 4 seconds.
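A rough C sketch of that scaling, assuming the instruction count stays the same on both processors and run time is simply inversely proportional to the rated MIPS (real caches and pipelines will spoil this); `scaled_run_time` is a hypothetical name:

```c
#include <assert.h>

/* Hypothetical helper: predict run time on another processor, assuming the
 * same instruction count and run time inversely proportional to MIPS
 * (ignores caches, pipelines and memory speed). */
double scaled_run_time(double seconds_measured, double mips_measured,
                       double mips_target)
{
    assert(seconds_measured >= 0.0);
    assert(mips_measured > 0.0 && mips_target > 0.0);
    /* instructions executed = rate * time ... */
    double million_instructions = mips_measured * seconds_measured;
    /* ... and the new time = instructions / new rate */
    return million_instructions / mips_target;
}
```

`scaled_run_time(2.0, 2000.0, 1000.0)` gives 4.0 seconds, the example above.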


How many MIPS do you need?

  1. Get the official MIPS for your processor. See http://en.wikipedia.org/wiki/Instructions_per_second

  2. Run your algorithm on some data.

  3. Measure the exact run time. Average a bunch of samples to reduce uncertainty.

  4. Report. 3 seconds on a 750 MIPS processor is -- well -- 3 seconds at 750 MIPS. MIPS is a rate. Time is time. Distance is the product of rate * time. 3 seconds at 750 MIPS is 750*3 million instructions.
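Steps 2 to 4 might look roughly like this in C. This is a sketch under assumptions: `algorithm_fn`, `average_run_seconds`, `million_instructions` and `example_workload` are hypothetical names, the workload is a stand-in for your real C-style entry point, and `clock()` measures processor time with a coarse resolution, so pick `runs` large enough that the total takes well over a second:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical signature: wrap one call to your algorithm, with a fixed
 * input, in a function of this shape. */
typedef void (*algorithm_fn)(void *input);

/* Steps 2-3: run the algorithm `runs` times and return the average
 * processor time per run in seconds, using the standard C clock(). */
double average_run_seconds(algorithm_fn fn, void *input, int runs)
{
    assert(fn != NULL && runs > 0);
    clock_t start = clock();
    for (int i = 0; i < runs; ++i)
        fn(input);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / runs;
}

/* Step 4: rate * time. Assumes the processor sustains its official MIPS
 * figure while running your code, which is only a rough approximation. */
double million_instructions(double official_mips, double seconds)
{
    return official_mips * seconds;
}

/* Example workload standing in for the real algorithm (hypothetical). */
static void example_workload(void *input)
{
    volatile unsigned long *acc = input;
    for (unsigned long i = 0; i < 100000UL; ++i)
        *acc += i;
}
```

Multiply the averaged time by the processor's official MIPS with `million_instructions` to report millions of instructions rather than "MIPS".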

Remember Rate (in Instructions per second) * Time (in seconds) gives you Instructions.

Don't say that it's 3*750 MIPS. It isn't; it's 2,250 million instructions.

S.Lott
Thanks S.Lott, I think you are right. Any suggestions on how to go about measuring MIPS? Since the algorithm does not need to run in real time, is MIPS really useful (e.g., factorial(N))? Finally, I assume MIPS is a crude estimate and would depend on the compiler, optimization settings, etc.?
Charles
A: 

For a first estimate a benchmark on the PC may be useful.

However, before you commit to a specific device and clock frequency you should get a developer board (or some PDA?) for the ARM target architecture and benchmark it there.

There are a lot of factors influencing the speed on today's machines (caching, pipelines, different instruction sets, ...), so your benchmarks on a PC may be way off relative to the ARM.

starblue
A:

Also remember that different compilers and compiler options make a HUGE difference. The same source code can run at many different speeds. So instead of buying the 2 MIPS processor, you may be able to use the 1/2 MIPS processor and a compiler option. Or spend the money on a better compiler and use the cheaper processor.

Benchmarking is flawed at best. As a hobby I used to compile the same Dhrystone (and Whetstone) code with various compilers from various vendors for the same hardware, and the numbers were all over the place, by orders of magnitude. Same source code, same processor: Dhrystone didn't mean a thing and was not useful as a baseline. What matters in benchmarking is how fast YOUR algorithm runs; it had better be as fast as or faster than it needs to be. Depending on how close to the finish line you are, allow for plenty of slop. Early on you probably want to be running 5 or 10 or 100 times faster than you need to, so that by the end of the project you are at least slightly faster than you need to be.

I agree with what I think S. Lott is saying: this is all sales, marketing, and management talk. If you are the one management has put between a rock and a hard place, then what you need to do is get them to buy the fastest processor and the best tools they are willing to pay for, using the colorful pie charts and graphs you are going to generate from thin air as justification. If near the end of the road it doesn't quite meet performance, you could return to Stack Overflow, but at the same time management will be forced to buy a different toolchain at almost any price, or swap processors and respin the board. By then you should know how close to the target you are: "we need 1.0 and we are at 1.25; if we buy a processor that is twice as fast as the one we bought, we should make it."
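That back-of-the-envelope check, sketched in C under the assumption that "1.0" and "1.25" are run times in the same unit and that run time scales as 1/speedup; `meets_target` and `headroom` are hypothetical helpers:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: given the required run time and the measured run
 * time (same units), would a processor `speedup` times faster meet the
 * requirement? Assumes run time scales as 1/speedup, which real hardware
 * only approximates. */
bool meets_target(double required, double measured, double speedup)
{
    assert(required > 0.0 && measured >= 0.0 && speedup > 0.0);
    double projected = measured / speedup;
    return projected <= required;
}

/* Headroom: how many times faster than necessary the code currently runs
 * (> 1.0 means there is margin to spare). */
double headroom(double required, double measured)
{
    assert(measured > 0.0);
    return required / measured;
}
```

`meets_target(1.0, 1.25, 2.0)` is true (1.25 / 2 = 0.625), and a `headroom` of 5 to 100 early in the project corresponds to the margin suggested above.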

Whether or not you can automate or simulate these kinds of things depends on the tools: sometimes yes, sometimes no. I am not familiar with the tools you are talking about, so I can't speak to them directly.

dwelch
A: 

Some notes:

  1. MIPS is often used as a general "capacity" measure for processors, especially in the soft real-time/embedded field where you do want to ensure that you do not overload a processor with work. Note that this IS instructions per second, as the time is very important!

  2. MIPS used in this fashion is quite unscientific.

  3. MIPS used in this fashion is still often the best approximation there is for sizing a system and determining the speed of the processor. It might well be off by 25%, but never mind...

  4. Counting MIPS requires a processor that is close to what you are using. The right instruction set is obviously crucial, to capture the actual instruction stream from the actual compiler in use.
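Used as a capacity measure, MIPS amounts to a simple budget. A minimal C sketch, assuming you already have a per-task MIPS estimate and treating the 25% figure as an error margin; `utilization` and `fits` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity check: sum the MIPS each task consumes and compare
 * against the processor's rated MIPS, inflating the demand by an error
 * margin (e.g. 0.25 for an estimate that may be off by 25%). */
double utilization(const double *task_mips, size_t n_tasks,
                   double processor_mips, double error_margin)
{
    assert(processor_mips > 0.0 && error_margin >= 0.0);
    double demand = 0.0;
    for (size_t i = 0; i < n_tasks; ++i)
        demand += task_mips[i];
    return demand * (1.0 + error_margin) / processor_mips;
}

/* A utilization of 1.0 or more means the processor would be overloaded. */
bool fits(const double *task_mips, size_t n_tasks,
          double processor_mips, double error_margin)
{
    return utilization(task_mips, n_tasks, processor_mips, error_margin) < 1.0;
}
```

For tasks needing 200, 150 and 50 MIPS with a 25% margin, `utilization` on a 1,000 MIPS processor is 0.5.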

You cannot in any way approximate this on a PC. You need to bring out one of a few tools to do this right:

  1. Use an instruction-set simulator for the target architecture, such as Qemu, ARM's own tools, Synopsys, CoWare, Virtutech, or VaST. These are fast, yet still count instructions pretty well, and they support the right instruction set. Barring extensive use of expensive instructions like integer divide (and please, no floating point), these numbers tend to be usefully close.

  2. Find a clock-cycle-accurate simulator for your target processor (or something close), which will give a pretty good estimate of pipeline effects, etc. Once again, get it from ARM or from Carbon SoCDesigner.

  3. Get a development board for the processor family you are targeting, or for an ARM design close to it, and profile the application there. You wouldn't use an ARM9 to profile for an ARM11, but an ARM11 might be a good approximation for an ARM Cortex-A8/A9, for example.

jakobengblom2