Here are some numbers from my machine (it is Smalltalk/X, but I guess the numbers are comparable - at least the ratios should be):
The called methods "foo" and "foo:" are a noops (i.e. consist of a ^self):
self foo ... 3.2 ns
self perform:#foo ... 3.3 ns
[self foo] value ... 12.5 ns (2 sends and 2 contexts)
[ ] value ... 3.1 ns (empty block)
Compiler valuate:('TestClass foo') ... 1.15 ms
self foo:123 ... 3.3 ns
self perform:#foo: with:123 ... 3.6 ns
[self foo:123] value ... 15 ns (2 sends and 2 contexts)
[self foo:arg] value:123 ... 23 ns (2 sends and 2 contexts)
Compiler valuate:('TestClass foo:123') ... 1.16 ms
Notice the big difference between "perform:" and "evaluate:"; evaluate is calling the compiler to parse the string, generate a throw-away method (bytecode), execute it (it is jitted on the first call) and finally discarded. The compiler is actually written to be used mainly for the IDE and to fileIn code from external streams; it has code for error reporting, warning messages etc.
In general, eval is not what you want when performance is critical.
Timings from a Dell Vostro; your milage may vary, but the ratios not.
I tried to get the net execution times, by measuring the empty loop time and subtracting;
also, I ran the tests 10 times and took the best times, to eliminate OS/network/disk/email or whatever disturbances. However, I did not really care for a load-free machine.
The measure code was (replaced the second timesRepeat-arg with the stuff above):
callFoo2
|t1 t2|
t1 :=
TimeDuration toRun:[
100000000 timesRepeat:[]
].
t2 :=
TimeDuration toRun:[
100000000 timesRepeat:[self foo:123]
].
Transcript showCR:t2-t1
EDIT:
PS: I forgot to mention: these are the times from within the IDE (i.e. bytecode-jitted execution). Statically compiled code (using the stc-compiler) will generally be a bit faster (20-30%) on these low-level micro benchmarks, due to a better register allocation algorithm.
EDIT: I tried to reproduce these numbers the other day, but got completely different results (8ns for the simple call, but 9ns for the perform). So be very careful with these micro-timings, as they run completely out of the first-level cache (and empty messages even omit the context setup, or get inlined) - they are usually not very representative of the overall performance.