Hi,

I am a PHP developer and I have always thought that micro-optimizations are not worth the time. If you really need that extra performance, you either architect your software to be faster in the first place, or you write a C++ extension to handle the slow tasks (or better yet, compile the code using HipHop). However, today a workmate told me that there is a big difference between

is_array($array)

and

$array === (array) $array

and I was like "eh, that's a pointless comparison really", but he wouldn't agree with me. And he is the best developer in our company, in charge of a website that runs about 50 million SQL queries per day, for instance. So what I am wondering is: could he be wrong, or is micro-optimization really worth the time, and when?

+26  A: 

Micro-optimisation is worth it when you have evidence that you're optimising a bottleneck.

Usually it's not worth it - write the most readable code you can, and use realistic benchmarks to check the performance. If and when you find you've got a bottleneck, micro-optimise just that bit of code (measuring as you go). Sometimes a small amount of micro-optimisation can make a huge difference.

But don't micro-optimise all your code... it will end up being far harder to maintain, and you'll quite possibly find you've either missed the real bottleneck, or that your micro-optimisations are harming performance instead of helping.
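
As a rough illustration of "measuring as you go", here is a minimal timing harness (a sketch of my own, not from the answer; the candidate snippets and iteration count are placeholders):

<?php
// Minimal sketch of a timing harness: run a candidate snippet in a tight
// loop and compare the elapsed time before and after a change.
function benchmark($fn, $iterations = 100000) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$data = range(1, 100);
printf("is_array: %.4fs\n", benchmark(function () use ($data) { return is_array($data); }));
printf("cast:     %.4fs\n", benchmark(function () use ($data) { return $data === (array) $data; }));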

Jon Skeet
Rocketmonkeys
@Jon - Can you get back to writing books about C# etc that I love to read and leave the low hanging fruit to us mere mortals????
Peter M
@Peter: If it's any consolation, I'm currently going over the proofs of chapter 14 of C# in Depth. SO is just an occasional distraction :)
Jon Skeet
Oh no, he is going for the PHP badge, too!
Max
@Jon - you call a SO rep of 205K an **occasional** distraction????? But I do look forward to reading your next book. The original C# in Depth reminded me of Scott Meyers' C++ books which I liked a lot.
Peter M
@Jon: If you're saying anything in your book about profiling, I hope you're giving some thought to these issues: http://stackoverflow.com/questions/1777556/alternatives-to-gprof/1779343#1779343
Mike Dunlavey
@Mike: Nope, I say very little indeed about profiling in the book.
Jon Skeet
A: 

Well, I'm going to assume that is_array($array) is the preferred way, and $array === (array) $array is the allegedly faster way (which brings up the question of why is_array isn't implemented using that comparison, but I digress).

I will hardly ever go back into my code to insert a micro-optimization(*), but I will often put one in as I write the code, provided:

  • it doesn't slow my typing down.
  • the intent of the code is still clear.

That particular optimization fails on both counts.

(*) OK, actually I do, but that has more to do with me having a touch of OCD than with good development practices.

James Curran
Even though I'm not a PHP dev, and I know it's kind of beside the point of the actual question, I'd appreciate someone (not necessarily James) commenting on why there's such a performance difference (assuming it's true) and on the question James brought up (why isn't `is_array()` implemented using the fast comparison?).
Michael Burr
@Michael: It will have to be someone besides me (I'm not a PHP dev either)
James Curran
@James: Understood; I tried to make the comment indicate that. Also I realize that this is really just an idle curiosity (as much as micro-optimizations may be evil, I'm still often curious about what's happening behind the scenes in various language constructs).
Michael Burr
As to the question you brought up, the answer is: the premise is false. It's not faster (in general).
Artefacto
A: 

Well, there are more things than speed to take into consideration. When you read that 'faster' alternative, do you instantly think "Oh, this is checking to see if the variable is an array", or do you think "...wtf"?

Because really, when considering this method: how often is it called? What is the exact speed benefit? Does this hold up when the array is larger or smaller? One cannot do optimizations without benchmarks.

Also, one shouldn't do optimizations if they reduce the readability of the code. In fact, reducing the number of queries by a few hundred thousand (and this is often easier than one would think), or optimizing them where applicable, would be much, much more beneficial to performance than this micro-optimization.
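
To make that concrete, here is a hedged sketch of cutting query volume with a cache; the memcached server, the key, and the query are hypothetical, and it assumes the Memcached extension is available:

<?php
// Hypothetical sketch: serve a repeated query result from memcached so the
// database is only hit on a cache miss. $db is a PDO connection created elsewhere.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

function get_user_count(Memcached $cache, PDO $db) {
    $count = $cache->get('user_count');
    if ($count === false) {
        // Cache miss: run the query once and store the result.
        $count = (int) $db->query('SELECT COUNT(*) FROM users')->fetchColumn();
        $cache->set('user_count', $count, 60); // keep for 60 seconds
    }
    return $count;
}

Saving even a fraction of 50 million queries per day this way dwarfs any gain from swapping is_array() for a cast comparison.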

Also, don't be intimidated by the guy's experience, as others have said, and think for yourself.

Cthulhu
+13  A: 

Well, for a trivially small array, $array === (array) $array is significantly faster than is_array($array). On the order of over 7 times faster. But each call is only on the order of 1.0 × 10^-6 seconds (0.000001 seconds). So unless you're calling it literally thousands of times, it's not going to be worth it. And if you are calling it thousands of times, I'd suggest you're doing something wrong...

The difference comes when you're dealing with a large array. Since $array === (array) $array requires a new variable to be copied, it'll likely be SIGNIFICANTLY slower for a large array. For example, on an array with 100 integer elements, is_array($array) is within a margin of error (< 2%) of is_array() with a small array (coming in at 0.0909 seconds for 10,000 iterations). But $array === (array) $array is extremely slow. For only 100 elements, it's already over twice as slow as is_array() (coming in at 0.203 seconds). For 1000 elements, is_array() stayed the same, yet the cast comparison increased to 2.0699 seconds...

The reason it's faster for small arrays is that is_array() has the overhead of a function call, whereas the cast operation is a simple language construct... And copying a small variable will typically be cheaper than the function-call overhead. But for larger variables, the difference grows...

It's a tradeoff. Extra memory instead of the overhead of a function call. But eventually, the extra memory will have more overhead than the function call...

I'd suggest going for readability though. I find is_array($array) to be far more readable than $array === (array) $array. So you get the best of both worlds.

The script I used for the benchmark:

<?php
$elements = 1000;
$iterations = 10000;

// Build the test array.
$array = array();
for ($i = 0; $i < $elements; $i++) $array[] = $i;

// Time is_array().
$s = microtime(true);
for ($i = 0; $i < $iterations; $i++) is_array($array);
$e = microtime(true);
echo "is_array completed in " . ($e - $s) . " Seconds\n";

// Time the cast comparison.
$s = microtime(true);
for ($i = 0; $i < $iterations; $i++) $array === (array) $array;
$e = microtime(true);
echo "Cast completed in " . ($e - $s) . " Seconds\n";

Edit: For the record, these results were with PHP 5.3.2 on Linux...

ircmaxell
+1. I guess that "best programmer" should definitely be presented with your answer and benchmark snippet.
FractalizeR
A: 

As the cliché goes, micro-optimization is generally worth the time only in the smallest, most performance-critical hotspots of your code, and only after you've proven that's where the bottleneck is. However, I'd like to flesh this out a little, to point out some exceptions and areas of misunderstanding.

  1. This doesn't mean that performance should not be considered at all upfront. I define micro-optimization as optimizations based on low-level details of the compiler/interpreter, the hardware, etc. By definition, a micro-optimization does not affect big-O complexity. Macro-optimizations should be considered upfront, especially when they have a major impact on high-level design. For example, it's pretty safe to say that if you have a large, frequently accessed data structure, an O(N) linear search isn't going to cut it. Even things that are only constant terms but have a large and obvious overhead might be worth considering upfront. Two big examples are excessive memory allocation/data copying, and computing the same thing twice when you could compute it once and store/reuse the result (see the sketch after this list).

  2. If you're doing something that's been done before in a slightly different context, there may be some bottlenecks that are so well-known that it's reasonable to consider them upfront. For example, I was recently working on an implementation of the FFT (fast Fourier Transform) algorithm for the D standard library. Since so many FFTs have been written in other languages before, it's very well-known that the biggest bottleneck is cache performance, so I went into the project immediately thinking about how to optimize this.
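
A small PHP sketch of the compute-once point from item 1 (the function and its workload are hypothetical, not from the answer):

<?php
// Hypothetical memoization sketch: do the expensive work once per input and
// reuse the stored result on every later call.
function expensive($n) {
    static $cache = array();
    if (!isset($cache[$n])) {
        $result = 0;
        for ($i = 0; $i < $n; $i++) { // placeholder for real work
            $result += sqrt($i);
        }
        $cache[$n] = $result;
    }
    return $cache[$n];
}

expensive(1000000); // does the work
expensive(1000000); // returns the stored result immediately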

dsimcha
A: 

In general you should not write any optimisation which makes your code uglier or harder to understand; in my book, this one definitely falls into that category.

It is much harder to go back and change old code than to write new code, because you have to do regression testing. So in general, no code already in production should be changed for frivolous reasons.

PHP is such an incredibly inefficient language that if you have performance problems, you should probably look to refactor hot spots so they execute less PHP code anyway.

So I'd say in general no, and in this case no, and in cases where you absolutely need it AND have measured that it makes a provable difference AND is the quickest win (low-hanging fruit), yes.

Certainly, scattering micro-optimisations like this throughout your existing, working, tested code is a terrible thing to do; it will definitely introduce regressions and almost certainly make no noticeable difference.

MarkR
A: 

Is micro-optimization worth the time?

No, unless it is.

In other words, a priori, the answer is "no", but once you know that a specific line of code consumes a healthy percentage of clock time, then and only then is it worth optimizing.

In other words, profile first, because otherwise you don't have that knowledge. This is the method I rely on, regardless of language or OS.
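
For PHP specifically, one way to get that knowledge is a profiler such as Xdebug. A sketch of the relevant php.ini settings (these are the Xdebug 2-era option names; verify them against your installed version):

; Enable Xdebug's profiler; each request then writes a cachegrind.out.* file
; that tools like KCachegrind can read.
zend_extension = xdebug.so
xdebug.profiler_enable = 1
xdebug.profiler_output_dir = /tmp/profiles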

Added: When many programmers discuss performance, from experts on down, they tend to talk about "where" the program spends its time. There is a sneaky ambiguity in that "where" that leads them away from the things that could save the most time, namely, function call sites. After all, the "call Main" at the top of an app is a "place" the program is almost never "at", but it is responsible for 100% of the time.

Now, you're not going to get rid of "call Main", but there are nearly always other calls that you can get rid of. While the program is opening or closing a file, or formatting some data into a line of text, or waiting for a socket connection, or "new"-ing a chunk of memory, or passing a notification throughout a large data structure, it is spending great amounts of time in calls to functions, but is that "where" it is? Anyway, those calls are quickly found with stack samples.

Mike Dunlavey