tags:

views:

139

answers:

6

I was reading about how to detect the encoding of a file in PHP, and on some blog or somewhere, it was suggested to do this:

if (function_exists('mb_detect_encoding')) {
    function is_utf8($str) {
        // do stuff using the mb* libraries
    }
} else {
    function is_utf8($str) {
        // do stuff manually
    }
}

To me this feels very messy, and could be replaced with this:

function is_utf8($str) {
    if (...) {
        // mb stuff here
    } else {
        // manual stuff here
    }
}

However, I can also see that it also has some benefits. Depending how complicated the if statement is, and how often the function is called, this might be much more efficient. My question is this: At what point would you consider splitting a function into two like in the first example? Are there any other pros/cons that I've missed?

Edit: Please don't get hung up on the example here, the question is about this practice in general.

+4  A: 

My gut reaction is to go with a single declaration of the is_utf8 function. The PHP engine is mighty good at optimization and the overhead of multiple calls to function_exists() should be negligible.

pygorex1
See my benchmarks below. The function_exists() call is pretty negligible, but something about either that or the control structure makes the function calling function_exists() take a noticeable amount more time.
Ed Carrel
Internally PHP probably does a lookup like `function_exists()` to check before calling *any* function. With the second method PHP has to do two lookups - one for `blarg()` and another for `sin()` which may account for additional time.
pygorex1
A: 

I'll apologize in advance for what's most likely pretty ugly PHP, but I'd do something more like:

if (function_exists('mb_detect_encoding')) {
    function is_utf8_mb($str) {
        // do stuff using the mb* libraries
    }
}
function is_utf8_manual($str) {
    // do stuff manually
}

if (function_exists('is_utf8_mb')) {
    function is_utf8($str) {
        is_utf8_mb($str)
    }
} else {
    function is_utf8($str) {
        is_utf8_manual($str)
    }
}

That is: an is_utf8 variant will exist in the execution environment dependent only on if it can work in that environment. Each variant is in a code block of the minimal size needed to determine if that variant should be loaded, and to perform its function. The main reason is that I think the smaller the amount of space you need to write your function, the easier it is for a reader to build an understanding of the function's behavior. Your eyes have to travel less, you have to scroll less, there are generally fewer of those trivial frustrations involved in understanding a function for the first time. The second reason is that it provides a better way of testing your code - you can more easily automate a check that the is_utf8 variant routines are producing the same result when you can access all of them at the same time.

Aidan Cully
+2  A: 

Stylistically, I'm inclined to push for the second. If performance is an issue though, you'd do well to consider using the first.

    if (function_exists("sin")) {
        function baz($farfs) {
            print "I am the first baz";
        }
    } else {
        function baz($farfs) {
            print "I am the second baz";
        }
    }

    function blarg($fgar) {
        if (function_exists("sin")) {
            print "I am the first blarg";
        } else {
            print "I am the second blarg";
        }
    }

    for ($i=0;$i

I ran this on my workstation here, and found that the aggregate time of calls to baz took anywhere from 50-75% of the aggregate calls to blarg, depending on whether the function being tested actually exists.

Here are the actual numbers:

  • function_exists("foobar")
    • blarg: 36.64 ms
    • baz: 14.71 ms
  • function_exists("sin")
    • blarg: 35.24 ms
    • baz: 23.59 ms

The only difference between these two functions is the conditional and the two extra characters in the output. Interestingly enough, the 10001 calls to function_exists only take 0.18 and 0.11 ms respectively on the two tests. I'm wondering if there is some function call overhead not being accounted for in either profile.

As for style though, I really dislike the first. Defining a function by name in two different places seems like a shady thing to do, especially when relying on the oddness in PHP that makes function definitions not performed in global scope affect global scope anyway. On the other hand, my biases just might be showing, and the rest of the PHP community might not have a problem with relying on the PHP interpreter as a sort of preprocessor in this instance.

shrug Such are the matters of style.

Ed Carrel
+3  A: 

A good practice, maybe not in that case, but in general might involve the use of OOP (objects) with factory methods.

class my_utf8 
{
  // A static method that determine what object to create
  public static function factory() {
    if (function_exists('mb_detect_encoding')) {
      return new my_utf8_with_mb;
    } else {
      return new my_utf8_without_mb;
    }
  }
}

class my_utf8_with_mb extends my_utf8 {
    public function is_utf8($str) {
        // do stuff using the mb* libraries
    }
}

class my_utf8_without_mb extends my_utf8 {
    public function is_utf8($str) {
        // do stuff without the mb* libraries
    }
}

And in your application :

global $myUtfInstance;
// Call once
$myUtfInstance = my_utf8::factory();

// later and forever...
$myUtfInstance->is_utf8($str);

You could also involve a singleton pattern depending on what you do. It does not skip the IF but eliminate the need of a global variable.

class my_utf8 
{
  private static $instance;

  public static function factory() {
    if (function_exists('mb_detect_encoding')) {
      return new my_utf8_with_mb;
    } else {
      return new my_utf8_without_mb;
    }
  }


  public static function getInstance()
  {
    if (!self::$instance) {
      self::$instance = self::factory();
    }

    return self::$instance;
  } 
}

And in your code :

my_utf8::getInstance()->is_utf8($str);
Pmax
+1  A: 

By default, go always with the most readable solution. If that particular piece of code turns out to be a significant drag (measured, not assumed) on your application, then you optimize it.

Emilio M Bumachar
A: 

There are two aspects at play here: performance (don't pay for what you don't need), and readability.

As many wise posters say: don't bother about performance until it's proven to be a problem.

But concerning readability, in my opinion, a function should show as little layers of responsibility as possible. Functions that do exactly one thing are easiest understood.

Your question is actually about 'how should I mix two responsibilities':

  1. what: detect the presence of a library, and 'dispatch' to the correct code block
  2. how to use the library vs. how to do without.

That's why I would really create two 'layers': one layer dispatching to the appropriate function, and another layer containing 'code blocks', wrapped in a function with the proper name.

And then you still have the choice whether to do the dispatching explicitly, or to use PHP's possibility to declare functions on-the-fly.

// functionality: defines how to detect utf8 encoding 
//

function is_utf8_mb( $arg ) {
... // using the mb library
}

function is_utf8_bare( $arg ) {
... // hand coded
}

// dispatching layer: decide what library to use
//

// option 1: define on-the-fly
function define_utf8_function() {
   if( function_exists('mb_detect_encoding') ) {
     function is_utf8( $arg ) { return is_utf8_mb( $arg ); }
   } else {
     function is_utf8( $arg ) { return is_utf8_bare( $arg ); }
   }
}

// option 2: check the dispatching on each function call
function is_utf8_runtimedispatch( $arg ) {
  if( function_exists('mb_detect_encoding') ) 
    return is_utf8_mb( $arg );
  else 
    return is_utf8_bar( $arg );
}
xtofl