Seeking suggestions from PHP architects!

I'm not terribly familiar with PHP but have taken over maintenance of a large analytics package written in the language. The architecture reads reported data into large key/value arrays, which are passed through various parsing modules. Each module extracts the report parameters it knows about and removes them from the master array. Any leftovers that no module recognized are dumped into a kind of catch-all report showing the "unknown" data points.

There are a few different methods being used to call these parser modules, and I would like to know which, if any, is considered "proper" PHP style. Some use pass-by-reference, others pass-by-value; some are functions, some are objects. All of them modify the input parameter in some way.

A super-simplified example follows:

#!/usr/bin/php
<?php

$values = Array("a"=>1, "b"=>2, "c"=>3, "d"=>4 );


class ParserA {
    private $a = null;
    public function __construct(&$myvalues) {
        $this->a = $myvalues["a"];
        unset($myvalues["a"]);
    }
    public function toString() { return $this->a; }
}

// pass-by-value
function parse_b($myvalues) {
    $b = $myvalues["b"];
    unset($myvalues["b"]);
    return Array($b, $myvalues);
}

// pass-by-reference
function parse_c(&$myvalues) {
    echo "c=".$myvalues["c"]."\n";
    unset($myvalues["c"]);
}

// Show beginning state
print_r($values);

// will echo "1" and remove "a" from $values
$a = new ParserA($values);
echo "a=".$a->toString()."\n";
print_r($values);

// will echo "2" and remove "b" from $values
list($b, $values) = parse_b($values);
echo "b=".$b."\n";
print_r($values);

// will echo "3" and remove "c" from $values
parse_c($values);
print_r($values);

?>

The output will be:

Array
(
    [a] => 1
    [b] => 2
    [c] => 3
    [d] => 4
)
a=1
Array
(
    [b] => 2
    [c] => 3
    [d] => 4
)
b=2
Array
(
    [c] => 3
    [d] => 4
)
c=3
Array
(
    [d] => 4
)

I'm really uncomfortable having so many different calling methods in use: some have hidden side effects on the caller's arguments through "&pointer"-style parameters, some require the main body to write their output, and some write their output independently.

I would prefer to choose a single methodology and stick with it. To do so, I would also like to know which is most efficient. My reading of the PHP documentation indicates that, since PHP uses copy-on-write, there shouldn't be much performance difference between passing by reference and passing the array by value and reading the result back from a return value. I would also prefer to use the object-oriented structure, but am uncomfortable with the hidden changes the constructor makes to its input parameter.
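
The by-value behaviour described above can be checked with a minimal sketch. The helper name without_key() is hypothetical and not part of the package; the sketch only shows the semantics (the caller's array is left alone) and makes no claim about timing or memory use.

```php
<?php
// Hypothetical helper (not from the original package): removes a key from
// its own copy of the array and returns that copy.
function without_key(array $input, $key) {
    unset($input[$key]);   // this write affects only the function's copy
    return $input;
}

$values  = array("a" => 1, "b" => 2);
$reduced = without_key($values, "a");

echo count($values), "\n";    // 2: the caller's array still has "a"
echo count($reduced), "\n";   // 1: the returned copy does not
```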

Of the three calling methods, ParserA(), parse_b(), and parse_c(), which if any is the most appropriate style?

+2  A: 

I'm not really an expert in PHP, but in my experience passing by value is better. That way the code has no side effects, which means it is easier to understand and maintain, and you can do all sorts of crazy things with it, like using it as a callback for a map function. So I'm all for the parse_b way of doing things.

vava
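
As a rough illustration of the callback point above, here is a sketch in the style of parse_b. The function name extract_b() and the sample reports are made up for the example; array_map is the standard PHP function.

```php
<?php
// Hypothetical by-value parser in the style of parse_b: returns the
// extracted value together with the leftover entries.
function extract_b(array $report) {
    $b = isset($report["b"]) ? $report["b"] : null;
    unset($report["b"]);
    return array($b, $report);
}

// Made-up sample data: two reports that both carry a "b" entry.
$reports = array(
    array("b" => 2, "d" => 4),
    array("b" => 5, "e" => 6),
);

// Because extract_b() does not touch its caller's data, it can be mapped
// over any number of reports; $reports itself stays unchanged.
$results = array_map('extract_b', $reports);

print_r($results);
```

A by-reference parser such as parse_c could not be used this way, since array_map hands the callback a value rather than the caller's variable.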
A: 

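FYI: In PHP, objects are always passed by reference, no matter what (more precisely, since PHP 5 it is the object handle that gets copied, so a function that modifies the object changes the caller's object too). Also, if you have an array with objects and scalar values in it, the scalar values are passed by value, but the objects by reference.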

Jacob Kiers
I'm not sure it is true that they are always passed by reference. When using "unset" on a parameter value's content, as in 'unset($myvalues["a"])', the "a" value is not removed from the $values array in the parent scope, only from the $myvalues array in the function scope. That means it is passed by value, not by reference. Maybe my test was wrong?
ryandenki
It is true; the reference is only broken (a copy of the original variable is created) when its value changes. So unset($myvalues["a"]) does two things: it creates a copy of '$myvalues', then removes the element with key 'a' from that copy.
Jacco
`$myvalues` isn't an object in that example, it's an array (which gets passed by "copy on write" value)
gnarf
@ryandenki: I was specifically talking about objects, not scalar values. Just like gnarf correctly pointed out.
Jacob Kiers
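
The difference discussed in these comments can be seen in a small sketch. The two function names are hypothetical; ArrayObject is PHP's built-in SPL class and is used here only as a convenient object that supports array syntax.

```php
<?php
// Hypothetical helpers: each removes the "a" entry from its argument.
function drop_a_from_array(array $data) {
    unset($data["a"]);   // changes only the function's copy of the array
}

function drop_a_from_object(ArrayObject $data) {
    unset($data["a"]);   // changes the same object the caller holds
}

$array  = array("a" => 1, "b" => 2);
$object = new ArrayObject(array("a" => 1, "b" => 2));

drop_a_from_array($array);
drop_a_from_object($object);

var_dump(isset($array["a"]));    // bool(true): the array was not modified
var_dump(isset($object["a"]));   // bool(false): the object was modified
```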
A: 

As a general rule in PHP, do not use references unless you really have to. References in PHP are also not what most people expect them to be:

"References in PHP are a means to access the same variable content by different names. They are not like C pointers; instead, they are symbol table aliases."

see also: php.net: What References Are
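
A minimal sketch of what "symbol table aliases" means in practice; the variable names are arbitrary.

```php
<?php
$a = 1;
$b = &$a;        // $b is now a second name for the same variable content

$b = 2;          // writing through either name changes the shared content
echo $a, "\n";   // 2

unset($b);       // removes only the name $b; the content stays under $a
echo $a, "\n";   // 2
```

Unlike freeing a C pointer's target, unset() on one alias leaves the value reachable through the other name.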

So in short:
The proper way of handling this in PHP is to create an object that passes the variables around by value, or to manipulate the array with array_map (array_map allows you to apply a callback function to the elements of an array).

Jacco
A: 

I would vote against the methods proposed in general, but of them, I think parse_b has the best idea.

I think it would be better design to wrap the "data" array in a class that could let you "pop" a key out of it easily. So the parser ends up looking like:

class ParserA {
  private $a = null;
  public function __construct(My_Data_Class $data) {
    $this->a = $data->popValue("a");
  }
  public function toString() { return $this->a; }
}

And a sample implementation:

class My_Data_Class {
  protected $_data;
  public function __construct(array $data) {
    $this->_data = $data;
  }
  public function popValue($key) {
    if (isset($this->_data[$key])) {
       $value = $this->_data[$key];
       unset($this->_data[$key]);
       return $value;
    }
  }
}
gnarf
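
To show how the wrapper approach above might be used end to end, here is a self-contained sketch. The two classes are repeated from the answer; the remaining() method and the explicit `return null` are assumptions added for illustration and are not part of the original suggestion.

```php
<?php
class My_Data_Class {
    protected $_data;

    public function __construct(array $data) {
        $this->_data = $data;
    }

    // Returns the value stored under $key and removes it, or null if absent.
    public function popValue($key) {
        if (isset($this->_data[$key])) {
            $value = $this->_data[$key];
            unset($this->_data[$key]);
            return $value;
        }
        return null;
    }

    // Hypothetical addition: exposes whatever no parser has claimed yet,
    // for example to feed the "unknown" report.
    public function remaining() {
        return $this->_data;
    }
}

class ParserA {
    private $a = null;

    public function __construct(My_Data_Class $data) {
        $this->a = $data->popValue("a");
    }

    public function toString() { return $this->a; }
}

$data = new My_Data_Class(array("a" => 1, "b" => 2, "c" => 3, "d" => 4));
$a = new ParserA($data);

echo "a=" . $a->toString() . "\n";   // a=1
print_r($data->remaining());         // b, c and d are still available
```

One thing to watch: isset() returns false for keys whose value is null, so popValue() as written would leave such entries in place. If null is a legitimate reported value, array_key_exists() may be the better check.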