I'm writing a parser in PHP that must be able to handle large in-memory strings, so this is a somewhat important issue. (I.e., please don't flame me about "premature optimization".)

How does the substr function work? Does it make a second copy of the string data in memory, or does it reference the original? Should I worry about calling, for example, $str = substr($str, 1); in a loop?

A: 

Yes, you should be careful doing any string manipulation inside a loop, as a new copy of the string is generated on each iteration.
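To put a rough number on that cost, assume each $str = substr($str, 1) copies the whole remaining string (as the memory measurements later in this thread indicate for a single call) and that the loop runs until the string is empty. For an n-byte string the total number of bytes copied is then:

```latex
\sum_{k=1}^{n} (n - k) = \frac{n(n-1)}{2}
\qquad\text{e.g. } n = 2^{20} = 1\,048\,576 \;\Rightarrow\; \frac{n(n-1)}{2} = 549\,755\,289\,600 \text{ bytes} \approx 512\ \text{GiB}
```

So the memory in use at any moment stays bounded (roughly two copies of the string), but the total amount of copying grows quadratically with the string length.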

Andrew Hare
I'm not a PHP guy, so feel free to simply say "nope". In Java, substring() only creates a new reference to the same immutable char array: while it creates a new String object, it doesn't store another copy of the underlying char array, it merely records different offsets. (That was true of Java at the time; since Java 7u6, substring() copies the characters.) Does PHP actually create a new copy of the char array, or does it only reference the same one?
glowcoder
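A rough way to check this in PHP itself is to compare memory_get_usage() around a plain assignment and around a substr() call. This is a sketch, assuming a 1 MiB test string; exact byte counts vary by PHP version and build, so only the orders of magnitude matter. PHP strings are copy-on-write, so a plain assignment shares the existing buffer, while substr() of a different range allocates a new one.

```php
<?php
// Rough check: does PHP share string data or copy it?
// Exact byte counts vary by PHP version and build.
$string = str_repeat('x', 1048576); // 1 MiB of data

$m0 = memory_get_usage();
$alias = $string;           // plain assignment: shares the buffer (copy-on-write)
$m1 = memory_get_usage();
$tail = substr($string, 1); // substr(): allocates a new ~1 MiB buffer
$m2 = memory_get_usage();

printf("plain assignment:   +%d bytes\n", $m1 - $m0);
printf("substr(\$string, 1): +%d bytes\n", $m2 - $m1);
```

Expect the first line to show a near-zero increase and the second roughly one more megabyte.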
A:

If you really care about efficiency, you will need to keep a pointer (I mean an index) alongside your string. Many string functions accept an offset to start operating from (like strpos()'s third parameter). Normally I would recommend writing an object to wrap this functionality, but if you expect to use it a lot, the extra method calls might become a performance bottleneck. Here is an example of what I mean (without OO):

$startIndex = 0;
while (($pos = strpos($string, $myToken, $startIndex)) !== false) {
    # do something using $pos
    # continue searching right after the token just found
    $startIndex = $pos + strlen($myToken);
}
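The same offset technique extends to other functions: strspn() and strcspn() also take an offset parameter. Below is a minimal sketch of a whitespace tokenizer built that way; the function name tokenize() and the whitespace set are hypothetical choices for illustration. Only each (small) token is copied out with substr(); the remainder of the input is never rebuilt.

```php
<?php
// Hypothetical helper: split $input into whitespace-separated tokens
// by advancing an integer offset instead of shortening the string.
function tokenize(string $input): array
{
    $ws = " \t\r\n";
    $tokens = [];
    $pos = 0;
    $len = strlen($input);
    while ($pos < $len) {
        // Skip any run of whitespace starting at $pos.
        $pos += strspn($input, $ws, $pos);
        if ($pos >= $len) {
            break;
        }
        // Measure the token that starts at $pos.
        $tokLen = strcspn($input, $ws, $pos);
        // Copy only the token itself, never the rest of the input.
        $tokens[] = substr($input, $pos, $tokLen);
        $pos += $tokLen;
    }
    return $tokens;
}

print_r(tokenize("  let x =\t42\n"));
```

For the sample input this yields the tokens let, x, = and 42.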

If you want, you can write your own wrapper class that does these string operations and see if it has a speed impact:

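class _String {
    private $string;     # the full string, stored once and never copied
    private $startIndex; # start of the current "view"
    private $endIndex;   # end (exclusive) of the current "view"
    private $length;     # length of the full string
    public function __construct($string) {
        $this->string = $string;
        $this->startIndex = 0;
        $this->length = strlen($string);
        $this->endIndex = $this->length;
    }
    # Move the view instead of copying data; offsets are relative to the full string
    public function substr($from, $length = NULL) {
        $this->startIndex = $from;
        $this->endIndex = ($length !== NULL) ? $from + $length : $this->length;
        return $this;
    }
    # other functions you might use
    # ...
}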
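Going a step further than a bare wrapper, here is a minimal sketch of a cursor-style class a parser could use. The class and method names (StringCursor, advance(), peek(), find(), remaining()) are hypothetical. advance() plays the role of $str = substr($str, $n) but only moves an integer offset, and find() relies on strpos()'s offset parameter.

```php
<?php
// Hypothetical cursor class: one shared string plus an integer offset.
// Moving forward changes only the offset; the big string is never rebuilt.
final class StringCursor
{
    private $string;  // the full input, stored once
    private $pos = 0; // current read position (byte offset)

    public function __construct(string $string)
    {
        $this->string = $string;
    }

    // Same effect as $str = substr($str, $n), but no bytes are copied.
    public function advance(int $n): void
    {
        $this->pos = min($this->pos + $n, strlen($this->string));
    }

    // Return up to $len bytes at the cursor (copies only those bytes).
    public function peek(int $len): string
    {
        return (string) substr($this->string, $this->pos, $len);
    }

    // Offset of $needle relative to the cursor, or null if not found.
    public function find(string $needle): ?int
    {
        $found = strpos($this->string, $needle, $this->pos);
        return $found === false ? null : $found - $this->pos;
    }

    // Number of bytes left after the cursor.
    public function remaining(): int
    {
        return strlen($this->string) - $this->pos;
    }
}

$c = new StringCursor("key=value;rest");
$key = $c->peek($c->find("="));  // "key"
$c->advance(strlen($key) + 1);   // skip past "key="
printf("%s %s %d\n", $key, $c->peek(5), $c->remaining()); // prints "key value 10"
```

Whether the method-call overhead matters compared with passing a bare index around is something to measure on your own input, as suggested above.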
soulmerge
A:

To expand on Chad's comment: your code would require both strings (the full one and the full-one-minus-the-first-character) to be in memory at the same time, though, contrary to what Chad stated, not because of the assignment. See:

$string = str_repeat('x', 1048576);
printf("MEM:  %d\nPEAK: %d\n", memory_get_usage(), memory_get_peak_usage());

substr($string, 1);
printf("MEM:  %d\nPEAK: %d  :-(\n", memory_get_usage(), memory_get_peak_usage());

$string = substr($string, 1);
printf("MEM:  %d\nPEAK: %d  :-(\n", memory_get_usage(), memory_get_peak_usage());

Outputs something like (memory values are in bytes):

MEM:  1093256
PEAK: 1093488
MEM:  1093280
PEAK: 2142116  :-(
MEM:  1093276
PEAK: 2142116  :-(
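Running the same pattern in a loop is a useful follow-up check. This sketch (a 1 MiB string and 100 iterations are arbitrary choices) suggests that peak memory does not keep climbing: each reassignment frees the previous copy, so at most about two copies are alive at once. The real cost of $str = substr($str, 1) in a loop is the repeated copying of the remaining bytes, i.e. CPU time.

```php
<?php
// Rough check: repeated $string = substr($string, 1) frees each old copy
// on reassignment, so peak memory grows by about one extra copy, not 100.
// Exact numbers vary by PHP version and build.
$string = str_repeat('x', 1048576); // 1 MiB
$base = memory_get_peak_usage();

for ($i = 0; $i < 100; $i++) {
    $string = substr($string, 1); // allocates a new buffer, then frees the old one
}

$growth = memory_get_peak_usage() - $base;
printf("length now: %d, peak growth: %d bytes\n", strlen($string), $growth);
```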
salathe