list(,$nfields) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;

The question is: why "N*", if substr returns exactly 4 bytes that will be unpacked as a single N anyway? And why the double assignment?

UPD: This code is part of the Sphinx native PHP connector. After some code hacking it became clear that this code extracts a 4-byte integer. But the logic behind the double assignment and substr / N* is still unclear to me. I'm offering a bounty to finally understand it.
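A minimal sketch of what the one-liner does, assuming `$response` holds a big-endian 32-bit count at offset `$p` (the sample bytes below are made up for illustration). The detail that explains the odd `list()` is that `unpack()` numbers unnamed results starting from 1, not 0, so `list(, $nfields)` leaves slot 0 empty and assigns element 1 to `$nfields`. With exactly 4 bytes of input, `"N*"` and `"N"` return the same one-element array.

```php
<?php
// Hypothetical response buffer: two unrelated bytes, then the integer 3
// as a big-endian 32-bit value, then some trailing data.
$response = "\xAA\xBB" . pack('N', 3) . 'rest';
$p = 2; // current read offset into $response

// unpack() keys unnamed results from 1: the single element is at key 1.
var_dump(unpack('N*', substr($response, $p, 4)));

// list(, $nfields) skips the (nonexistent) slot 0 and takes element 1.
list(, $nfields) = unpack('N*', substr($response, $p, 4));
$p += 4; // advance past the 4 bytes just consumed

// With exactly 4 input bytes, "N" and "N*" produce identical arrays.
$single = unpack('N', substr($response, 2, 4));
$repeat = unpack('N*', substr($response, 2, 4));
```

An equivalent and arguably clearer spelling is `$nfields = unpack('N', substr($response, $p, 4))[1];`, which relies on dereferencing a function's return value (available since PHP 5.4).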

A: 

The code is probably a bug. This kind of loop is precisely the reason the `*` repeater exists...
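To illustrate the point, here is a sketch comparing a hand-written chunking loop with a single `"N*"` call. It assumes the buffer consists of nothing but consecutive 32-bit integers; when integers are mixed with other data, the repeater cannot simply run to the end of the string.

```php
<?php
// Sample data: three big-endian 32-bit integers (values chosen arbitrarily).
$packed = pack('N*', 100, 200, 300);

// Loop version: cut 4 bytes at a time and unpack each chunk separately.
$viaLoop = [];
for ($p = 0; $p < strlen($packed); $p += 4) {
    list(, $value) = unpack('N*', substr($packed, $p, 4));
    $viaLoop[] = $value;
}

// Repeater version: one call consumes every complete 4-byte group.
// Its keys start at 1, so reindex from 0 before comparing with the loop.
$viaRepeater = array_values(unpack('N*', $packed));
```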

Victor Nicollet
This code works :) And I need to understand _how_. Does `substr` cut off exactly 4 bytes? What about multibyte strings?
Kuroki Kaze
`substr` is not multibyte-aware, `mb_substr` is.
Victor Nicollet
@Victor: That's why `substr` cuts 4 bytes, not 4 multibyte characters.
Alix Axel
@Alix Axel: that was precisely my point :)
Victor Nicollet
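A small sketch of the point made in these comments: `substr()` counts bytes, so it hands `unpack()` exactly 4 bytes even when those bytes happen to form multibyte UTF-8 text. The sample string is an arbitrary choice whose bytes are valid UTF-8.

```php
<?php
// "ééé" in UTF-8: each "é" is the two bytes 0xC3 0xA9, so 3 characters = 6 bytes.
$text = str_repeat("\xC3\xA9", 3);

// substr() works on bytes: the first 4 bytes are two "é" characters.
$chunk = substr($text, 0, 4);

// unpack() just sees the raw bytes C3 A9 C3 A9 as one big-endian integer.
list(, $value) = unpack('N', $chunk);
printf("%d bytes, value 0x%08X\n", strlen($chunk), $value);
```

By contrast, `mb_substr($text, 0, 4, 'UTF-8')` counts characters and would return all six bytes here, because the string only has three characters.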
A: 

We'd need to see the revision history of the file, but some possibilities are:

  1. These are the remains of a previous algorithm that was progressively stripped of functionality but never cleaned up.
  2. It's the typical spaghetti code we all produce after a bad night.
  3. It's an optimization that speeds up the code for large input strings.

These are all equivalent ways of extracting the same three integers:

<?php

$packed = pack('N*', 100, 200, 300);

// 1: unpack every integer in a single call
var_dump( unpack('N*', $packed) );

// 2: cut one 4-byte chunk at a time, still using the "N*" repeater
var_dump( unpack('N*', substr($packed, 0, 4)) );
var_dump( unpack('N*', substr($packed, 4, 4)) );
var_dump( unpack('N*', substr($packed, 8, 4)) );

// 3: cut one 4-byte chunk at a time and unpack exactly one "N"
var_dump( unpack('N', substr($packed, 0, 4)) );
var_dump( unpack('N', substr($packed, 4, 4)) );
var_dump( unpack('N', substr($packed, 8, 4)) );

?>

I ran the typical repeat-a-thousand-times benchmark with three integers, and 1 was way faster. However, a similar test with 10,000 integers shows that 1 is the slowest :-!

0.82868695259094 seconds
0.0046610832214355 seconds
0.0029149055480957 seconds
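The benchmark code itself isn't shown. A minimal harness along these lines could compare variants 1 and 3; the helper name `timeIt`, the buffer contents and the repeat count are arbitrary assumptions, the type declarations need PHP 7 or later, and no claim is made here about which variant wins on a given machine.

```php
<?php
// Run $fn $repeats times and return the elapsed wall-clock time in seconds.
function timeIt(callable $fn, int $repeats): float
{
    $start = microtime(true);
    for ($i = 0; $i < $repeats; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Test buffer: 10,000 big-endian 32-bit integers (1, 2, ..., 10000).
$packed = pack('N*', ...range(1, 10000));

// Variant 1: a single unpack() over the whole buffer.
$whole = function () use ($packed) {
    return unpack('N*', $packed);
};

// Variant 3: one substr() plus one unpack('N') per 4-byte chunk.
$chunked = function () use ($packed) {
    $out = [];
    $length = strlen($packed);
    for ($p = 0; $p < $length; $p += 4) {
        list(, $value) = unpack('N', substr($packed, $p, 4));
        $out[] = $value;
    }
    return $out;
};

$repeats = 10; // arbitrary; raise it for steadier numbers
printf("whole:   %.6f s\n", timeIt($whole, $repeats));
printf("chunked: %.6f s\n", timeIt($chunked, $repeats));
```

Both variants return the same integers; they differ only in keys (1-based from `"N*"`, 0-based from the loop).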

Since Sphinx is a full-text engine where performance is a must, I'd dare say it's an optimization.

Álvaro G. Vicario
Thanks, I think this is the answer.
Kuroki Kaze
A: 

unpack ( "N*", substr ( $response, $p, 4 ) );

The first argument, "N*", specifies the format to use when unpacking the data returned by substr():

N - unsigned long, always 32 bit, big endian byte order
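A short sketch of how the format letter and the `*` repeater behave, using arbitrary sample bytes: `N` reads 4 bytes as a big-endian unsigned 32-bit integer, `V` (its little-endian counterpart) reads the same bytes in the opposite order, and `*` repeats the preceding code until the input runs out.

```php
<?php
// Four arbitrary bytes: 0x00 0x00 0x01 0x00.
$bytes = "\x00\x00\x01\x00";

// N: big endian, so the value is 0x00000100 = 256.
$big = unpack('N', $bytes);

// V: little endian, so the same bytes read as 0x00010000 = 65536.
$little = unpack('V', $bytes);

// "*" repeats N over every complete 4-byte group: 8 bytes give two values.
$two = unpack('N*', $bytes . $bytes);
```

Note that PHP integers are signed: on a 32-bit build, an `N` value of 2^31 or more comes back negative, while a 64-bit build returns it as a non-negative integer.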

Cortopasta