tags:

views:

571

answers:

8

What would be the most efficient way to select every nth item from a large array? Is there a 'smart' way to do it or is looping the only way?

Some points to consider:

  • The array is quite large with 130 000 items
  • I have to select every 205th item
  • The items are not numerically indexed, so for($i = 0; $i <= 130000; $i += 205) won't work

So far, this is the most efficient method I've come up with:

$result = array();
$i = 0;
foreach($source as $value) {

    if($i >= 205) {
        $i = 0;
    }

    if($i == 0) {
        $result[] = $value;
    }

    $i++;
}

Or the same with modulo:

$result = array();
$i = 0;
foreach($source as $value) {
    if($i % 205 == 0) {
        $result[] = $value;
    }
    $i++;
}

These methods can be quite slow, is there any way to improve? Or am I just splitting hairs here?

EDIT

Good answers all around with proper explanations, tried to pick the most fitting as the accepted answer. Thanks!

+3  A: 

i recommend to use array_slice

$count = count($array) ;
for($i=205;$i<$count;$i+=205){
    $result[] = array_slice($array,$i,1);
}
Haim Evgi
At least the syntax is a lot clearer, any idea would this actually improve performance?
Tatu Ulmanen
For loops are slow. Your count() is also ridiculously slow.
cballou
Taking the count out of the loop would indeed speed this up substantially.
middaparka
@cballou, but note that this method only has to do 130000 / 205 loops, whereas foreach would have to loop through all 130 000 items. I have to test it out, I'm worried about array_slice's performance, as that probably does looping from 0 to $i internally.
Tatu Ulmanen
@Tatu, yeah I'd recommend benchmarking to verify.
cballou
Can't you just do $result[] = $array[$i]? Any particular reason to use array_slice?
Kevin
This is really slow on larger arrays. If you change $result=array_slice(...) to $result[] = $array[$i] it's faster than the accepted solution.
Kevin
if u read the question : The items are not numerically indexed,we cant do what u suggest
Haim Evgi
Sorry, I overlooked that.
Kevin
A: 

You can't move the array pointer, it seems, more than once at a time. I would personally use this:

reset($source);
$next = true;
while($next === true){
    $result[] = current($source);
    for(i=0;i<205;i++){
        $next = next($source);
    }
}

If someone can find a function that can move the array pointer more than just one step at a time, you will have a better answer. I think this is good though.

Gausie
You're right, moving the pointer 205 positions at a time would be the solution, otherwise these all are just the same thing with different syntax, I suppose.
Tatu Ulmanen
See my answer. ArrayIterator::seek can move the pointer to position
Gordon
What if there are values that do not evaluate to *true*?
Gumbo
When $next is false, we've reached the end of the array
Gausie
@Gausie: … or if the next element can not be evaluated to *true*: “Returns the array value in the next place that's pointed to by the internal array pointer, or FALSE if there are no more elements.”
Gumbo
The triple equals stops that happening. See http://php.net/manual/en/function.next.php#function.next.returnvalues
Gausie
+4  A: 

Try ArrayIterator::seek()

Also, using one the new Spl datastructures might yield better results than using plain arrays.

Gordon
Will look into that, only reason I didn't accept this as the correct answer was that I would have to change my codebase significantly to support ArrayObjects efficiently. But noted and plussed.
Tatu Ulmanen
A: 

If this really is a bottleneck, you might want to consider rethinking your design in order to make it numerically indexed.

EDIT: Or create and maintain a seperate array with just the 205th items (which gets updated at insert or something like that).

Jonatan Hedborg
A: 

You could use array_keys to work only on the array keys.

$keys = array_keys($array);
for ($i=0, $n=min(count($keys), 130000); $i<$n; $i += 205) {
    $result[] = $array[$keys[$i]];
}
Gumbo
If the array traversal is a bottleneck, won't the retrieval of all keys be?
xtofl
@xtofl: Arrays in PHP are no real arrays like in other languages. They are rather implemented with a hash table. And the keys are stored separately, probably in a linked list.
Gumbo
+2  A: 

A foreach loop provides the fastest iteration over your large array based on comparison testing. I'd stick with something similar to what you have unless somebody wishes to solve the problem with loop unrolling.

This answer should run quicker.

$result = array();
$i = 0;
foreach($source as $value) {
    if ($i++ % 205 == 0) {
        $result[] = $value;
    }
}

I don't have time to test, but you might be able to use a variation of @haim's solution if you first numerically index the array. It's worth trying to see if you can receive any gains over my previous solution:

$result = array();
$source = array_values($source);
$count = count($source);
for($i = 0; $i < $count; $i += 205) {
    $result[] = $source[$i];
}

This would largely depend on how optimized the function array_values is. It could very well perform horribly.

cballou
Great explanation and actually an faster way to accomplish what I tried to do (just slighty, but faster anyways). Thanks!
Tatu Ulmanen
@Tatu, after a quick comparison test on array_push and [] on a test array of 200,000 items I found [] to run about twice as fast. I modified my answer and I'd suggest using the assignment the way you initially had it.
cballou
@Tatu, I quickly benchmarked the two methods above and found the second to be quicker when the starting array is on average larger than 50,000 results. I only had time to test starting with numerically indexed arrays generated by array_fill() and range(), however.
cballou
+2  A: 

I think the solution to this problem doesn't lie in any PHP syntax but in the design of your code.

You could either make the array numerically indexed (might not be plausible for your application), keep track of every 205th item, or only search the array once (cache a list of each 205th item).

In my mind, keeping track of each 205th item would be easier to implement. You'd just keep a count of all the items in a database or something, and every time an item is added, check the modulo of the count. If you have another 205th item, add it to the array. For when items are deleted though, this would be trickier. You might have to re-check the entire array to realign all your 205th items.

Doing this would be simpler if you could start at the deleted item and move forward, but again this would only work for numerically indexed arrays -- and if that were true, you wouldn't have to move forward at all, you would just do a little maths to re-figure it out.

  • Numerical indexes -- better long-term solution but harder to implement
  • Keeping track -- easier to implement, but you'd have to get dirty again when you delete items
  • Caching items -- you should probably do this for the other two solutions as well, but on its own, it would be fast until the array were modified, in which case you would probably have to re-do it.
Carson Myers
A: 
  • Create a two dimentional array [205][N]
  • Load data into array
  • Access 205th element for every N

May sound silly but by definition it is fastest since you access memory locations directly and do not perform any comparisons.

Square Rig Master