views:

87

answers:

5

I have a large array in PHP.

It contains strings that are split into a kind of categories using underscores:

category1_property
category1_category2_category3
category2_category3_category4_category5

I have a function named

array get_values($prefix)

that returns all values of the array that start with a given prefix, e.g.

get_values("category2_category3_");

This function foreach()es through the whole array every time, collecting all strings that start with the prefix, i.e. a simple

foreach ($my_array as $line)
 if (substr($line, 0, strlen($prefix)) == $prefix)) 
  array_push ($result, $line);

I feel bad doing that performance-wise, especially seeing that this operation is performed dozens of times per request.

Does anybody know a way to speed this up without having to resort to a whole different way of storing the data?

Using a database might be fast and clever but I would like to avoid that. The data comes from a file and I can't port it to a database.

Pre-sorting or splitting the construct into a multi-dimensional array or an object is not an option, because I sometimes need to query for parts of a category name (e.g. "category1_ca*")

Thanks in advance for any input.

+1  A: 

For time-efficient access, I think the simplest solution is sorting the array, and using a modified variant of the binary search algorithm to find the lower and higher array bounds that match your query. This works because strings with similar prefixes are always sorted sequentially.

Once you have this range, fetching the matching elements is a simple for loop.

Obviously this is no trivial task, so don't waste any time on this unless this really is a performance problem. Premature optimization, you know the drill...

intgr
+1  A: 

It is unclear to me what the get_values function should match - anyways, this may be the performance-friendly solution you are looking for?

function get_values($prefix) {
 $included_array_from_file = array ( "category1_property", "category1_category2_category3", "category2_category3_category4_category5");

 foreach($included_array_from_file as $val) {
  if(strpos($val,$prefix)===0) {
   $out[] = $val;
  }
 }
 return $out;
}

print_r( get_values("category2_category3_") );

Output:
Array ( [0] => category2_category3_category4_category5 )

UPDATE:

You need to count how many times "category2_category3_" occur in the string, right? In that case, I suggest that you create a multi-dimensional array for the full string, and count each occurrence as shown in this example: (Please notice that the example only illustrates how it can be done - the example currently fails as I am not sure how to build the multi-dimensional array on the fly you may need to call another "create array"-function when adding items to the array.)

Fails ("Cannot use a scalar value as an array") - not sure how to .

$data = array("category1_property", "category1_category2_category3", "category2_category3_category4_category5");
$counter = array();
foreach($data as $val) {
 foreach(explode(":",$val) as $val2) {
  // Now, create a multi-dimensional array with the category items as keys and increment the value by one for each item in the string, as in this example:
  // "category2_category3_category4_category5" ... turns into:
  // $counter[category2] += 1;
  // $counter[category2][category3] += 1;
  // $counter[category2][category3][category4] += 1;
  // $counter[category2][category3][category4][category5] += 1;
 }
}

Intended usage:

echo $counter[category2][category3];
Kristoffer Bohmann
That's what I'm doing at the moment. I am concerned though that calling get_values() a hundred times (with a hundred loops) is performance heavy. I probably won't get around doing some sort of pre-sorting.
Pekka
Pekka, How is the new suggested solution working for you?
Kristoffer Bohmann
+1  A: 

I think you're looking for preg_grep

kemp
+1  A: 

You've really limited the options! Even so, I think pre-splitting the data may be the way to go. Consider:

prefixes 'cat1_cat2_cat3_dog'='fido', 'cat1_cat2_cat3_fish'='goldie', 'cat1_cat2_cat3_frog'='kermit becomes

$arr[cat1][cat2][cat3][dog]=fido
$arr[cat1][cat2][cat3][fish]=goldie
$arr[cat1][cat2][cat3][frog]=kermit

If you want everything with the prefix cat1_cat2:

$arr['cat1']['cat2']=array('cat3'=>array('dog'=>'fido','fish'=>'goldie'));

If you want everything with the prefix cat1_cat2_cat3_f* you need only search the last term in $arr['cat1']['cat2']['cat3']:

$matches=preg_grep("/^f/",array_keys($arr['cat1']['cat2']['cat3']));
foreach($matches as $k){
   $results[]=$arr['cat1']['cat2]['cat3'][$k];
}
dnagirl
I probably won't get around pre-splitting them. Thanks for the input.
Pekka
A: 

Or you could use an anonymous function with array_filter():

function get_values($arr, $str)
{
    $func = create_function('$item', 'return (strpos($item, "' . $str . '") === 0);');
    return array_filter($arr, $func);
}

$prefix = 'category1';
$result = get_values($my_array, $prefix);
GZipp
When called multiple times, this has the same downside as the function I am using at the moment: It invariably has to loop through the whole array.
Pekka
Of course it loops through the whole array. How else will you check each item in the array? (Since you've ruled out any other option, such as "a whole different way of storing" (e.g. caching results, using a database, etc.)) This method is fast, but I do grant you it's not magically instantaneous.
GZipp