views:

206

answers:

2

Given an array with n values, for example:

$arr[] = 'ABCDEFABC';
$arr[] = 'ABCDEFDEF';
$arr[] = 'ABCDEFGHI';
$arr[] = 'ABCDEFJKL';

how can I find the initial segment that matches all (or most, in the example bellow) values, in this case ABCDEF?

EDIT 2: NOT SOLVED, SEE ANSWER.

Even worse, given the following array:

$arr[] = 'ABCDEFABC';
$arr[] = 'ABCDEFDEF';
$arr[] = 'ABCDEFGHI';
$arr[] = 'ABCDEFJKL';
$arr[] = 'DEFABCABC';
$arr[] = 'DEFABCDEF';
$arr[] = 'DEFABCGHI';
$arr[] = 'DEFABCJKL';

how can I get:

$result[] = 'ABCDEF';
$result[] = 'DEFABC';

This one is tricky... What I'm trying to accomplish is the behavior of strspn() (where the order of the "mask" does matter, thank you Zed) applied to arrays.

EDIT: To clarify things up a bit what I want is to find the all the common letters that exist in the same index in all the values of the array (not sure if this made it easier or not!). In this second problem, since all chars don't match the index in the other values I need to match the maximum number of identical initial segments (in this case 2: ABCDEF and DEFABC).

A: 

I've come up with a solution for my first problem:

EDIT: BUGGY!

$arr = array();

// Bug: ABCDEFX

$arr[] = 'ABCDEFAXC';
$arr[] = 'ABCDEFDXF';
$arr[] = 'ABCDEFGXI';
$arr[] = 'ABCDEFJXL';

/*
$arr[] = 'ABCDEFABC';
$arr[] = 'ABCDEFDEF';
$arr[] = 'ABCDEFGHI';
$arr[] = 'ABCDEFJKL';
*/

// ABCDEF    
$result = implode('', call_user_func_array('array_intersect_assoc', array_map('str_split', $arr)));

One left to go now...

Alix Axel
+2  A: 

If I understand correctly, you're trying to determine the set of longest common prefixes given a set of strings.

Breaking it down, the shared prefix between any two strings may be found as

function longestCommonPrefix($str1, $str2) {
  $minLen = min(strlen($str1), strlen($str2));
  for ($i = 0; $i < $minLen; $i++) {
      if ($str1[$i] != $str2[$i]) {
          return substr($str1, 0, $i);
      }
  }
  return substr($str1, 0, $minLen);
}

One way to get the set of prefixes could then be:

function longestCommonPrefixes($arr) {
  sort($arr);
  $prefixes = array();
  for ($i = 0; $i < count($arr); $i++) {
      for ($j = $i+1; $j < count($arr); $j++) {
          $prefix = longestCommonPrefix($arr[$i], $arr[$j]);
          if ($prefix == "") break;
          $prefixes[$prefix] = true;
      }
  }
  return array_keys($prefixes);
}

Notice that the returned prefixes may be prefixes of each other. That is, it is possible for the result to contain a set of strings such as array('A', 'AA', 'AAA').

Putting it all together:

$arr = array();
$arr[] = 'ABCDEFABC';
$arr[] = 'ABCDEFDEF';
$arr[] = 'ABCDEFGHI';
$arr[] = 'ABCDEFJKL';
$arr[] = 'DEFABCABC';
$arr[] = 'DEFABCDEF';
$arr[] = 'DEFABCGHI';
$arr[] = 'DEFABCJKL';

print_r(longestCommonPrefixes($arr));

yields

Array
(
    [0] => ABCDEF
    [1] => DEFABC
)
Anders Johannsen