ansaurus

Question

Remove composed words

Answer 1

A:

You can take each word and check whether any other word in the array starts or ends with it. If so, that word should be removed with unset().
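As a rough sketch of that idea (the function name removeComponentWords is made up for illustration, and it assumes the words are non-empty strings and that the shorter word is the one to drop), it could look something like this:

```php
<?php
// Hypothetical helper, not from the original answer: drops every word that
// is a proper prefix or suffix of some longer word in the same list.
function removeComponentWords(array $words): array
{
    $result = [];
    foreach ($words as $candidate) {
        $len = strlen($candidate);
        $isComponent = false;
        foreach ($words as $other) {
            // Only strictly longer words can contain $candidate as a part.
            if (strlen($other) <= $len) {
                continue;
            }
            $startsWith = strncmp($other, $candidate, $len) === 0;
            $endsWith = substr($other, -$len) === $candidate;
            if ($startsWith || $endsWith) {
                $isComponent = true;
                break;
            }
        }
        if (!$isComponent) {
            $result[] = $candidate;
        }
    }
    return $result;
}

print_r(removeComponentWords(['palanca', 'plato', 'platopalanca']));
// Only 'platopalanca' remains.
```

This compares every word against every other word, so it is quadratic in the size of the list, which is probably fine for short lists.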

FractalizeR 2009-09-29 07:09:24

Answer 2

A:

A regex could work. Within the pattern you can anchor the match to the start and the end of the string.

^ matches the start of the string and $ matches the end,

so something like

foreach($array as $value)
{
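    //$term is the value that you want to remove; preg_quote() escapes any
    //regex metacharacters in it so that it is matched literally
    if(preg_match('/^' . preg_quote($term, '/') . '$/', $value))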
    {
        //Here you can be confident that $term is $value, and then either remove it from
        //$array, or you can add all not-matched values to a new result array
    }
}

would avoid your issue.

But if you are just checking that two values are equal, == will work just as well as preg_match, and is possibly faster.
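A minimal sketch of the plain-comparison variant, assuming $term is a single string to drop (the function name removeExact is hypothetical). It uses === rather than ==, because == compares numeric strings as numbers, so '1e3' == '1000' is true in PHP:

```php
<?php
// Hypothetical helper: keeps every value that is not exactly equal to $term.
function removeExact(array $array, string $term): array
{
    $result = [];
    foreach ($array as $value) {
        // Strict comparison avoids PHP's numeric-string juggling with ==.
        if ($value !== $term) {
            $result[] = $value;
        }
    }
    return $result;
}

print_r(removeExact(['palanca', 'plato', 'platopalanca'], 'plato'));
// 'palanca' and 'platopalanca' remain; 'platopalanca' is untouched because
// only whole-string matches are removed.
```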

If the lists of $terms and $values are huge, this won't be the most efficient strategy, but it is a simple solution.

If performance is an issue, sorting the lists (note the provided sort function) and then iterating down them side by side might be more useful. I'm going to actually test that idea before I post the code here.
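In the meantime, here is one untested-in-anger way the side-by-side walk could be sketched. The function name removeTermsSorted and the choice of SORT_STRING with strcmp() are my assumptions, not something taken from the question:

```php
<?php
// Hypothetical helper: removes from $values every entry that also appears
// in $terms, by sorting both lists and walking them in step.
function removeTermsSorted(array $values, array $terms): array
{
    // SORT_STRING keeps the ordering consistent with strcmp() below.
    sort($values, SORT_STRING);
    sort($terms, SORT_STRING);

    $result = [];
    $t = 0;
    $termCount = count($terms);
    foreach ($values as $value) {
        // Skip terms that sort before the current value.
        while ($t < $termCount && strcmp($terms[$t], $value) < 0) {
            $t++;
        }
        // Drop the value if the current term is exactly equal to it.
        if ($t < $termCount && $terms[$t] === $value) {
            continue;
        }
        $result[] = $value;
    }
    return $result;
}

print_r(removeTermsSorted(
    ['plato', 'palanca', 'platopalanca', 'mesa'],
    ['plato', 'palanca']
));
// Leaves 'mesa' and 'platopalanca', in sorted order.
```

After the two sorts, each list is only walked once, so the sorting dominates the cost. Note that the result comes back in sorted order rather than the original order.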

2009-09-29 07:21:11

Answer 3

A:

You could put the words into an array, sort the array alphabetically and then loop through it, checking whether the following words start with the word at the current index, which would make them composed words. If they do, you can remove the word at the current index and the remaining parts of those following words...

Something like this:

$array = array('palanca', 'plato', 'platopalanca');
// ok, the example array is already sorted alphabetically, but anyway...
sort($array);

// another array for words to be removed
$removearray = array();

// loop through the array, the last index won't have to be checked
for ($i = 0; $i < count($array) - 1; $i++) {

  $current = $array[$i];

  // use another loop in case there is more than one combined word
  // if the words are case sensitive, use strpos() instead to compare
  while ($i + 1 < count($array) && stripos($array[$i + 1], $current) === 0) {
    $next = $array[$i + 1];
    // the next word starts with the current one, so remove current
    $removearray[] = $current;
    // get the other word to remove
    $removearray[] = substr($next, strlen($current));
    $i++;
  }

}

// now just get rid of the words to be removed
// for example by taking the difference of the two arrays
$result = array_diff($array, $removearray);

kkyy 2009-09-29 07:51:28

Why the downvote?

kkyy 2009-09-29 08:11:51

Answer 4

+2 A:

I think you need to define the problem a little more, so that we can give a solid answer. Here are some pathological lists. Which items should get removed?

hot, dog, hotdogstand
hot, dog, stand, hotdogstand
hot, dogs, stand, hotdogstand

SOME CODE

This code should be more efficient than the one you have:

$words = array('hatstand','hat','stand','hot','dog','cat','hotdogstand','catbasket');

$count = count($words);

for ($i=0; $i<$count; $i++) {
 if (isset($words[$i])) {
  $len_i = strlen($words[$i]);
  for ($j=$i+1; $j<$count; $j++) {
   if (isset($words[$j])) {
    $len_j = strlen($words[$j]);

    if ($len_i<=$len_j) {
     if (substr($words[$j],0,$len_i)==$words[$i]) {
      unset($words[$i]);
      break; // $words[$i] is gone, so stop comparing against it
     }
    } else {
     if (substr($words[$i],0,$len_j)==$words[$j]) {
      unset($words[$j]);
     }
    }
   }
  }
 }
}

foreach ($words as $word) {
 echo "$word<br>";
}

You could optimise this by storing word lengths in an array before the loops.
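A sketch of what that optimisation might look like, wrapped in a function so it can be reused (the name removePrefixWords and the use of strncmp() in place of substr() are my own choices). Like the code above, it only looks at prefixes and assumes a plain 0-indexed list:

```php
<?php
// Hypothetical helper: removes every word that is a prefix of another word,
// with the word lengths computed once up front instead of inside the loops.
function removePrefixWords(array $words): array
{
    $lengths = array_map('strlen', $words);
    $count = count($words);

    for ($i = 0; $i < $count; $i++) {
        if (!isset($words[$i])) {
            continue;
        }
        for ($j = $i + 1; $j < $count; $j++) {
            if (!isset($words[$j])) {
                continue;
            }
            if ($lengths[$i] <= $lengths[$j]) {
                // $words[$i] is a prefix of $words[$j]: drop it and move on.
                if (strncmp($words[$j], $words[$i], $lengths[$i]) === 0) {
                    unset($words[$i]);
                    break;
                }
            } elseif (strncmp($words[$i], $words[$j], $lengths[$j]) === 0) {
                // $words[$j] is a prefix of $words[$i]: drop the shorter one.
                unset($words[$j]);
            }
        }
    }
    return array_values($words);
}

print_r(removePrefixWords(
    array('hatstand','hat','stand','hot','dog','cat','hotdogstand','catbasket')
));
// Leaves hatstand, stand, dog, hotdogstand, catbasket.
```

Because only prefixes are checked, 'stand' and 'dog' survive here; whether they should is exactly the kind of thing the question needs to pin down.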

Jonathan Swift 2009-09-29 11:15:08

I already took care of plural forms. I'm updating my question. You made me realize I was taking the wrong approach. +1.

The Disintegrator 2009-09-30 03:32:39
