Hey,

I'm looking for help writing a script that compares two lists of phrases/words against one another and works out which entry in each pair is the properly typed phrase/word.

$arr1 = array('fbook', 'yahoo msngr', 'text me later', 'how r u');  
$arr2 = array('facebook', 'yahoo messenger', 'txt me l8r', 'how are you');

So, the script should walk through both arrays index by index, compare the two values at each index, and keep the correctly spelled one. In the end, it should produce:

facebook
yahoo messenger
text me later
how are you

Any help is appreciated!

A: 

Can you state the question clearly?

x1a0
This should be a comment on the question, not an answer.
Dominic Rodger
@Dominic: +1, but maybe a rep of 26 isn't enough to comment? I can't remember for sure.
David Thomas
@ricebowl - fair points - it's 50 (http://meta.stackoverflow.com/questions/7237/how-does-reputation-work-on-stackoverflow/7238#7238)
Dominic Rodger
A: 

You need to define some rules for processing these words. Judging by your example, you need a regex and you want to keep whichever phrase is longer, but there may be cases where picking the longer one does not work.
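
To make the "longer one wins" rule concrete, here is a minimal sketch. The helper name `keep_longer` is made up for illustration, and the rule is only a heuristic: it happens to give the expected output for the sample arrays in the question, but it picks the wrong string for a pair like 'helllo' / 'hello'.

```php
<?php
// Hypothetical helper: keep the longer of two strings (a tie keeps the second).
function keep_longer($a, $b) {
    return strlen($a) > strlen($b) ? $a : $b;
}

$arr1 = array('fbook', 'yahoo msngr', 'text me later', 'how r u');
$arr2 = array('facebook', 'yahoo messenger', 'txt me l8r', 'how are you');

// array_map with two arrays passes the elements pairwise to the callback.
$kept = array_map('keep_longer', $arr1, $arr2);

// Counter-example: the misspelling is longer, so the heuristic keeps it.
$wrong = keep_longer('helllo', 'hello');
```

This is why some kind of dictionary check is usually needed on top of (or instead of) a length rule.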

+1  A: 

If your input is fairly simple and you have pspell installed, and the arrays are the same size:

For each index in the two arrays you could explode the string on spaces, pspell_check each word, and the phrase with the highest percentage of words for which pspell_check returned true would be the phrase to keep.

Sample code to get you started:

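// Requires the pspell extension and an installed English dictionary.
$dictionary = pspell_new("en");

// Returns the percentage (0-100) of words in $phrase that pspell accepts.
function percentage_of_good_words($dictionary, $phrase) {
  // Split on whitespace and drop empty strings, so "" yields zero words.
  $words = preg_split('/\s+/', $phrase, -1, PREG_SPLIT_NO_EMPTY);
  $num_good = 0;
  $num_total = count($words);

  if ($num_total == 0) return 0;

  foreach ($words as $word) {
    if (pspell_check($dictionary, $word)) {
      $num_good++;
    }
  }

  return ($num_good / $num_total) * 100;
}

$length = count($arr1);
$kept = array();
for ($i = 0; $i < $length; $i++) {
   $percent_from_arr1 = percentage_of_good_words($dictionary, $arr1[$i]);
   $percent_from_arr2 = percentage_of_good_words($dictionary, $arr2[$i]);
   // On a tie, the phrase from $arr2 is kept.
   $kept[$i] = $percent_from_arr1 > $percent_from_arr2 ? $arr1[$i] : $arr2[$i];
}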
Dominic Rodger
+1  A: 

If you had an array that you know is correct, it would be very easy to do something like:

// $correct_array is the reference list; each entry is compared with the
// entry at the same index in $tested_array.
foreach ($correct_array as $num => $word) {
    if ($word == $tested_array[$num]) {
        echo "this is correct: " . $word . "<br />";
    } else {
        echo "this is incorrectly spelled: " . $tested_array[$num] . "<br />";
    }
}
Alex Mcp
I don't think he has an array he knows is correct, or at least, that's not the way the question reads.
Dominic Rodger
A: 

If all you need to do is make sure it's properly spelled, you can use in_array, like this:

foreach ($arr2 as $val) {
   if (in_array($val, $arr1)) {
     // spelled properly
   } else {
     // spelled incorrectly
   }
}

If you want to actually autocorrect them, it would probably take a pretty complicated algorithm, plus storing every possible misspelling in a database somewhere.
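
As a rough sketch of the "store the misspellings" idea, a plain PHP array can stand in for the database table. The `$corrections` map and the `autocorrect` function are hypothetical names invented for this illustration, and the table only covers the phrases from the question.

```php
<?php
// Hypothetical lookup table: known misspelling => corrected form.
// In practice this would live in a database rather than in an array.
$corrections = array(
    'fbook'       => 'facebook',
    'yahoo msngr' => 'yahoo messenger',
    'txt me l8r'  => 'text me later',
    'how r u'     => 'how are you',
);

// Returns the stored correction, or the phrase unchanged if it is not listed.
function autocorrect($phrase, array $corrections) {
    $key = strtolower(trim($phrase));
    return isset($corrections[$key]) ? $corrections[$key] : $phrase;
}

$fixed     = autocorrect('How r u', $corrections);
$untouched = autocorrect('facebook', $corrections);
```

The obvious limitation is that only misspellings that were entered in advance can be corrected.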

GSto
I don't think either `$arr1` or `$arr2` is the "reference" spelling.
Dominic Rodger
+1  A: 

There's no way to "guess" which spelling is correct; you need a knowledge base (i.e. a dictionary).

This dictionary can be implemented using pspell (aspell), as @Dominic mentioned, or you can use your own array as a dictionary.

If you have an array as a dictionary, you can use the Levenshtein algorithm, which is available as a built-in function in PHP, to calculate the distance between two words (i.e. your word and the reference one). You can then iterate over the reference array to find the word(s) with the smallest distance from the one you're checking, and those are probably the best candidates to suggest as a correction. If the distance is 0, the word being checked is already correct.
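
A minimal sketch of that lookup, assuming a small hand-made reference array (a real dictionary would be much larger). `closest_match` is a hypothetical helper name; `levenshtein()` is PHP's built-in function.

```php
<?php
// Hypothetical reference dictionary, taken from the phrases in the question.
$reference = array('facebook', 'yahoo messenger', 'text me later', 'how are you');

// Returns array(best candidate, distance); a distance of 0 means the input
// is already in the dictionary. On ties, the first candidate found is kept.
function closest_match($input, array $reference) {
    $best = null;
    $best_distance = PHP_INT_MAX;
    foreach ($reference as $candidate) {
        $distance = levenshtein($input, $candidate);
        if ($distance < $best_distance) {
            $best = $candidate;
            $best_distance = $distance;
        }
    }
    return array($best, $best_distance);
}

list($match, $distance) = closest_match('fbook', $reference);
echo $match . " (distance " . $distance . ")\n";
```

With a large dictionary you would probably also want a maximum acceptable distance, so that unrelated input is not "corrected" to whatever happens to be nearest.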

Felipe Ribeiro