views:

1149

answers:

2

Quick MYSQL/PHP question. I'm using a "not-so-strict" search query as a fallback if no results are found with a normal search query, to the tune of:

foreach($find_array as $word) { 
  clauses[] = "(firstname SOUNDS LIKE '$word%' OR lastname SOUNDS LIKE '$word%')";
}
if (!empty($clauses)) $filter='('.implode(' AND ', $clauses).')';
$query = "SELECT * FROM table WHERE $filter";

Now, I'm using PHP to highlight the results, like:

foreach ($find_array as $term_to_highlight){
    foreach ($result as $key => $result_string){
     $result[$key]=highlight_stuff($result_string, $term_to_highlight);
    }
}

But this method falls on its ass when I don't know what to highlight. Is there any way to find out what the "sound-alike" match is when running that mysql query?

That is to say, if someone searches for "Joan" I want it to highlight "John" instead.

+2  A: 

The SOUND LIKE condition just compares the SOUNDEX key of both words, and you can use the PHP soundex() function to generate the same key.

So, if you found a matching row and needed to find out which word to highlight, you can fetch both the firstname and lastname, and then use PHP to find which one matches and highlight just that word.

I made this code just to try this out. (Had to test my theory xD)

<?php
// A space seperated string of keywords, presumably from a search box somewhere.
$search_string = 'John Doe';

// Create a data array to contain the keywords and their matches.
// Keywords are grouped by their soundex keys.
$data = array();
foreach(explode(' ', $search_string) as $_word) {
    $data[soundex($_word)]['keywords'][] = $_word;
}

// Execute a query to find all rows matching the soundex keys for the words.
$soundex_list = "'". implode("','", array_keys($data)) ."'";
$sql = "SELECT id, firstname, lastname
        FROM   sounds_like
        WHERE  SOUNDEX(firstname) IN({$soundex_list})
        OR     SOUNDEX(lastname)  IN({$soundex_list})";
$sql_result = $dbLink->query($sql);

// Add the matches to their respective soundex key in the data array.
// This checks which word matched, the first or last name, and tags
// that word as the match so it can be highlighted later.
if($sql_result) {
    while($_row = $sql_result->fetch_assoc()) {
        foreach($data as $_soundex => &$_elem) {
            if(soundex($_row['firstname']) == $_soundex) {
                $_row['matches'] = 'firstname';
                $_elem['matches'][] = $_row;
            }
            else if(soundex($_row['lastname']) == $_soundex) {
                $_row['matches'] = 'lastname';
                $_elem['matches'][] = $_row;
            }
        }
    }
}

// Print the results as a simple text list.
header('content-type: text/plain');
echo "-- Possible results --\n";

foreach($data as $_group) {
    // Print the keywords for this group's soundex key.
    $keyword_list = "'". implode("', '", $_group['keywords']) ."'";
    echo "For keywords: {$keyword_list}\n";

    // Print all the matches for this group, if any.
    if(isset($_group['matches']) && count($_group['matches']) > 0) {
        foreach($_group['matches'] as $_match) {
            // Highlight the matching word by encapsulatin it in dashes.
            if($_match['matches'] == 'firstname') {
                $_match['firstname'] = "-{$_match['firstname']}-";
            }
            else {
                $_match['lastname'] = "-{$_match['lastname']}-";
            }

            echo " #{$_match['id']}: {$_match['firstname']} {$_match['lastname']}\n";
        }
    }
    else {
        echo " No matches.\n";
    }
}
?>

A more generalized function, to pull out the matching soundex word from a strings could look like:

<?php
/**
 * Attempts to find the first word in the $heystack that is a soundex
 * match for the $needle.
 */
function find_soundex_match($heystack, $needle) {
    $words = explode(' ', $heystack);
    $needle_soundex = soundex($needle);
    foreach($words as $_word) {
        if(soundex($_word) == $needle_soundex) {
            return $_word;
        }
    }
    return false;
}
?>

Which, if I am understanding it correctly, could be used in your previously posted code as:

foreach ($find_array as $term_to_highlight){
    foreach ($result as $key => $result_string){
        $match_to_highlight = find_soundex_match($result_string, $term_to_highlight);
        $result[$key]=highlight_stuff($result_string, $match_to_highlight);
    }
}

This wouldn't be as efficient tho, as the more targeted code in the first snippet.

Atli
Thank you so much!
Greg
+2  A: 

Note that SOUNDS LIKE does not work as you think it does. It is not equivalent to LIKE in MySQL, as it does not support the % wildcard.

This means your query will not find "John David" when searching for "John". This might be acceptable if this is just your fallback, but it is not ideal.

So here is a different suggestion (that might need improvement); first use PHPs soundex() function to find the soundex of the keyword you are looking for.

$soundex = soundex($word);
$soundexPrefix = substr($soundex, 0, 2); // first two characters of soundex
$sql = "SELECT lastname, firstname ".
    "FROM table WHERE SOUNDEX(lastname) LIKE '$soundexPrefix%' ".
    "OR SOUNDEX(firstname) LIKE '$soundexPrefix%'";

Now you'll have a list of firstnames and lastnames that has a vague similarity in sounding (this might be a lot entries, and you might want to increase the length of the soundex prefix you use for your search). You can then calculate the Levenshtein distance between the soundex of each word and your search term, and sort by that.

Second, you should look at parameterized queries in MySQL, to avoid SQL injection bugs.

Vegard Larsen
Thanks for pointing this out. "John" does match "John David" in my case, though. I'm breaking up the query into words and checking each one against every field I search. As in, "WHERE (lastname SOUNDS LIKE 'John' OR firstname SOUNDS LIKE 'John') AND (lastname SOUNDS LIKE 'David' OR firstname SOUNDS LIKE 'David') if the search string is "John David". If it's just John, it will only do the first part before the AND -- so it does match. Probably not the most efficient method though, huh?
Greg
Yes, probably. Then my solution becomes a work-around for a problem that doesn't really exist... :)
Vegard Larsen