views:

291

answers:

6

I am using this function to generate SEO friendly titles, but I think it can be improved, anyone want to try? It does a few things: cleans common accented letters, check against a "forbidden" array, and check optionally against a database of titles in use.

 /**
 * Recursive function that generates a unique "this-is-the-title123" string for use in URL.
 * Checks optionally against $table and $field and the array $forbidden to make sure it's unique.
 * Usage: the resulting string should be saved in the db with the object. 
 */
 function seo_titleinurl_generate($title, $forbidden = FALSE, $table = FALSE, $field = FALSE)
 {
  ## 1. parse $title
  $title = clean($title, "oneline"); // remove tags and such

  $title = ereg_replace(" ", "-", $title); // replace spaces by "-"
  $title = ereg_replace("á", "a", $title); // replace special chars
  $title = ereg_replace("í", "i", $title); // replace special chars
  $title = ereg_replace("ó", "o", $title); // replace special chars
  $title = ereg_replace("ú", "u", $title); // replace special chars
  $title = ereg_replace("ñ", "n", $title); // replace special chars
  $title = ereg_replace("Ñ", "n", $title); // replace special chars

  $title = strtolower(trim($title)); // lowercase
    $title = preg_replace("/([^a-zA-Z0-9_-])/",'',$title); // only keep standard latin letters and numbers, hyphens and dashes

  ## 2. check against db (optional)
  if ($table AND $field)
  {
   $sql = "SELECT * FROM $table WHERE $field = '" . addslashes($title) . "'";
   $res = mysql_debug_query($sql);
   if (mysql_num_rows($res) > 0)
   {
    // already taken. So recursively adjust $title and try again.
    $title = append_increasing_number($title);
    $title = seo_titleinurl_generate($title, $forbidden, $table, $field);
   }
  }

  ## 3. check against $forbidden array
  if ($forbidden)
  {
   while (list ($key, $val) = each($forbidden))
   {
    // $val is the forbidden string
    if ($title == $val)
    {
     $title = append_increasing_number($title);
     $title = seo_titleinurl_generate($title, $forbidden, $table, $field);
    }
   }
  }
  return $title;
 }
 /**
 * Function that appends an increasing number to a string, for example "peter" becomes "peter1" and "peter129" becomes "peter130".
 * (To improve, this function could be made recursive to deal with numbers over 99999.)
 */
 function append_increasing_number($title)
 {
  ##. 1. Find number at end of string.
  $last1 = substr($title, strlen($title)-1, 1);
  $last2 = substr($title, strlen($title)-2, 2);
  $last3 = substr($title, strlen($title)-3, 3);
  $last4 = substr($title, strlen($title)-4, 4);
  $last5 = substr($title, strlen($title)-5, 5); // up to 5 numbers (ie. 99999)

  if (is_numeric($last5))
  {
   $last5++; // +1
   $title = substr($title, 0, strlen($title)-5) . $last5;
  } elseif (is_numeric($last4))
  {
   $last4++; // +1
   $title = substr($title, 0, strlen($title)-4) . $last4;
  } elseif (is_numeric($last3))
  {
   $last3++; // +1
   $title = substr($title, 0, strlen($title)-3) . $last3;
  } elseif (is_numeric($last2))
  {
   $last2++; // +1
   $title = substr($title, 0, strlen($title)-2) . $last2;
  } elseif (is_numeric($last1))
  {
   $last1++; // +1
   $title = substr($title, 0, strlen($title)-1) . $last1;
  } else 
  {
   $title = $title . "1"; // append '1' 
  }

  return $title;
 }
+2  A: 

You could lose the:

$title = ereg_replace(" ", "-", $title);

And replace those lines with the faster str_replace():

$title = str_replace(" ", "-", $title);

From the PHP manual page for str_replace():

If you don't need fancy replacing rules (like regular expressions), you should always use this function instead of ereg_replace() or preg_replace().

EDIT:

I enhanced your append_increasing_number($title) function, it does exactly the same thing, only with no limit on the number of digits at the end (and it's prettier :) :

function append_increasing_number($title)
{
    $counter = strlen($title);
    while(is_numeric(substr($title, $counter - 1, 1))) {
        $counter--;
    }
    $numberPart = (int) substr($title,$counter,strlen($title) - 1);
    $incrementedNumberPart = $numberPart + 1;
    return str_replace($numberPart, $incrementedNumberPart, $title);
}
karim79
+2  A: 

Following karim79's answer, the first part can be made more readable and easier to maintain like this:

Replace

$title = ereg_replace(" ", "-", $title); // replace spaces by "-"
$title = ereg_replace("á", "a", $title); // replace special chars
$title = ereg_replace("í", "i", $title); // replace special chars

with

$replacements = array(
  ' ' => '-',
  'á' => 'a',
  'í' => 'i'
);
$title = str_replace(array_keys($replacements, array_values($replacements), $title);

The last part where append_increasing_number() is used looks bad. You could probably delete the whole function and just do something like

while ($i < 99999){
//check for existance of $title . $i; if doesn't exist - insert!
}
David Caunt
+1 for the use of array arguments with str_replace, but the while() loop idea isn't a winner.
Frank Farmer
Agreed, though more readable than the current code. Perhaps I've misunderstood what it does - it's hard to read!
David Caunt
+4  A: 

There appears to be a race condition because you're doing a SELECT to see if the title has been used before, then returning it if not (presumably the calling code will then INSERT it into the DB). What if another process does the same thing, but it inserts in between your SELECT and your INSERT? Your insert will fail. You should probably add some guaranteed-unique token to the URL (perhaps a "directory" in the path one level higher than the SEO-friendly name, similar to how StackOverflow does it) to avoid the problem of the SEO-friendly URL needing to be unique at all.

I'd also rewrite the append_increasing_number() function to be more readable... have it programmatically determine how many numbers are on the end and work appropriately, instead of a giant if/else to figure it out. The code will be clearer, simpler, and possibly even faster.

rmeador
It's interesting to see everyone's approach to this. Everyone's made it faster and prettier, you made it work. Voted up.
Neil Trodden
A unique token as rmeador suggested is a good idea, this could also be implemented as a unique constraint on the database column to cause the subsequent insert to fail.
karim79
+1  A: 

You can also use arrays with str_replace() so you could do

$replace = array(' ', 'á');
$with = array('-', 'a');

The position in the array must correspond.

That should shave a few lines out, and a few millisceonds.

You'll also want to give consideration to all punctuation, it's amazing how often, ifferent sets of `'" quotes and !? etc get into urls. I'd do a preg_replace on \W (not word)

preg_replace('/\w/', '', $title);

That should help you a bit.

Phil

Phil Carter
+4  A: 

The str_replace suggestions above are excellent. Additionally, you can replace that last function with a single line:

function append_increasing_number($title) {
  return preg_replace('@([0-9]+)$@e', '\1+1', $title);
}

You can do even better and remove the query-in-a-loop idea entirely, and do something like

"SELECT MAX($field) + 1 FROM $table WHERE $field LIKE '" . mysql_escape_string(preg_replace('@[0-9]+$@', '', $title)) . "%'";

Running SELECTs in a loop like that is just ugly.

Frank Farmer
+1 Very smart suggestions
David Caunt
+1 to adjust the under-voting of this answer. Karmic balance restored.
karim79
+2  A: 

It looks like others have hit most of the significant points (especially regarding incrementing the suffix and executing SQL queries recursively / in a loop), but I still see a couple of big improvements that could be made.

Firstly, don't bother trying to come up with your own diacritics-to-ASCII replacements; you'll never catch them all and better tools exist. In particular, I direct your attention to iconv's "TRANSLIT" feature. You can convert from UTF-8 (or whatever encoding is used for your titles) to plain old 7-bit ASCII as follows:

$title = strtolower(strip(clean($title)));
$title = iconv('UTF-8', 'ASCII//TRANSLIT', $title);
$title = str_replace("'", "", $title);
$title = preg_replace(array("/\W+/", "/^\W+|\W+$/"), array("-", ""), $title);

Note that this also fixes a bug in your original code where the space-to-dash replacement was called before trim() and replaces all runs of non-letter/-number/-underscores with single dashes. For example, " Héllo, world's peoples!" becomes "hello-worlds-peoples". This replaces your entire section 1.

Secondly, your $forbidden loop can be rewritten to be more efficient and to eliminate recursion:

if ($forbidden)
{
    while (in_array($title, $forbidden))
    {
        $title = append_increasing_number($title);
    }
}

This replaces section 3.

Ben Blank