Take a string such as:

In C#: How do I add "Quotes" around string in a comma delimited list of strings?

and convert it to:

in-c-how-do-i-add-quotes-around-string-in-a-comma-delimited-list-of-strings

Requirements:

  • Separate each word by a dash and remove all punctuation (taking into account that not all words are separated by spaces).
  • The function takes a maximum length and keeps only as many whole tokens as fit within it. Example: ToSeoFriendly("hello world hello world", 14) returns "hello-world".
  • All words are converted to lower case.
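A minimal sketch of these requirements in Python, assuming that a "word" is any run of ASCII letters and digits (so punctuation such as ':' or '"' also acts as a separator) and that a single word longer than the limit is hard-truncated. The name `to_seo_friendly` is hypothetical, mirroring `ToSeoFriendly` above.

```python
import re


def to_seo_friendly(text: str, max_length: int) -> str:
    """Lower-case the text, split it on anything that is not a-z/0-9,
    and join whole words with '-' while the result fits in max_length."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    result = ""
    for word in words:
        candidate = f"{result}-{word}" if result else word
        if len(candidate) > max_length:
            break
        result = candidate
    # Assumption: if even the first word is too long, keep its first max_length chars
    if not result and words:
        result = words[0][:max_length]
    return result


print(to_seo_friendly("hello world hello world", 14))  # hello-world
```

Matching runs of [a-z0-9] rather than splitting on spaces also handles words glued together by punctuation: "(string)someObject" comes out as "string-someobject".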

On a separate note, should there be a minimum length?

+2  A: 

Here's a solution for PHP:

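function make_uri($input, $max_length) {
  // Transliterate accented characters to plain ASCII where iconv is available
  if (function_exists('iconv')) {
    $input = @iconv('UTF-8', 'ASCII//TRANSLIT', $input);
  }

  $lower = strtolower($input);

  // Keep only lowercase letters, digits and spaces (preg_replace, not preg_replace_all,
  // and applied to the lower-cased string so capitals are not stripped)
  $without_special = preg_replace('/[^a-z0-9 ]/', '', $lower);
  // trim() first so leading/trailing spaces don't produce empty tokens
  $tokens = preg_split('/ +/', trim($without_special));

  $result = '';

  foreach ($tokens as $token) {
    // $result carries a leading '-', hence the +1
    if (strlen($result.'-'.$token) > $max_length + 1) {
      break;
    }

    $result .= '-'.$token;
  }

  // Drop the leading '-'
  return substr($result, 1);
}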

Usage:

echo make_uri('In C#: How do I add "Quotes" around string in a ...', 500);

Unless you need the URIs to be typeable, they don't need to be short. You should still set a maximum length, though, so that the URLs work well with proxies, etc.

Allain Lalonde
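The iconv('UTF-8', 'ASCII//TRANSLIT', ...) call in make_uri converts accented characters to ASCII before the remaining punctuation is stripped. A rough Python port of make_uri might use Unicode NFKD decomposition instead, assuming it is acceptable to drop combining marks ('é' becomes 'e') and to simply lose characters with no ASCII decomposition, such as 'ß', which iconv's transliteration may map to an ASCII approximation instead.

```python
import re
import unicodedata


def make_uri(text: str, max_length: int) -> str:
    # Decompose accented letters (é -> e + combining accent), then drop non-ASCII
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    # Keep only lowercase letters, digits and spaces, as the PHP version does
    cleaned = re.sub(r"[^a-z0-9 ]", "", ascii_text.lower())
    result = ""
    # str.split() with no argument ignores repeated and edge whitespace
    for token in cleaned.split():
        candidate = f"{result}-{token}" if result else token
        if len(candidate) > max_length:
            break
        result = candidate
    return result
```

For example, make_uri("Crème brûlée recipe", 100) gives "creme-brulee-recipe".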
+2  A: 

Check this link: Clean URLs through readable slugs in PHP

nicruo
His code is nice; the UTF-8 conversion is something I'd overlooked. It only solves part of what he's asking, though.
Allain Lalonde
+5  A: 

I would follow these steps:

  1. Convert the string to lower case.
  2. Replace unwanted characters with hyphens.
  3. Replace multiple hyphens with one hyphen (not necessary here, as the preg_replace() call below already replaces each run of unwanted characters with a single hyphen).
  4. Remove hyphens at the beginning and end if necessary.
  5. If the string is too long, cut it at the last hyphen before position x.

So, all together in a function (PHP):

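function generateUrlSlug($string, $maxlen = 0)
{
    // Lower-case, turn each run of non-alphanumerics into one hyphen, trim edge hyphens
    $string = trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($string)), '-');
    if ($maxlen && strlen($string) > $maxlen) {
        // Look one character past the limit so a hyphen right at the limit
        // counts as a word boundary (otherwise a word that fits exactly is lost)
        $cut = substr($string, 0, $maxlen + 1);
        $pos = strrpos($cut, '-');
        // Cut at the last boundary; if the first word alone is too long, hard-truncate
        $string = ($pos > 0) ? substr($cut, 0, $pos) : substr($string, 0, $maxlen);
    }
    return $string;
}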
Gumbo
I like this solution. I was trying to do this by matching all non-alphanumerics, splitting on them and joining with '-'. I kept trying to match them only if they weren't at the start or end of the string, but never got it working. In the end I settled on matching the words and appending them.
Shawn Simon
What if the first word is longer than the maximum length?
Wookai
I returned a substring in that situation.
Shawn Simon
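Gumbo's five steps, including the first-word case raised in the comments, can be sketched in Python as follows. This assumes "position x" is the max_len parameter and that a hard cut is acceptable when there is no hyphen to cut at; the name generate_url_slug simply mirrors the PHP function.

```python
import re


def generate_url_slug(text: str, max_len: int = 0) -> str:
    # Steps 1-4: lower-case, turn each run of unwanted characters into one
    # hyphen (so no double hyphens appear), and strip hyphens at both ends
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    # Step 5: if too long, cut at the last hyphen at or before position max_len
    if max_len and len(slug) > max_len:
        head = slug[: max_len + 1]  # one extra char so a hyphen at the limit counts
        pos = head.rfind("-")
        # No usable hyphen means the first word alone is too long: hard-truncate it
        slug = head[:pos] if pos > 0 else slug[:max_len]
    return slug
```

With max_len=20 the question title becomes "in-c-how-do-i-add", and a single over-long word such as "supercalifragilistic" with max_len=5 becomes "super".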
+4  A: 

C#

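using System.Text.RegularExpressions;

public string toFriendly(string subject)
{
    subject = subject.Trim().ToLowerInvariant();
    // Turn each run of whitespace into a single hyphen
    subject = Regex.Replace(subject, @"\s+", "-");
    // Drop everything that isn't a letter, digit, underscore or hyphen
    subject = Regex.Replace(subject, @"[^a-z0-9_-]", "");
    // Collapse hyphen runs left behind (e.g. from " - ") and trim hyphens at the ends
    subject = Regex.Replace(subject, @"-{2,}", "-").Trim('-');
    return subject;
}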
annakata
I think this has a few issues. What about this situation: "(string)someObject fails." becomes "stringsomeobject-fails".
Shawn Simon
That did occur to me, but frankly I'm not sure how I'd want to handle it. In the past I've gone with nuking everything between and including the parentheses, but I suspect it's implementation-specific. Whatever you want, though, it's trivial to add to the above template.
annakata
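The "nuke the parentheses" approach could look like this in Python. It assumes parentheses are not nested and that discarding their contents is what you want, which, as noted, is implementation-specific; the helper name to_friendly_without_parens is hypothetical.

```python
import re


def to_friendly_without_parens(subject: str) -> str:
    # Replace each non-nested "( ... )" group, parentheses included, with a
    # space so the words on either side stay separate
    subject = re.sub(r"\([^()]*\)", " ", subject)
    # Then slugify: lower-case, one hyphen per run of other characters, no edge hyphens
    return re.sub(r"[^a-z0-9]+", "-", subject.lower()).strip("-")
```

With this, "(string)someObject fails." becomes "someobject-fails" instead of "stringsomeobject-fails".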
+5  A: 

Here is my solution in C#:

using System.Text;
using System.Text.RegularExpressions;

private string ToSeoFriendly(string title, int maxLength) {
    // Each match is one run of word characters (letters, digits, underscore)
    var match = Regex.Match(title.ToLower(), @"\w+");
    StringBuilder result = new StringBuilder();
    bool maxLengthHit = false;
    while (match.Success && !maxLengthHit) {
        // result already ends in '-', so this sum is the final length once that dash is removed
        if (result.Length + match.Value.Length <= maxLength) {
            result.Append(match.Value).Append('-');
        } else {
            maxLengthHit = true;
            // Handle a situation where the first word alone is longer than the max length.
            if (result.Length == 0) result.Append(match.Value.Substring(0, maxLength));
        }
        match = match.NextMatch();
    }
    // Remove trailing '-' (guarding against an empty result, e.g. a title with no word characters)
    if (result.Length > 0 && result[result.Length - 1] == '-') result.Remove(result.Length - 1, 1);
    return result.ToString();
}
Shawn Simon
A: 

A slightly cleaner way of doing this, in PHP at least, is:

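function CleanForUrl($urlPart, $maxLength = null) {
    // 1) drop anything that isn't a letter, digit, hyphen or space
    // 2) collapse each run of spaces/hyphens into a single '-'
    $url = strtolower(preg_replace(array('/[^a-z0-9\- ]/i', '/[ \-]+/'), array('', '-'), trim($urlPart)));
    // Removing punctuation can expose spaces at the ends, so strip edge hyphens too
    $url = trim($url, '-');
    // Truncate, then make sure the cut doesn't leave a dangling hyphen
    if ($maxLength) $url = rtrim(substr($url, 0, $maxLength), '-');
    return $url;
}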

It makes sense to do the trim() first so there is less to process later, and the full replacement is done within a single preg_replace() call.

Thanks to cg for coming up with most of this: http://stackoverflow.com/questions/539920/what-is-the-best-way-to-clean-a-string-for-placement-in-a-url-like-the-question/540491#540491

Darryl Hein
+2  A: 

A better version:

function Slugify($string)
{
    return strtolower(trim(preg_replace(array('~[^0-9a-z]~i', '~-+~'), '-', $string), '-'));
}
Alix Axel
A: 

In a dynamic URL, these IDs are passed via the query string to a script that ... as the delimiting character because most search engines treat the dash as a ... NET: A Developer's Guide to SEO also covers these three additional methods search engine optimization

+1  A: 

Solution in Perl:

my $input = 'In C#: How do I add "Quotes" around string in a comma delimited list of strings?';

my $length = 20;
$input =~ s/[^a-z0-9]+/-/gi;
$input =~ s/^(.{1,$length}).*/\L$1/;

print "$input\n";

done.

depesz
+1  A: 

Solution in shell:

echo 'In C#: How do I add "Quotes" around string in a comma delimited list of strings?' | \
    tr A-Z a-z | \
    sed 's/[^a-z0-9]\+/-/g;s/^\(.\{1,20\}\).*/\1/'
depesz