views:

285

answers:

6

Hi there,

A slug on this context is a string that its safe to use as an identifier, on urls or css. For example, if you have this string:

I'd like to eat at McRunchies!

Its slug would be:

i-d-like-to-eat-at-mcrunchies

I want to know whether there's a standard way of building such strings on Drupal (or php functions available from drupal). More precisely, inside a Drupal theme.

Context: I'm modifying a drupal theme so the html of the nodes it generates include their taxonomy terms as css classes on their containing div. Trouble is, some of those terms' names aren't valid css class names. I need to "slugify" them.

I've read that some people simply do this:

str_replace(" ", "-", $term->name)

This isn't really a enough for me. It doesn't replace uppercase letters with downcase, but more importantly, doesn't replace non-ascii characters (like à or é) by their ascii equivalents. It also doesn't remove "separator strings" from begining and end.

Is there a function in drupal 6 (or the php libs) that provides a way to slugify a string, and can be used on a template.php file of a drupal theme?

A: 

You can use a preg_replace and strtolower :

preg_replace('/[^a-z]/','-', strtolower($term->name)); 
Brice Favre
This is clean and simple. Unfortunately it doesn't do everything I need. But thanks for answering.
egarcia
I just found that basic theme implements what you're looking for this way :$string = strtolower(preg_replace('/[^a-zA-Z0-9_-]+/', '-', $string));
Brice Favre
A: 

I would recommend the transliteration module which path_auto uses. With it you can use the transliteration_get() function. It also does unicode transformation.

googletorp
pathauto does not use the transliteration module. it uses its own function pathauto_cleanstring() which depends on loads of settings from pathauto. http://drupalcontrib.org/api/function/pathauto_cleanstring/6
barraponto
@barraponto You can make pathauto use it to handle unicodes in urls, which is doesn't handle very well otherwise.
googletorp
how do i get pathauto to use transliteration module? i've been looking for that... http://stackoverflow.com/questions/2865742/how-to-use-pathauto-and-transliteration-modules-together
barraponto
Thanks for posting this. However, I'll avoid adding additional modules if possible. I was looking for something provided directly by Drupal or PHP.
egarcia
i bet you are already using pathauto. it has a built-in transliteration file (i18n-ascii.txt) which will provide transliteration in pathauto_cleanstring(). you do NOT need the transliteration module.
barraponto
@barraponto, Transliteration is my recommendation, I haven't said that it's required or the only way.
googletorp
@googletorp, i noticed, i just wanted to make it clear that transliteration is not needed since egarcia discarded your answer because of extra modules. pathauto_cleanstring () is enough, transliteration module is great for file paths (which pathauto does not handle). i highly recommend it.
barraponto
+2  A: 

i am a happy Zen theme user, thus i've met this wonderful function that comes with it: zen_id_safe http://api.lullabot.com/zen_id_safe

it does not depend on any other theme function, so you can just copy it to your module or theme and use it. it is a pretty small and simple function, so i will just paste it here for convenience.

function zen_id_safe($string) {
  // Replace with dashes anything that isn't A-Z, numbers, dashes, or underscores.
  return strtolower(preg_replace('/[^a-zA-Z0-9-]+/', '-', $string));
}

barraponto
This is nearly what I needed. However, it doesn't transliterate and it doesn't remove separators from the begining. Anyway, thanks for taking the time to answer.
egarcia
you can add logic for removing separators (note that it is only a requirement for id, since classes can use everything (see http://barney.w3.org/TR/REC-html40/struct/global.html#adef-class and click on cdata-list). as for proper transliteration, see my comment on googletorp's answer.
barraponto
+1  A: 

This might help, I find I am doing this slugging all the time now rather then use id numbers as unique keys in my tables.

    /** class SlugMaker
    * 
    * methods to create text slugs for urls
    *
    **/

class SlugMaker {

    /** method slugify
    * 
    * cleans up a string such as a page title
    * so it becomes a readable valid url
    *
    * @param STR a string
    * @return STR a url friendly slug
    **/

    function slugifyAlnum( $str ){

    $str = preg_replace('#[^0-9a-z ]#i', '', $str );    // allow letters, numbers + spaces only
    $str = preg_replace('#( ){2,}#', ' ', $str );       // rm adjacent spaces
    $str = trim( $str ) ;

    return strtolower( str_replace( ' ', '-', $str ) ); // slugify


    }


    function slugifyAlnumAppendMonth( $str ){

    $val = $this->slugifyAlnum( $str );

    return $val . '-' . strtolower( date( "M" ) ) . '-' . date( "Y" ) ;

    }

}

Using this and .htaccess rules means you go directly from a url like:

/articles/my-pops-nuts-may-2010

Straight through to the table look up without having to unmap IDs (applying a suitable filter naturally).

Append or prepend some kind of date optionally in order to enforce a degree of uniqueness as you wish.

HTH

Cups
Thanks for posting this. The only thing I don't like about this function is that it leaves the separators at the begining and end of identifiers; if you have something like `#1 - Option 1` it will get transformed into `-1-option-1`, which is not safe for use on css. A minor thing is that it doesn't transliterate.
egarcia
+2  A: 

There's also this from d7 which you can copy to your project:

http://api.drupal.org/api/function/drupal_clean_css_identifier/7

sprugman
It is nice to know that drupal has a function like this. However it doesn't do everything I needed (see my other answers). But +1 for the research effort.
egarcia
only when 7 comes out, but yeah...
sprugman
A: 

Hi everyone,

Thanks for the answers.

I ended up using the slug function explained here: http://www.drupalcoder.com/story/554-how-to-create-url-aliases-in-drupal-without-path-module (at the end of the article, you have to click in order to see the source code).

This does what I need and a couple things more, without needing to include external modules and the like.

Pasting the code below for easy future reference:

/**
 * Calculate a slug with a maximum length for a string.
 *
 * @param $string
 *   The string you want to calculate a slug for.
 * @param $length
 *   The maximum length the slug can have.
 * @return
 *   A string representing the slug
 */
function slug($string, $length = -1, $separator = '-') {
  // transliterate
  $string = transliterate($string);

  // lowercase
  $string = strtolower($string);

  // replace non alphanumeric and non underscore charachters by separator
  $string = preg_replace('/[^a-z0-9]/i', $separator, $string);

  // replace multiple occurences of separator by one instance
  $string = preg_replace('/'. preg_quote($separator) .'['. preg_quote($separator) .']*/', $separator, $string);

  // cut off to maximum length
  if ($length > -1 && strlen($string) > $length) {
    $string = substr($string, 0, $length);
  }

  // remove separator from start and end of string
  $string = preg_replace('/'. preg_quote($separator) .'$/', '', $string);
  $string = preg_replace('/^'. preg_quote($separator) .'/', '', $string);

  return $string;
}

/**
 * Transliterate a given string.
 *
 * @param $string
 *   The string you want to transliterate.
 * @return
 *   A string representing the transliterated version of the input string.
 */
function transliterate($string) {
  static $charmap;
  if (!$charmap) {
    $charmap = array(
      // Decompositions for Latin-1 Supplement
      chr(195) . chr(128) => 'A', chr(195) . chr(129) => 'A',
      chr(195) . chr(130) => 'A', chr(195) . chr(131) => 'A',
      chr(195) . chr(132) => 'A', chr(195) . chr(133) => 'A',
      chr(195) . chr(135) => 'C', chr(195) . chr(136) => 'E',
      chr(195) . chr(137) => 'E', chr(195) . chr(138) => 'E',
      chr(195) . chr(139) => 'E', chr(195) . chr(140) => 'I',
      chr(195) . chr(141) => 'I', chr(195) . chr(142) => 'I',
      chr(195) . chr(143) => 'I', chr(195) . chr(145) => 'N',
      chr(195) . chr(146) => 'O', chr(195) . chr(147) => 'O',
      chr(195) . chr(148) => 'O', chr(195) . chr(149) => 'O',
      chr(195) . chr(150) => 'O', chr(195) . chr(153) => 'U',
      chr(195) . chr(154) => 'U', chr(195) . chr(155) => 'U',
      chr(195) . chr(156) => 'U', chr(195) . chr(157) => 'Y',
      chr(195) . chr(159) => 's', chr(195) . chr(160) => 'a',
      chr(195) . chr(161) => 'a', chr(195) . chr(162) => 'a',
      chr(195) . chr(163) => 'a', chr(195) . chr(164) => 'a',
      chr(195) . chr(165) => 'a', chr(195) . chr(167) => 'c',
      chr(195) . chr(168) => 'e', chr(195) . chr(169) => 'e',
      chr(195) . chr(170) => 'e', chr(195) . chr(171) => 'e',
      chr(195) . chr(172) => 'i', chr(195) . chr(173) => 'i',
      chr(195) . chr(174) => 'i', chr(195) . chr(175) => 'i',
      chr(195) . chr(177) => 'n', chr(195) . chr(178) => 'o',
      chr(195) . chr(179) => 'o', chr(195) . chr(180) => 'o',
      chr(195) . chr(181) => 'o', chr(195) . chr(182) => 'o',
      chr(195) . chr(182) => 'o', chr(195) . chr(185) => 'u',
      chr(195) . chr(186) => 'u', chr(195) . chr(187) => 'u',
      chr(195) . chr(188) => 'u', chr(195) . chr(189) => 'y',
      chr(195) . chr(191) => 'y',
      // Decompositions for Latin Extended-A
      chr(196) . chr(128) => 'A', chr(196) . chr(129) => 'a',
      chr(196) . chr(130) => 'A', chr(196) . chr(131) => 'a',
      chr(196) . chr(132) => 'A', chr(196) . chr(133) => 'a',
      chr(196) . chr(134) => 'C', chr(196) . chr(135) => 'c',
      chr(196) . chr(136) => 'C', chr(196) . chr(137) => 'c',
      chr(196) . chr(138) => 'C', chr(196) . chr(139) => 'c',
      chr(196) . chr(140) => 'C', chr(196) . chr(141) => 'c',
      chr(196) . chr(142) => 'D', chr(196) . chr(143) => 'd',
      chr(196) . chr(144) => 'D', chr(196) . chr(145) => 'd',
      chr(196) . chr(146) => 'E', chr(196) . chr(147) => 'e',
      chr(196) . chr(148) => 'E', chr(196) . chr(149) => 'e',
      chr(196) . chr(150) => 'E', chr(196) . chr(151) => 'e',
      chr(196) . chr(152) => 'E', chr(196) . chr(153) => 'e',
      chr(196) . chr(154) => 'E', chr(196) . chr(155) => 'e',
      chr(196) . chr(156) => 'G', chr(196) . chr(157) => 'g',
      chr(196) . chr(158) => 'G', chr(196) . chr(159) => 'g',
      chr(196) . chr(160) => 'G', chr(196) . chr(161) => 'g',
      chr(196) . chr(162) => 'G', chr(196) . chr(163) => 'g',
      chr(196) . chr(164) => 'H', chr(196) . chr(165) => 'h',
      chr(196) . chr(166) => 'H', chr(196) . chr(167) => 'h',
      chr(196) . chr(168) => 'I', chr(196) . chr(169) => 'i',
      chr(196) . chr(170) => 'I', chr(196) . chr(171) => 'i',
      chr(196) . chr(172) => 'I', chr(196) . chr(173) => 'i',
      chr(196) . chr(174) => 'I', chr(196) . chr(175) => 'i',
      chr(196) . chr(176) => 'I', chr(196) . chr(177) => 'i',
      chr(196) . chr(178) => 'IJ', chr(196) . chr(179) => 'ij',
      chr(196) . chr(180) => 'J', chr(196) . chr(181) => 'j',
      chr(196) . chr(182) => 'K', chr(196) . chr(183) => 'k',
      chr(196) . chr(184) => 'k', chr(196) . chr(185) => 'L',
      chr(196) . chr(186) => 'l', chr(196) . chr(187) => 'L',
      chr(196) . chr(188) => 'l', chr(196) . chr(189) => 'L',
      chr(196) . chr(190) => 'l', chr(196) . chr(191) => 'L',
      chr(197) . chr(128) => 'l', chr(197) . chr(129) => 'L',
      chr(197) . chr(130) => 'l', chr(197) . chr(131) => 'N',
      chr(197) . chr(132) => 'n', chr(197) . chr(133) => 'N',
      chr(197) . chr(134) => 'n', chr(197) . chr(135) => 'N',
      chr(197) . chr(136) => 'n', chr(197) . chr(137) => 'N',
      chr(197) . chr(138) => 'n', chr(197) . chr(139) => 'N',
      chr(197) . chr(140) => 'O', chr(197) . chr(141) => 'o',
      chr(197) . chr(142) => 'O', chr(197) . chr(143) => 'o',
      chr(197) . chr(144) => 'O', chr(197) . chr(145) => 'o',
      chr(197) . chr(146) => 'OE', chr(197) . chr(147) => 'oe',
      chr(197) . chr(148) => 'R', chr(197) . chr(149) => 'r',
      chr(197) . chr(150) => 'R', chr(197) . chr(151) => 'r',
      chr(197) . chr(152) => 'R', chr(197) . chr(153) => 'r',
      chr(197) . chr(154) => 'S', chr(197) . chr(155) => 's',
      chr(197) . chr(156) => 'S', chr(197) . chr(157) => 's',
      chr(197) . chr(158) => 'S', chr(197) . chr(159) => 's',
      chr(197) . chr(160) => 'S', chr(197) . chr(161) => 's',
      chr(197) . chr(162) => 'T', chr(197) . chr(163) => 't',
      chr(197) . chr(164) => 'T', chr(197) . chr(165) => 't',
      chr(197) . chr(166) => 'T', chr(197) . chr(167) => 't',
      chr(197) . chr(168) => 'U', chr(197) . chr(169) => 'u',
      chr(197) . chr(170) => 'U', chr(197) . chr(171) => 'u',
      chr(197) . chr(172) => 'U', chr(197) . chr(173) => 'u',
      chr(197) . chr(174) => 'U', chr(197) . chr(175) => 'u',
      chr(197) . chr(176) => 'U', chr(197) . chr(177) => 'u',
      chr(197) . chr(178) => 'U', chr(197) . chr(179) => 'u',
      chr(197) . chr(180) => 'W', chr(197) . chr(181) => 'w',
      chr(197) . chr(182) => 'Y', chr(197) . chr(183) => 'y',
      chr(197) . chr(184) => 'Y', chr(197) . chr(185) => 'Z',
      chr(197) . chr(186) => 'z', chr(197) . chr(187) => 'Z',
      chr(197) . chr(188) => 'z', chr(197) . chr(189) => 'Z',
      chr(197) . chr(190) => 'z', chr(197) . chr(191) => 's',
      // Euro Sign
      chr(226) . chr(130) . chr(172) => 'E'
    );
  }

  // transliterate
  return strtr($string, $charmap);
}

function is_slug($str) {
  return $str == slug($str);
}
egarcia