tags:

views:

56

answers:

3

I currently use this piece of code to reduce a given text to a valid "tagging" format (only lowercase, a-z and minus allowed) by removing/replacing invalid characters

        $zip_filename = strtolower($original);
        $zip_filename = preg_replace("/[^a-zA-Z\-]/g", '-', $zip_filename); //replace invalid chars
        $zip_filename = preg_replace("/-+/g", '-', $zip_filename); // reduce consecutive minus to only one
        $zip_filename = preg_replace("/^-/g", '', $zip_filename); // removing leading minus
        $zip_filename = preg_replace("/-$/g", '', $zip_filename); // remove trailing minus

Any hints on how to put at least the regex into a single one?

Thanks for any advice!

+3  A: 
$zip_filename = trim(preg_replace("/[^a-z]+/", '-', $zip_filename),'-');

Explanation:

  1. A-Z is useless since it should be lower case
  2. Adding + after right bracket will replace one or more consecutive invalid chars
  3. Using trim with second parameter - character to trim form beginning and end will speed up the code
  4. Removing \- from preg_replace will also take car of hyphens between invalid chars / multiple consecutive hyphens, replacing them to single one.
dev-null-dweller
That doesn't take care of multiple consecutive hyphens, I don't think.
Platinum Azure
@Platinum - You're right. It also doesn't take care of consecutive spaces, or leading or trailing dashes.
Steve Wortham
Nogga
Yup, just edited my answer. Now should work as expected
dev-null-dweller
Ah, I like this better than mine now, especially since I'm not a php guy and didn't know about the optional parameter for the trim function. I do think you need to perform one more trim for spaces. The original code effectively eliminated leading and trailing spaces as well... I think that's the only thing yours doesn't do now.
Steve Wortham
Nevermind, you can disregard that. You're replacing spaces with dashes, and then trimming the dashes. It's a little bit redundant processing but not bad at all, and very easy to look at.
Steve Wortham
A: 

This should simplify it considerably...

$zip_filename = trim(strtolower($original));
$zip_filename = preg_replace("/\s\s+|--+|[^a-zA-Z-]/g", '-', $zip_filename);

The trim will take care of the spaces before and after the string. Also note the \s\s+ and --+. These are more efficient at finding duplicates. They'll only match those characters if there are 2 or more in succession, therefore avoiding unnecessary replacement operations.

But technically it'd still be possible to have leading or trailing dashes. And for that you'd still need this...

$zip_filename = preg_replace("/^-|-$/g", '', $zip_filename);

(This last operation couldn't really be combined with the other since you're using a different replacement string.)

Steve Wortham
A: 

You'll only make the code more difficult to understand/maintain by combining these four operations into one.

You also don't need the complexity and performance hit of regular expression-based operations to achieve what you need.

The reduction of double to single minus signs may be more readily achieved with a looped call to str_replace:

while (substr_count($zip_filename, '--')) {
    $zip_filename = str_replace('--', '-', $zip_filename);
}

Wrapping this up in a well-named class method will abstract away any apparent complexity and aid the readability of the code.

The last two operations can be handled by the trim() function:

$zip_filename = trim($zip_filename, '-');

You can then replace your regular expression-based operations with something less cpu aggressive and arguably easier for others to understand:

//replace invalid chars
$zip_filename = preg_replace("/[^a-zA-Z\-]/g", '-', strtolower($original)); 

// reduce consecutive minus to only one
while (substr_count($zip_filename, '--')) {
    $zip_filename = str_replace('--', '-', $zip_filename);
}

// remove leading and trailing minus
$zip_filename = trim($zip_filename, '-'); 
Jon Cram