I have a web page where users can add smilies to their comments, and I want to limit the number of smilies per comment. The "system" works, but I have some problems with the regex part. I have my smilies defined in a config file like so:

$config['Smilies'] = Array (
    // irrelevant stuff
    'smilies' => Array (
     ':)' => 'smile.gif',
     ':(' => 'sad.gif',
     // some more smilies
     's:10' => 'worship.gif',
     's:11' => 'zip.gif',
     's:12' => 'heart.gif',
     // some more smilies
     's:1' => 'dry.gif',
     's:2' => 'lol.gif',
     's:3' => 'lollol.gif',
     // some more smilies
    )
);

Then, when I validate the comment (to see how many smilies it contains), I loop through this array and match each smiley against the content of the comment. The regex is used like this:

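$numOfFoundSmilies = 0;
foreach ( $this->config['smilies'] as $smilie => $smilieImage )
{
    $matches = Array ();
    // Escape the code, including the '/' delimiter, before building the pattern
    Preg_Match_All ( '/' . Preg_Quote ( $smilie, '/' ) . '/i', $Content, $matches );

    $numOfFoundSmilies += Count ( $matches[0] );
}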

The problem is that if I enter "s:10" into the comment, the above code finds two matches: "s:10" and "s:1". My knowledge of regular expressions is very poor, and I can't figure this one out.
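A minimal reproduction of the double count, as a sketch: the two-entry `$smilies` array and the sample comment are made up for illustration, and `preg_match_all()` is called without its `$matches` argument (allowed since PHP 5.4) because only its return value, the number of matches, is needed.

```php
<?php
// Trimmed-down stand-in for the config array (illustrative values only)
$smilies = array(
    's:10' => 'worship.gif',
    's:1'  => 'dry.gif',
);
$content = 'Praise the code s:10';

$numOfFoundSmilies = 0;
foreach ($smilies as $smilie => $smilieImage) {
    // Each code gets its own search over the whole comment,
    // so "s:1" also matches the first three characters of "s:10".
    $numOfFoundSmilies += preg_match_all('/' . preg_quote($smilie, '/') . '/i', $content);
}

echo $numOfFoundSmilies, "\n"; // 2, although only one smiley was typed
```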

+2  A: 

Quantifiers in regular expressions are greedy by default (at least in PCRE). Usually you could circumvent this:

/a+/ # selects the whole string from "aaaaaaa"

/a+?/ # selects only "a"
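The greedy/lazy difference can be checked with a short PHP sketch using the same subject string. It only illustrates quantifier behaviour; on its own it does not fix the s:1/s:10 problem.

```php
<?php
$subject = 'aaaaaaa';

preg_match('/a+/', $subject, $greedy);  // + takes as many "a"s as it can
preg_match('/a+?/', $subject, $lazy);   // +? stops after the first "a"

echo $greedy[0], "\n"; // aaaaaaa
echo $lazy[0], "\n";   // a
```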

In your case, this doesn't help much, since you can't just throw in a question mark somewhere. One possibility is to reorder your search array and replace the found places immediately: search first for s:10 and then for s:1, and use preg_replace() instead of matching. This way, the second search no longer finds the first smiley.
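A sketch of the reorder-and-replace idea that still allows validating before converting: it works on a copy of the comment and uses the `$count` argument of `preg_replace()` to tally matches. The function name `countSmilies` is hypothetical, the codes are sorted by length instead of relying on the config order, and each match is replaced with a NUL byte on the assumption that no smiley code contains one (replacing with an empty string could glue neighbouring text into a new match).

```php
<?php
// Hypothetical helper: counts smilies without touching the original text.
function countSmilies(array $smilies, string $content): int
{
    $codes = array_keys($smilies);
    // Longest first, so "s:10" is handled before its prefix "s:1"
    usort($codes, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $copy = $content;
    $total = 0;
    foreach ($codes as $code) {
        // Replace every match with a placeholder (assumed absent from all codes)
        // so later, shorter codes cannot match inside it.
        $copy = preg_replace('/' . preg_quote($code, '/') . '/i', "\0", $copy, -1, $count);
        $total += $count;
    }
    return $total;
}

$smilies = array('s:1' => 'dry.gif', 's:10' => 'worship.gif', ':)' => 'smile.gif');
echo countSmilies($smilies, 'Hi :) s:10 and s:1'), "\n"; // 3
```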

Another possibility: split your search array in two. If you know that the smilies in one part always have the structure 's:' plus digits, you could use this regex in the second loop:

Preg_Match_All ( '/' . Preg_Quote ( $smilie ) . '(?![0-9])/i', $Content, $matches );

Here (?![0-9]) is a negative lookahead: it asserts that the next character is not a digit (the end of the string is fine too) without consuming any characters.
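A sketch of the lookahead applied to every code in a single loop, without splitting the array (the sample array and comment are illustrative). One side effect of not splitting: a smiley immediately followed by a digit, such as ":)5", is no longer counted either.

```php
<?php
$smilies = array(
    ':)'   => 'smile.gif',
    's:10' => 'worship.gif',
    's:1'  => 'dry.gif',
);
$content = 'Nice s:10 :) s:1';

$numOfFoundSmilies = 0;
foreach ($smilies as $smilie => $smilieImage) {
    // (?![0-9]) rejects a match whose next character is a digit,
    // so "s:1" no longer matches inside "s:10".
    $pattern = '/' . preg_quote($smilie, '/') . '(?![0-9])/i';
    $numOfFoundSmilies += preg_match_all($pattern, $content);
}

echo $numOfFoundSmilies, "\n"; // 3
```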

A third option: if you allow (i.e., convert) smileys only at certain places, you could use this:

Preg_Match_All ( '/\b' . Preg_Quote ( $smilie ) . '\b/i', $Content, $matches );

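\b is a "word boundary": the position between a word character (letter, digit, underscore) and a non-word character or the start/end of the string. The obvious drawback is that not all smileys will be found: "s:1" glued to other text, as in "abcs:1xyz", won't match, and for a smiley that starts and ends with punctuation, such as ":)", \b needs a word character right next to it, so a free-standing " :) " is not matched at all.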
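A small sketch of how \b behaves for the two kinds of codes; `countWithBoundaries` is a hypothetical helper wrapping the pattern above, and the sample strings are made up.

```php
<?php
// Hypothetical helper: counts one code surrounded by word boundaries.
function countWithBoundaries(string $smilie, string $content): int
{
    return preg_match_all('/\b' . preg_quote($smilie, '/') . '\b/i', $content);
}

echo countWithBoundaries('s:1', 'ok s:1 ok'), "\n";  // 1: spaces around it
echo countWithBoundaries('s:1', 'ok s:10 ok'), "\n"; // 0: "0" follows, no boundary
echo countWithBoundaries('s:1', 'abcs:1xyz'), "\n";  // 0: glued to other text
echo countWithBoundaries(':)', 'ok :) ok'), "\n";    // 0: punctuation next to spaces
echo countWithBoundaries(':)', 'abc:)xyz'), "\n";    // 1: letters on both sides
```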

Boldewyn
I don't think this will work, because he starts a new regular expression search for each smiley.
Fortega
Like Fortega said, this won't work for me. It could if I replaced each found smiley as soon as I found it, but I have to validate first and only then convert the text smilies to images if validation passes...
Jan Hančič
But if the replacement is made by the first regular expression already, then the second regex won't find s:1. +1
Rob Fonseca-Ensor
I said IF I replaced :)
Jan Hančič
Just updated the answer.
Boldewyn
I used your second solution (the third one is not acceptable). And I didn't have to split the array in two. Works like a charm! Thanks!
Jan Hančič
+3  A: 

Your code counts, for each smiley code, how many times that code appears in the post, so 's:10' counts both as 's:10' and 's:1'.

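A solution would be to look for all smiley codes at once, so that every piece of the post counts towards only a single smiley code. This can be done by combining all codes into a single regex. Note that PCRE tries the alternatives from left to right and takes the first one that matches at a given position, so longer codes should come before their prefixes; otherwise "s:1" would be matched where the user typed "s:10".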

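$codes = array_keys($this->config['smilies']);
// Longest codes first, so 's:10' is tried before its prefix 's:1'
usort($codes, function ($a, $b) { return strlen($b) - strlen($a); });
$escCodes = array_map(function ($code) { return preg_quote($code, '/'); }, $codes);
$regex = '/'.implode('|',$escCodes).'/i';

preg_match_all($regex, $Content, $matches);

// $matches[0] holds the full-pattern matches
$found = count($matches[0]);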
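A sketch showing why the order of the alternatives matters and how sorting by length fixes it (sample codes and strings are illustrative). In this particular example the total count would come out the same either way, but the wrong code gets matched, which matters once the codes are converted to images.

```php
<?php
// At each position PCRE keeps the first alternative that matches,
// not the longest one.
$content = 'Praise the code s:10';

preg_match_all('/s:1|s:10/i', $content, $shortFirst);
preg_match_all('/s:10|s:1/i', $content, $longFirst);

echo $shortFirst[0][0], "\n"; // s:1 (the trailing "0" is left unmatched)
echo $longFirst[0][0], "\n";  // s:10

// Sorting the codes by length, longest first, before joining them
// puts every code ahead of its own prefixes.
$codes = array('s:1', ':)', 's:10');
usort($codes, function ($a, $b) {
    return strlen($b) - strlen($a);
});
$escCodes = array_map(function ($code) {
    return preg_quote($code, '/');
}, $codes);
$regex = '/' . implode('|', $escCodes) . '/i';

echo preg_match_all($regex, 'Hi :) s:10 s:1'), "\n"; // 3
```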
Victor Nicollet
This also works, but I went for Boldewyn's solution, as it requires fewer code changes. Thanks anyway!
Jan Hančič
Yes, the famous `or` expression. +1, I forgot about this simple one.
Boldewyn
A: 

You could change your regexen to use word boundaries or \s (whitespace) to match, so s:1 becomes \bs:1\b or \ss:1\s. Beware that with the second method "s:1." will not be matched, and neither version will match "This is my funny texts:1".
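A sketch of the whitespace-delimited variant and its limits (sample strings are made up). Because the surrounding whitespace is part of the match, it also fails at the very start or end of the comment, and two codes separated by a single space are counted once, since the first match consumes that space.

```php
<?php
$pattern = '/\ss:1\s/i';

echo preg_match_all($pattern, 'look s:1 here'), "\n";  // 1
echo preg_match_all($pattern, 'look s:1. here'), "\n"; // 0: "." is not whitespace
echo preg_match_all($pattern, 'look s:1'), "\n";       // 0: no whitespace at the end
echo preg_match_all($pattern, 's:1 first'), "\n";      // 0: no whitespace at the start
echo preg_match_all($pattern, 'a s:1 s:1 b'), "\n";    // 1: shared space consumed
```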

Residuum
A: 

Change "s:1" to "s:1[^0-9]": that matches any "s:1" not followed by a digit.

Dave Child
However, that won't match "s:1" when it comes at the very end of a string. Your regex _requires_ another character after it. A negative lookahead would be better in this case: `s:1(?![0-9])`.
Geert
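A quick sketch comparing the two patterns from the answer and the comment above on made-up strings: the character class needs one more character after the code, while the lookahead does not.

```php
<?php
$charClass = '/s:1[^0-9]/i';
$lookahead = '/s:1(?![0-9])/i';

echo preg_match_all($charClass, 'end with s:1'), "\n"; // 0: nothing follows "s:1"
echo preg_match_all($lookahead, 'end with s:1'), "\n"; // 1
echo preg_match_all($lookahead, 'not s:10'), "\n";     // 0: a digit follows
```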
+1  A: 

I'd expect this code to be faster than a regex:

$smilies = $config['Smilies']['smilies'];
// $count receives the total number of replacements made
$replaced = str_replace(array_keys($smilies),
                        array_values($smilies),
                        $message, $count);

This would not solve the issue with s:1 and s:10, though, so I'd suggest using a clearer delimiter/boundary notation, e.g. :s10: instead of s:10. Then it won't be an issue anymore.
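A sketch of str_replace() counting with the suggested delimited notation; the ":s1:" / ":s10:" config is hypothetical and reuses the image names from the question. Because the closing colon is part of each code, ":s1:" cannot match inside ":s10:".

```php
<?php
// Hypothetical config using delimited codes instead of "s:1" / "s:10"
$smilies = array(
    ':s1:'  => 'dry.gif',
    ':s10:' => 'worship.gif',
);
$message = 'First :s1: then :s10:';

// The 4th argument receives the total number of replacements made.
$replaced = str_replace(array_keys($smilies), array_values($smilies), $message, $count);

echo $count, "\n";    // 2
echo $replaced, "\n"; // First dry.gif then worship.gif
```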

In addition, I'd suggest not using numeric identifiers for this anyway. Users will likely find them tedious to remember. Why not use easy-to-remember labels, e.g. :heart: or :lol:?

Gordon
+1 for human readable labels
nikc
I have human-readable labels on some smilies, but I just can't come up with labels for 30 smilies...
Jan Hančič