I have an input field where both regular text and sprintf tags can be entered. Example: some text here. %1$s done %2$d times

How do I validate the sprintf parts so that it's not possible to enter them wrong, like %$1s? The text is UTF-8, and as far as I know regex only matches Latin-1 characters.

www.regular-expressions.info doesn't list /u anywhere, which I think is used to tell the engine that the string is Unicode.

Is the best way to just search the whole input string for % or $ and, if either is found, apply the regex to validate the sprintf parts?

I think the regex would be: /%\d\$(s|d|u|f)/u

+1  A: 

The UTF-8 modifier is not necessary unless you use UTF-8 in your pattern. Besides that, the sprintf format is more complex. Try the following:

/%(?:\d+\$)?[dfsu]/

This would match both the %s and %1$s format.
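
As a quick check of the pattern above, here is a minimal sketch that runs it against a few sample strings. The sample list and the variable names are made up for illustration. The pattern is kept in a single-quoted string so PHP does not try to interpolate the `$`.

```php
<?php
// Pattern from the answer: an optional "N$" argument number,
// followed by one of the conversions d, f, s or u.
$pattern = '/%(?:\d+\$)?[dfsu]/';

// Illustrative samples (not from the original post, except %$1s).
$samples = ['%s', '%1$s', '%12$d', '%$1s'];

$results = [];
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match, 0 on no match.
    $results[$sample] = preg_match($pattern, $sample) === 1;
}
// '%s' => true, '%1$s' => true, '%12$d' => true, '%$1s' => false
```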

But if you want to check every occurrence of % and whether a valid sprintf() format is following, regular expressions would not be a good choice. A sequential parser would be better.
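
One way such a sequential check could look is sketched below. It is only an illustration under a few assumptions: the function name `findInvalidPlaceholders` is hypothetical, only the d, f, s and u conversions from the pattern above are accepted, and `%%` is treated as an escaped percent sign. Scanning byte by byte is safe for UTF-8 input because the byte for `%` never occurs inside a multi-byte sequence.

```php
<?php
// Hypothetical helper: returns the byte offset of every '%' that does not
// start a valid placeholder. Accepting only d, f, s, u and skipping "%%"
// are assumptions made for this sketch.
function findInvalidPlaceholders(string $input): array
{
    $invalid = [];
    $length = strlen($input);
    $i = 0;
    while ($i < $length) {
        if ($input[$i] !== '%') {
            $i++;
            continue;
        }
        // "%%" is a literal percent sign in sprintf(); skip both bytes.
        if ($i + 1 < $length && $input[$i + 1] === '%') {
            $i += 2;
            continue;
        }
        // \G anchors the match at the offset passed as the fifth argument.
        if (preg_match('/\G%(?:\d+\$)?[dfsu]/', $input, $m, 0, $i) === 1) {
            $i += strlen($m[0]);   // valid placeholder, jump past it
        } else {
            $invalid[] = $i;       // stray or malformed '%'
            $i++;
        }
    }
    return $invalid;
}

$ok  = findInvalidPlaceholders('some text here. %1$s done %2$d times'); // []
$bad = findInvalidPlaceholders('oops %$1s and 100% done');              // [5, 17]
```

Returning offsets rather than a boolean makes it possible to tell the user which part of the input is wrong.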

Gumbo
A sequential parser? I can use preg_match_all to find all %-words, but I'm having problems stopping it at the first space or EOL. Using the above example, I get an array with two entries: [0]="%1$s done ", [1]="%2$d times". $realRegEx = explode(" ", [0]) works, but there must be some way to do it with regex.
Kim
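
For the "stop at the first space or EOL" part of the comment above, a minimal sketch: `\S*` only consumes non-whitespace characters, so each match ends at the next space or at the end of the string. The variable names are illustrative.

```php
<?php
$input = 'some text here. %1$s done %2$d times';

// '%' followed by any run of non-whitespace characters.
preg_match_all('/%\S*/', $input, $matches);

// $matches[0] holds the full matches: ['%1$s', '%2$d']
$tokens = $matches[0];
```

Each entry in `$tokens` can then be validated on its own with an anchored pattern.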
A: 

This is what I ended up with, and it's working.

// Always use server validation even if you have JS validation
if (!isset($_POST['input']) || empty($_POST['input'])) {
  // Do stuff
} else {
  $matches = explode(' ',$_POST['input']);
  $validInput = true;

  foreach ($matches as $m) {
    // Check if a slice contains %$[number] as it indicates a sprintf format
    if (preg_match('/[%\d\$]+/',$m) > 0) {
      // Match found. Now check if it's a valid sprintf format
      if ($validInput === false || preg_match('/^%(?:\d+\$)?[dfsu]$/u',$m) !== 1) {   // no match, or a PCRE error such as invalid UTF-8
        $validInput = false;
        break; // Invalid sprintf format found. Abort
      }
    }
  }

  if ($validInput === false) {
    // Do stuff when input is NOT valid
  }
}

Thank you Gumbo for the regex pattern that matches both with and without order marking.

Edit: I realized that searching for % is wrong, since nothing will be checked if it's forgotten/omitted. Above is the new code.

"$validInput === false ||" can be omitted in the last if-statement, but I included it for completeness.

Kim
You should change the first regex to “/%[^\s]*/”, since the formats can also occur at the end of a string and therefore have no following whitespace. And the second should be changed to “/^%(?:\d+\$)?[dfsu]$/”; otherwise “%%1$s” would also be accepted as valid.
Gumbo
So the text “10 apples cost $4” would have two invalid sprintf formats, as both “10” and “$4” would match the first but not the second regex. That’s not a good idea, is it?
Gumbo
No, it's okay, since that text should be "%u %s cost %s".
Kim
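
To make the last exchange concrete, here is a small sketch that wraps the two-step check from the accepted code in a function and runs both texts through it. The function name `isValidInput` is hypothetical, and the `/u` modifier is left out for brevity.

```php
<?php
// Hypothetical wrapper around the two-regex check from the answer above.
function isValidInput(string $input): bool
{
    foreach (explode(' ', $input) as $word) {
        // Step 1: does the word contain '%', '$' or a digit at all?
        // Step 2: if so, it must be exactly one valid sprintf placeholder.
        if (preg_match('/[%\d\$]+/', $word) === 1
            && preg_match('/^%(?:\d+\$)?[dfsu]$/', $word) !== 1) {
            return false;
        }
    }
    return true;
}

$plain    = isValidInput('10 apples cost $4'); // false: "10" and "$4" are rejected
$template = isValidInput('%u %s cost %s');     // true
```

So this check is only appropriate when every number and currency amount is expected to arrive through a placeholder, as in the reply above.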