In my code, I use a regexp I googled somewhere, but I don't understand it. :)

preg_match("/^[\p{L} 0-9\-]{4,25}$/", $login)

What does that \p{L} mean? I know what it does: it matches all letters, national characters included.

My second question: I want to sanitize user input for in-game chat, so I'm starting with the regexp mentioned above, but I want to allow most special characters. What's the shortest way to do it? Has someone already prepared a regexp for this?

A: 

For \p, see Unicode character properties. Basically, it requires the character to belong to a specific Unicode category (letter, number, ...).

For your filter, it depends on what exactly you want to filter, but I think building on the Unicode character categories is a good way to go (and then adding individually any other characters that seem useful to you).
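
As a rough sketch of that idea (not a ready-made chat filter), several categories can be combined in one character class. The helper name isAllowedChatMessage, the chosen categories, and the 200-character limit are all assumptions made for illustration; the /u and D modifiers are additions that the pattern in the question does not have.

```php
<?php
// Hypothetical helper: accepts a message made only of letters (\p{L}),
// numbers (\p{N}), punctuation (\p{P}), symbols (\p{S}) and space
// separators (\p{Zs}). The 1..200 length limit is an arbitrary assumption.
function isAllowedChatMessage(string $message): bool
{
    // u: treat pattern and subject as UTF-8.
    // D: make $ match only at the very end, not before a trailing newline.
    return preg_match('/^[\p{L}\p{N}\p{P}\p{S}\p{Zs}]{1,200}$/uD', $message) === 1;
}

var_dump(isAllowedChatMessage('Hello, wörld! :-) 100%')); // bool(true)
var_dump(isAllowedChatMessage("tab\there"));              // bool(false): a tab is a control character
```

Anything not covered by the listed categories (control characters, for example) is rejected, so extra characters have to be added to the class one by one.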

VirtualBlackFox
A: 

The regular expression means:

Any string of 4 to 25 characters, each of which is a letter, a space, a digit (0-9) or a dash.

\p{L} means literally: a character that matches the property "L", where "L" stands for "any letter".
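
A small sketch of \p{L} in action, using the pattern from the question. The /u modifier is an addition here: as far as I know, without it PCRE works on single bytes, so multi-byte UTF-8 letters may not be recognised as expected. The function name isValidLogin and the sample logins are hypothetical, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical wrapper around the pattern from the question, with /u added.
function isValidLogin(string $login): bool
{
    return preg_match('/^[\p{L} 0-9\-]{4,25}$/u', $login) === 1;
}

var_dump(isValidLogin('Zoë 42'));   // bool(true):  "ë" has the property L
var_dump(isValidLogin('Łukasz-7')); // bool(true):  "Ł" is a letter, "-" is allowed
var_dump(isValidLogin('bob!'));     // bool(false): "!" is not in the class
var_dump(isValidLogin('abc'));      // bool(false): fewer than 4 characters
```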

To understand how regular expressions work:

http://en.wikipedia.org/wiki/Regular%5Fexpression

http://www.php.net/manual/en/regexp.reference.unicode.php

Roberto Aloi
That is correct, with the added constraints that the matched string must start at the beginning of the test string and must end at the end of the test string (due to the ^ and $ at the beginning and end of the regex).
Ben Torell
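
To illustrate the point about ^ and $ made in the comment above, here is a minimal comparison of the same character class with and without anchors. The sample input is made up, and /u is an addition to the original pattern.

```php
<?php
$login = 'ab!!john doe!!'; // hypothetical input containing disallowed "!"

// Without anchors, preg_match is satisfied by the allowed run "john doe"
// that it finds somewhere inside the string.
$unanchored = preg_match('/[\p{L} 0-9\-]{4,25}/u', $login);

// With ^ and $, the whole string has to consist of allowed characters.
$anchored = preg_match('/^[\p{L} 0-9\-]{4,25}$/u', $login);

var_dump($unanchored); // int(1)
var_dump($anchored);   // int(0)
```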
A: 

If you want to include most characters, why not just exclude the ones that you are not allowing?

You can do this with a ^ at the start of your character class:

[^characters I don't want]

Disclaimer: Blacklisting might not be the best approach, depending on what you're trying to do, and it has to be more thorough than whitelisting.
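
A minimal sketch of such a negated class in PHP. The blacklist itself (angle brackets plus the Unicode "Other" category \p{C}, which covers control characters among others) is only an assumption for illustration, as is the helper name passesBlacklist; what should really be excluded depends on where the chat text ends up.

```php
<?php
// Hypothetical helper: true if the message contains none of the
// blacklisted characters. [^...] matches any character NOT listed.
function passesBlacklist(string $message): bool
{
    // u: UTF-8 mode; D: $ matches only at the very end of the string.
    return preg_match('/^[^<>\p{C}]+$/uD', $message) === 1;
}

var_dump(passesBlacklist('Hello, wörld!')); // bool(true)
var_dump(passesBlacklist('<b>hi</b>'));     // bool(false): contains < and >
var_dump(passesBlacklist("bell\x07"));      // bool(false): contains a control character
```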

Tim