I need a special regular expression and have no experience with them whatsoever, so I am turning to you guys on this one.

I need to validate a classifieds title field so that it contains almost no special characters.

Only letters and numbers should be allowed, and also the three Swedish letters å, ä, ö (upper- or lowercase).

Besides the above, these should also be allowed:

  • The "&" sign.
  • Parentheses "()"
  • Mathematical signs "-", "+", "%", "/", "*"
  • Dollar and Euro signs
  • One accented letter: "é" (only this one is required).
  • Double quote and single quote signs.
  • The comma "," and point "." signs
+5  A: 

Try this:

^[\s\da-zA-ZåäöÅÄÖ&()+%/*$€é,.'"-]*$

Breakdown:

^ = matches the start of the string

[...]* = matches any of the characters (or ranges) inside the brackets, zero or more times

$ = matches the end of the string

Updated with all the suggestions from the comments. Thanks guys!
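
For anyone who wants to try the pattern from PHP, here is a minimal sketch. It assumes the script file and the input are both UTF-8, which is why the `u` modifier is added; the function name `is_valid_title` is made up for illustration and is not part of the original answer.

```php
<?php
// Hypothetical helper: true when $title contains only the allowed characters.
// Assumes this file and $title are UTF-8 encoded (hence the `u` modifier).
function is_valid_title(string $title): bool
{
    // The hyphen comes last in the class so it is a literal, not a range.
    $pattern = '#^[\s\da-zA-ZåäöÅÄÖ&()+%/*$€é,.\'"-]*$#u';
    return preg_match($pattern, $title) === 1;
}

var_dump(is_valid_title('Blå cykel, 50% rabatt (nypris 200€)')); // bool(true)
var_dump(is_valid_title('<b>Cykel</b>'));                        // bool(false)
```

Because of the `*`, an empty title also passes; swapping it for `+` would require at least one character.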

Chris Pebble
Uppercase versions of åäöé too?
David Gelhar
What about capital å, ä, ö, é? Perhaps the /i modifier?
Mailslut
+1 Answered the question, but I think the design of sanitizing the string with a regex before inserting it into HTML is fragile; an HTML escape function would be more maintainable and sane. As for the uppercase versions of the letters, I think Chris is just giving an example of how to include them; Camran can add all the letters he wants.
marr75
Just add `ÅÄÖ` in there and you're done :) Oh and if you want to be thorough, explain the regex a bit, mainly what do ^, $, * and brackets mean.
Esko
@marr75, it's not necessarily just to sanitize before inserting into HTML; he may simply want to disallow titles containing other characters, as they could indicate that the title is either garbage or just plain untidy. You'd still want to sanitize the result after applying this regex, though.
Mailslut
Careful, the `-` is wrong. Needs to be at the end of the character class. And most of the backslashes are unnecessary.
Tim Pietzcker
Could someone edit it so that the minus sign works also? Thanks guys!
Camran
A: 

Hey buddy, it's really very simple. Just do a regex replace of anything you don't want...

In this example I simply say "this stuff is not allowed".

More specifically, it says "if a character is not in this set, replace it with an empty string".

PHP:

$result = preg_replace('#([^a-zA-Z0-9£()+=%/*$,.])#imx', '', $subject);

In the section where you have a-zA-Z0-9£()+=%/*$, simply add any character you want the regex to let through and be allowed in the post.


Edit:

More expansive

This version of the regex contains all uppercase and lowercase accented characters. They're written as hex escapes (single-byte Latin-1 codes, not ASCII) because I don't know the keys to type them!

$result = preg_replace('#([^a-zA-Z0-9£()+=%/*$,.\x99\xBC\xBD\xBE\xC0\xC1\xC2\xC3\xC4\xC5\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD1\xD2\xD3\xD4\xD5\xD6\xE0\xE1\xE2\xE3\xE4\xE5\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF9\xFA\xFB\xFC\xFD])#imx', '', $subject);
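
If the input is UTF-8 rather than a single-byte encoding, the byte escapes above will not line up with the accented letters. One possible alternative, sketched here under that assumption, is the `u` modifier together with the Unicode letter property `\p{L}`. Note that this lets through every letter of every script, which is broader than the list above. The function name `strip_disallowed` is hypothetical.

```php
<?php
// Hypothetical helper: removes every character that is not on the whitelist.
// Assumes $subject and this file are UTF-8 encoded.
function strip_disallowed(string $subject): string
{
    // [^...] matches any character NOT listed; each match is replaced by ''.
    // \p{L} stands for any Unicode letter, accented or not.
    // As in the pattern above, whitespace is not whitelisted, so it is removed too.
    $result = preg_replace('#[^\p{L}0-9£()+=%/*$,.]#u', '', $subject);

    // preg_replace() returns null on error (for example, invalid UTF-8).
    return $result ?? '';
}

echo strip_disallowed('Crème brûlée <5£>'), "\n"; // Crèmebrûlée5£
```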

Glycerine
A: 
^[\sa-zA-Z0-9åäö&()+%/*$€é,.'"-]*$

will match all the required characters.

In PHP:

# The u modifier assumes UTF-8 input; with it, the i modifier also covers Å, Ä, Ö and É.
if (preg_match('#^[\sa-zA-Z0-9åäö&()+%/*$€é,.\'"-]*$#iu', $subject)) {
    # Successful match
} else {
    # Match attempt failed
}
Tim Pietzcker
+1  A: 

PHP has a variety of functions that can help with text validation. You may find them more appropriate than a straight regex. Consider strip_tags(), htmlspecialchars() and htmlentities().

Also, if you are running PHP 5.2 or later, you can use the excellent Filter functions, which were designed for exactly your situation.
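
As a rough sketch of how the two ideas could fit together: `filter_var()` with `FILTER_VALIDATE_REGEXP` checks the title against a whitelist pattern, and `htmlspecialchars()` escapes it on output. The pattern is borrowed from the other answers, UTF-8 input is assumed, and the function name `validate_title` is made up for illustration.

```php
<?php
// Hypothetical helper combining the Filter extension with a whitelist pattern.
// Returns the title unchanged when it is valid, or false otherwise.
function validate_title(string $title)
{
    $options = [
        'options' => [
            // Assumes UTF-8 input; + requires at least one character.
            'regexp' => '#^[\s\da-zA-ZåäöÅÄÖ&()+%/*$€é,.\'"-]+$#u',
        ],
    ];
    return filter_var($title, FILTER_VALIDATE_REGEXP, $options);
}

$title = validate_title('Volvo 240 (1988) - 15000$');
if ($title !== false) {
    // Still escape on output: & and quotes are allowed by the pattern.
    echo htmlspecialchars($title, ENT_QUOTES, 'UTF-8'), "\n";
}
```

Validation and escaping solve different problems here: the regex decides whether a title is acceptable, while the escape step keeps allowed characters such as `&` and quotes from breaking the HTML.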

dnagirl