I validate URLs containing UTF-8 characters with a rewrite rule:

RewriteRule ^([a-z]{2})/([a-z0-9-]{1,256})/([[:print:]]{1,256})$ index.php?language=$1&categories=$2&get_query=$3 [L]

The get_query part ($3) is the problem: it accepts test!?!'"<>*+ but fails for accented characters such as àèéìòù, or any other non-ASCII UTF-8 characters.

For example, on Wikipedia this works great: http://en.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD_%E6%BC%A2%E8%AA%9E

Any help? :-)

A: 

[:print:] is [\x20-\x7E], so it doesn't include non-ASCII characters. If you want to allow anything, why not just .*, or maybe [^/]*?
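
To see the difference outside Apache, here is a minimal Python sketch (not part of the original answer). It assumes the pattern is applied to the percent-decoded bytes of the path segment, and it uses byte-wise `re` patterns as stand-ins for the two character classes. The names `PRINT_ONLY`, `NO_SLASH`, and `matches` are made up for illustration.

```python
import re
from urllib.parse import unquote_to_bytes

# Byte-wise stand-ins for the two character classes discussed above.
PRINT_ONLY = re.compile(rb"[\x20-\x7E]{1,256}")  # the range [:print:] is said to cover
NO_SLASH = re.compile(rb"[^/]{1,256}")           # the suggested replacement


def matches(pattern, encoded_segment):
    """Percent-decode a path segment to raw bytes, then require a full match."""
    raw = unquote_to_bytes(encoded_segment)
    return pattern.fullmatch(raw) is not None


# Last path segment of the Wikipedia URL from the question.
segment = "%E6%B1%89%E8%AF%AD_%E6%BC%A2%E8%AA%9E"

print(matches(PRINT_ONLY, "test!?!"))  # True: plain printable ASCII
print(matches(PRINT_ONLY, segment))    # False: UTF-8 bytes are all >= 0x80
print(matches(NO_SLASH, segment))      # True: no "/" byte in the segment
```

Every byte of a multi-byte UTF-8 sequence is 0x80 or higher, which is why the printable-ASCII range can never match an accented or CJK character, while a negated class such as [^/] lets those bytes through.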

bobince
Yes, thanks, I'm working on the [^/] version :-)
TrustWeb
.* won't work for me, and neither will [^/]* (any number of non-slashes), right? I really only want to match alpha, hyphen, and underscore, but alpha needs to include accents, umlauts, etc. Is there anything just a little more discriminating?
@tixrus: there's no way to discriminate between the different non-ASCII characters if you have a Unicode-ignorant regex implementation, but you can certainly allow all non-ASCII characters, e.g. by using an excluding character group such as `[^\x00-,.-@[-^\`{-\x7F]`.
bobince
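
As a rough check of what that excluding group admits, here is a small Python sketch using a byte-wise pattern, which behaves like a Unicode-ignorant regex. This is an illustration only, not the commenter's code: `SLUG` and `is_slug` are hypothetical names, and the `[` is escaped because Python warns about an unescaped `[` inside a character class.

```python
import re

# Byte-wise version of the excluding class from the comment above.
# It rejects every ASCII byte except "-", "_", A-Z and a-z, and lets every
# byte >= 0x80 (i.e. any part of a UTF-8 multi-byte sequence) through.
SLUG = re.compile(rb"[^\x00-,.-@\[-^`{-\x7F]+")


def is_slug(text):
    """True if the UTF-8 encoding of text consists only of allowed bytes."""
    return SLUG.fullmatch(text.encode("utf-8")) is not None


print(is_slug("café-au_lait"))  # True: letters, accent, hyphen, underscore
print(is_slug("汉语_漢語"))       # True: all non-ASCII bytes are allowed
print(is_slug("a/b"))           # False: "/" falls in the excluded .-@ range
print(is_slug("abc123"))        # False: digits also fall in the .-@ range
```

Note that the class as written excludes digits too, which fits the "alpha, hyphen, and underscore" request. It cannot tell a letter like é from any other non-ASCII character, which is the limitation the comment describes.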