Hi,

I am developing an application for a European client, and they use their own native character set.

Now I need a regex that allows foreign characters such as eéèêë, and I am not sure how this can be done.

Any suggestions?

+3  A: 

If all you want to match is letters (including "international" letters), you can use `\p{L}`.

You can find some information on regex and Unicode here.

Fredrik Mörk
Should it be written like `/^[a-zA-Z ]+$/\p{L}`? It is not working this way.
Rachel
@Rachel: You will probably need more than only `\p{L}`, since it matches *only* letters (not spaces, other separators or numbers, for instance). Exactly how the pattern should look is impossible to say without knowing the full requirements you need to fulfill.
Fredrik Mörk
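A minimal PHP sketch of the two comments above, assuming UTF-8 input and a source file saved as UTF-8. `\p{L}` has to go inside the character class rather than after the closing delimiter, and the `u` modifier is needed so PCRE treats pattern and subject as UTF-8; without it, a multi-byte letter such as "é" is examined byte by byte and does not match. The function name `isValidName` and the choice to allow only letters, combining marks (`\p{M}`) and spaces are hypothetical; adjust the class to the real requirements.

```php
<?php
// Hypothetical validator: accepts letters from any script, combining marks and spaces.
// Assumes $s is UTF-8; invalid UTF-8 makes preg_match() return false, which we treat as "no".
function isValidName(string $s): bool
{
    // \p{L} = any Unicode letter (a, é, ß, ж, ...)
    // \p{M} = combining marks, e.g. U+0301 in a decomposed "e" + acute accent
    // ' '   = plain space, for multi-word names
    // /u    = treat pattern and subject as UTF-8 code points, not bytes
    return preg_match('/^[\p{L}\p{M} ]+$/u', $s) === 1;
}

var_dump(isValidName('José Müller')); // bool(true)
var_dump(isValidName("e\u{0301}"));   // bool(true): "e" followed by a combining acute accent
var_dump(isValidName('Rachel2'));     // bool(false): digits are not in the class
var_dump(isValidName(''));            // bool(false): + requires at least one character
```

Note that `$` also matches just before a trailing newline; use `\z` (or the `D` modifier) instead if that matters for the input being validated.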
+1  A: 

It depends on the regex library/programming language you use.

zed_0xff
PHP is the language here.
Rachel
A: 

`[e\xE8\xE9\xEA\xEB]` will match any one of eéèêë.

Daniel Rasmussen
What character encoding are you referring to?
Gumbo
Extended ASCII. Good catch. The text has to be encoded as ASCII/ANSI for this to work, according to http://www.regular-expressions.info/reference.html (though it looks like `\p{L}` is still the better option).
Daniel Rasmussen
Extended ASCII is not a character set that I'm aware of. This matches up with at least Windows-1252 (ew) and ISO-8859-1.
Thanatos
http://www.asciitable.com/ I guess that's not the official name for it. It's what I run into most, though.
Daniel Rasmussen
@cyclotis04: There is no character set/encoding named Extended ASCII; it’s just a term for character sets/encodings that have US-ASCII as their base (see http://en.wikipedia.org/wiki/Extended_ASCII). I think the one you are referring to is code page 437 (see http://en.wikipedia.org/wiki/Code_page_437).
Gumbo
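To illustrate the encoding point from this thread, here is a small PHP sketch, assuming the script file itself is saved as UTF-8. Without the `u` modifier, `\xE8` to `\xEB` are raw byte values, which line up with è, é, ê, ë only in single-byte encodings such as ISO-8859-1 or Windows-1252 (code page 437 places these letters at other byte values). With `u`, `\x{E8}` to `\x{EB}` are Unicode code points (U+00E8 to U+00EB), so the same list matches UTF-8 text. The variable names are hypothetical.

```php
<?php
// Byte-oriented class: \xE8..\xEB are single bytes, as in ISO-8859-1 / Windows-1252.
$byteClass = '/^[e\xE8\xE9\xEA\xEB]$/';
// Code-point class: with /u, \x{E8}..\x{EB} are U+00E8..U+00EB (è, é, ê, ë).
$utf8Class = '/^[e\x{E8}\x{E9}\x{EA}\x{EB}]$/u';

$latin1E = "\xE9";   // "é" as a single ISO-8859-1 byte
$utf8E   = "\u{E9}"; // "é" as UTF-8 (two bytes: C3 A9)

var_dump(preg_match($byteClass, $latin1E)); // int(1)
var_dump(preg_match($byteClass, $utf8E));   // int(0): two bytes, but the pattern expects exactly one
var_dump(preg_match($utf8Class, $utf8E));   // int(1)
var_dump(preg_match($utf8Class, $latin1E)); // bool(false): a lone 0xE9 byte is not valid UTF-8
```

In practice, `\p{L}` with the `u` modifier avoids hard-coding individual code points at all.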