How to Regex international alphabet (english a-z, + non english)

Hello,

I want to accept only input made up of letters: the English alphabet plus letters from other languages, such as German (öäü), French (áê), or Chinese (...).

How can I write my regex so that it accepts all alphabetic characters from international alphabets?

+1 A:

With PCRE it would be \w, a "word" character. It also accepts Unicode when configured properly.

WoLpH 2010-03-06 10:48:42

`\w` is not a boundary but the character class of word characters.

Gumbo 2010-03-06 11:40:04

... and `\b` is the word boundary.

KennyTM 2010-03-06 11:53:55

Indeed, I have modified my original answer. My explanation was incorrect.

WoLpH 2010-03-06 14:35:04

It varies. Some languages have a "Unicode" flag which extends \d, \w, etc. to cover non-ASCII characters. Some support equivalence classes in a range, e.g. [[=e=]] matches e, é, ê, etc. The regex documentation for your language or library will explain what options are available.

Ignacio Vazquez-Abrams 2010-03-06 10:51:34
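
As a rough illustration of the "Unicode flag" point, here is a minimal Python 3 sketch. It assumes the standard `re` module, where `\w` is Unicode-aware by default for `str` patterns and `re.ASCII` switches that off. Because `\w` also matches digits and the underscore, the sketch subtracts them with `[^\W\d_]`. The function names are made up for this example, and the class is only an approximation of "letters" (a few numeric symbols outside the decimal-digit category may still slip through).

```python
import re

# \w minus digits and underscore: roughly "any letter".
# Unicode-aware by default for str patterns in Python 3.
LETTERS = re.compile(r"[^\W\d_]+")

# Same pattern, but restricted to ASCII by the re.ASCII flag.
ASCII_LETTERS = re.compile(r"[^\W\d_]+", re.ASCII)


def is_all_letters(text):
    """True if text is non-empty and consists only of word letters."""
    return LETTERS.fullmatch(text) is not None


def is_ascii_letters(text):
    """True if text is non-empty and consists only of a-z / A-Z."""
    return ASCII_LETTERS.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["Müller", "été", "中文", "abc123", "user_name"]:
        print(sample, is_all_letters(sample), is_ascii_letters(sample))
```

Other languages expose the same idea differently (a pattern modifier, a compile option, or a locale setting), so the flag name here should not be assumed to carry over.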

+1 A:

This may be a good place to start

Unicode:
http://www.regular-expressions.info/unicode.html

Regex language flavors:
http://www.regular-expressions.info/refflavors.html

leson 2010-03-06 11:23:48

+2 A:

Since you specifically ask for Unicode, \p{L} is the shortcut for a Unicode letter. Not all regex flavors support this syntax, though. For example, .NET, Perl, Java and the JGSoft regex engine do; Python's standard re module does not.

So, for example, \b\p{L}+\b will match an entire word made of Unicode letters.

Tim Pietzcker 2010-03-06 12:11:52
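
For flavors without \p{L}, such as Python's standard `re` module, one possible workaround is to test the Unicode general category directly. The sketch below uses the standard `unicodedata` module; the helper names are hypothetical, and it treats "letter" as any category starting with "L" (Lu, Ll, Lt, Lm, Lo), which is what \p{L} covers.

```python
import unicodedata
from itertools import groupby


def is_unicode_letter(ch):
    """True if ch falls in a Unicode letter category (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith("L")


def is_letters_only(text):
    """Rough equivalent of matching the whole string against \\p{L}+."""
    return bool(text) and all(is_unicode_letter(ch) for ch in text)


def letter_runs(text):
    """Rough equivalent of finding every \\p{L}+ match in text."""
    return [
        "".join(group)
        for is_letter, group in groupby(text, key=is_unicode_letter)
        if is_letter
    ]


if __name__ == "__main__":
    print(is_letters_only("Grüße"))        # True
    print(is_letters_only("abc123"))       # False
    print(letter_runs("Grüße, 世界 42!"))  # ['Grüße', '世界']
```

One caveat: an accented letter stored in decomposed form (base letter plus a combining mark) contains a non-letter code point, so normalizing the input to NFC with `unicodedata.normalize` first may be needed. The third-party `regex` package for Python also accepts \p{L} directly, if adding a dependency is acceptable.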

In a lot of languages, you can simply enter the Unicode characters directly into the character class: [a-zäöüß], etc.

poke 2010-03-06 14:36:01

That won't help much when he wants to match **all** letters.

Joachim Sauer 2010-03-06 14:42:10
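
To make the trade-off in these two comments concrete, here is a small Python 3 sketch of the explicit character class approach. The pattern is the one from the comment above; the function name and the sample words are only illustrative. It works well for a known, closed set of letters, but anything outside the listed characters is rejected.

```python
import re

# Explicit list of allowed letters: ASCII a-z plus the German extras.
# re.IGNORECASE lets the same class cover the uppercase forms.
GERMAN_LETTERS = re.compile(r"[a-zäöüß]+", re.IGNORECASE)


def is_german_word(text):
    """True if text consists only of the explicitly listed letters."""
    return GERMAN_LETTERS.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_german_word("Straße"))  # True
    print(is_german_word("MÜLLER"))  # True
    print(is_german_word("été"))     # False: é is not in the class
```

Every additional language means extending the class by hand, which is why a Unicode-aware class or property is usually the better fit when the goal is to accept letters from any alphabet.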
