Types of writing systems:

  • Alphabet
  • Abjad
  • Abugida
  • Syllabary
  • Logography

In a regular expression we have to specify which characters we want to accept.

For example, we use something like [a-zA-Z0-9] to accept all ASCII alphanumeric characters.

How can we write regular expressions that validate text in other, non-Latin writing systems? (How can I write a regular expression that will validate Chinese, Indian, Greek, Russian, or some other script?)

UPDATE:

Using ASP.NET regex engine.

If you don't mind, could you provide me some examples?

Thanks

+2  A: 

What regex engine are you using? If you are using Java or .NET, you can match named Unicode blocks and categories. Java writes a block as \p{InGreek}; .NET uses the Is prefix instead, as in \p{IsGreek}.
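Since you are on .NET, here is a minimal sketch of the named-block approach. The class and method names (BlockDemo, IsGreekOnly) are made up for illustration; only Regex and the \p{IsGreek} block name come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class BlockDemo
{
    // \p{IsGreek} is the .NET spelling of the Greek block (U+0370-U+03FF).
    // ^ and \z anchor the pattern so the whole input must match.
    private static readonly Regex GreekOnly = new Regex(@"^\p{IsGreek}+\z");

    // True when the input is non-empty and every character is in the Greek block.
    public static bool IsGreekOnly(string input)
    {
        return input != null && GreekOnly.IsMatch(input);
    }
}
```

Calling BlockDemo.IsGreekOnly on a Greek word should succeed, while plain ASCII text or an empty string should be rejected.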

Another solution, which is perhaps more generic, is to use Unicode ranges. This page contains a list of several well-known Unicode ranges. For instance, if you want to match a Tibetan character, you would use [\u0F00-\u0FFF]. If you want to match either a Tibetan character or an English letter, you could use [A-Za-z\u0F00-\u0FFF], et cetera.

If you want to match several languages, you can use the page that I mentioned to look up each language's Unicode range and combine them. For example, the range [\u0370-\u06FF] covers Greek, Cyrillic (used for Russian and other Slavic languages), Armenian, Hebrew and Arabic. If you need more, just add further ranges until all the languages you care about are covered.
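As a rough sketch of the range approach in C#, using the two ranges discussed above. RangeDemo and its method names are hypothetical; the anchors (^ and \z) are my addition so that the whole input is checked rather than a substring.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class RangeDemo
{
    // ASCII letters plus the Tibetan block (U+0F00-U+0FFF).
    private static readonly Regex LatinOrTibetan =
        new Regex(@"^[A-Za-z\u0F00-\u0FFF]+\z");

    // One contiguous range spanning the Greek block through the Arabic block.
    private static readonly Regex GreekThroughArabic =
        new Regex(@"^[\u0370-\u06FF]+\z");

    public static bool IsLatinOrTibetan(string input)
    {
        return input != null && LatinOrTibetan.IsMatch(input);
    }

    public static bool IsGreekThroughArabic(string input)
    {
        return input != null && GreekThroughArabic.IsMatch(input);
    }
}
```

Note that a raw range accepts every code point inside it, including punctuation and unassigned positions, so it is looser than a letter-only check.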


EDIT: Based on your comments, you can just use the following expression:

@"\p{L}{4,10}"

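\p{L} matches a letter from any language, so the above expression matches a run of 4 to 10 letters from any language. Some engines, such as Perl's, also accept the long form \p{Letter}; .NET accepts only the short form \p{L}. As written, the expression is not anchored, so it also finds such a run inside a longer string; to validate a whole input, anchor it, as in ^\p{L}{4,10}$.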
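A minimal validation sketch built on that expression, assuming the goal is "the whole input is 4 to 10 letters". LetterValidator and both method names are hypothetical. The second method is my own variation, not part of the answer above: \p{L} does not cover combining marks, so scripts that rely on them (Devanagari, for example) need \p{M} allowed as well, and the length then counts UTF-16 code units rather than visible characters.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class LetterValidator
{
    // Whole input must be 4 to 10 characters of Unicode category L (letter).
    private static readonly Regex Letters = new Regex(@"^\p{L}{4,10}\z");

    // Variation: also accept combining marks (category M), which abugidas
    // such as Devanagari use for vowel signs.
    private static readonly Regex LettersAndMarks = new Regex(@"^[\p{L}\p{M}]{4,10}\z");

    public static bool IsValid(string input)
    {
        return input != null && Letters.IsMatch(input);
    }

    public static bool IsValidWithMarks(string input)
    {
        return input != null && LettersAndMarks.IsMatch(input);
    }
}
```

With this sketch a four-character Chinese word passes IsValid, a three-letter input fails, and a Hindi word containing vowel signs only passes IsValidWithMarks.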

JG
Is it possible to have only one regex that matches all the languages? At least to check that they wrote something between 4 and 10 chars? (This is the basic expression.)
emanyalpsid
A: 

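+1 to @JG. You can also use the predefined character classes. If the RegexOptions.ECMAScript option is not specified, \w matches any Unicode word character (letters, marks, decimal digits and connector punctuation such as the underscore, from every script), which is just what the doctor ordered here. Likewise, \d matches decimal digits from any script, and so on.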
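To illustrate the difference, here is a small sketch comparing \w with and without RegexOptions.ECMAScript. WordClassDemo and its method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class WordClassDemo
{
    // Default options: \w is Unicode-aware.
    public static bool IsWordUnicode(string input)
    {
        return input != null && Regex.IsMatch(input, @"^\w+$");
    }

    // With RegexOptions.ECMAScript, \w is restricted to [a-zA-Z_0-9].
    public static bool IsWordEcma(string input)
    {
        return input != null && Regex.IsMatch(input, @"^\w+$", RegexOptions.ECMAScript);
    }
}
```

A Cyrillic word should pass IsWordUnicode but fail IsWordEcma, while an ASCII identifier passes both.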

Dewfy