I'm building a CMS for a scientific journal that uses a lot of Greek characters. I need to validate a field so that it accepts only a specific character set plus Greek characters. Here's what I have now:

[^a-zA-Z0-9-()/\s]

How do I get this to include Greek characters in addition to alphanumeric, '(', ')', '-', and '_'?

I'm using C#, by the way.

A: 

For Java, from the Pattern javadoc:

\p{InGreek} A character in the Greek block (simple block)
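As a minimal sketch of how that block property could slot into the negated character class from the question, assuming the same allowed set (ASCII letters and digits, '-', '(', ')', '/', and whitespace). The class name GreekFieldValidator and the method isValid are hypothetical, chosen only for illustration, and the sample strings use Unicode escapes so the source compiles regardless of file encoding:

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class GreekFieldValidator {
    // Matches any single character that is NOT allowed. Allowed characters are
    // ASCII letters and digits, '-', '(', ')', '/', whitespace, and anything in
    // the Greek Unicode block (U+0370 to U+03FF), selected via \p{InGreek}.
    private static final Pattern DISALLOWED =
            Pattern.compile("[^a-zA-Z0-9\\-()/\\s\\p{InGreek}]");

    // Returns true when the input contains no disallowed character.
    public static boolean isValid(String input) {
        return !DISALLOWED.matcher(input).find();
    }

    public static void main(String[] args) {
        // Greek alpha, beta, gamma mixed with allowed ASCII characters.
        System.out.println(isValid("\u03B1-helix (\u03B2/\u03B3) 42")); // true
        // The colon and the euro sign are outside the allowed set.
        System.out.println(isValid("price: 5\u20AC")); // false
    }
}
```

Searching for a single disallowed character and negating the result keeps the pattern in the same shape as the one in the question, so only the extra \p{InGreek} term changes.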

bmargulies
+1  A: 

If you're using a language that uses PCRE for regular expressions and UTF-8, /[\x{0374}-\x{03FF}]+/u should match Greek characters. Greek characters fall between U+0374 and U+03FF (the Greek and Coptic block itself spans U+0370 to U+03FF), and the u modifier tells PCRE to use Unicode. As noted in the comment below, /\p{Greek}+/u works with PCRE as well.

JavaScript uses \uXXXX instead of \x{XXXX}: /[\u0374-\u03FF]+/.

Also see this guide to Unicode Regular Expressions for more information.

Daniel Vandersluis
If you have PCRE, just use `\p{Greek}`.
Tim Pietzcker
+3  A: 

In .NET languages, you can use \p{IsGreekandCoptic} to match Greek characters, so the resulting regex is:

[^a-zA-Z0-9-()/\s\p{IsGreekandCoptic}]

\p{IsGreekandCoptic} matches the characters in the Greek and Coptic block (U+0370 to U+03FF).

Tim Pietzcker