I have some useful regexes in Perl. Is there a simple way to translate them to .NET's dialect of regex? If not, is there a concise reference of differences?
.NET regular expressions were designed to be compatible with Perl 5 regexes. As such, most Perl 5 regexes should just work in .NET.
You can translate some Perl modifiers to RegexOptions as follows:
    [Flags]
    public enum RegexOptions
    {
        Compiled = 8,
        CultureInvariant = 0x200,
        ECMAScript = 0x100,
        ExplicitCapture = 4,
        IgnoreCase = 1,                  // i in Perl
        IgnorePatternWhitespace = 0x20,  // x in Perl
        Multiline = 2,                   // m in Perl
        None = 0,
        RightToLeft = 0x40,
        Singleline = 0x10                // s in Perl
    }
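As a rough illustration of that mapping, here is a minimal sketch that translates a made-up Perl pattern, m/^hello \s+ world$/ix, in two ways: with RegexOptions flags and with the inline (?ix) form. The class name and the pattern are only examples, not something from the original question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PerlFlagsDemo
{
    // Perl: m/^hello \s+ world$/ix
    // i -> RegexOptions.IgnoreCase, x -> RegexOptions.IgnorePatternWhitespace
    public static readonly Regex WithOptions = new Regex(
        @"^hello \s+ world$",
        RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

    // The same modifiers written inline, which Perl also accepts.
    public static readonly Regex WithInline = new Regex(@"(?ix)^hello \s+ world$");

    public static void Main()
    {
        Console.WriteLine(WithOptions.IsMatch("HELLO   World")); // True
        Console.WriteLine(WithInline.IsMatch("HELLO   World"));  // True
    }
}
```

The inline form is handy when the pattern comes from a config file and you cannot pass options separately.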
Another tip is to use verbatim strings so that you don't need to escape all those escape characters in C#:
    string badOnTheEyesRx = "\\d{4}/\\d{2}/\\d{2}";
    string easierOnTheEyesRx = @"\d{4}/\d{2}/\d{2}";
It really depends on the complexity of the regular expression. Many will work the same out of the box.
Take a look at this .NET regex cheat sheet to see if an operator does what you expect it to do.
I don't know of any tool that automatically translates between regex dialects.
There is a big comparison table in http://www.regular-expressions.info/refflavors.html.
Most of the basic elements are the same. The differences are:
Minor differences:
- Unicode escape sequences. In .NET it is \u200A, in Perl it is \x{200A}.
- \v in .NET is just the vertical tab (U+000B), in Perl it stands for the "vertical whitespace" class. Of course there is \V in Perl because of this.
- The conditional expression for named reference in .NET is (?(name)yes|no), but (?(<name>)yes|no) in Perl.
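A small sketch of the three minor differences above, written for .NET. The class, method, and group names (MinorDifferencesDemo, open, and so on) are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MinorDifferencesDemo
{
    // Perl \x{200A} becomes \u200A in .NET (HAIR SPACE).
    public static bool HasHairSpace(string s) => Regex.IsMatch(s, @"\u200A");

    // In .NET, \v is only the vertical tab U+000B,
    // not Perl's "vertical whitespace" class.
    public static bool HasVerticalTab(string s) => Regex.IsMatch(s, @"\v");

    // Perl: (?(<open>)\))   .NET: (?(open)\))
    // Require a closing parenthesis only if the opening one was captured.
    public static readonly Regex OptionalParens =
        new Regex(@"^(?<open>\()?\w+(?(open)\))$");

    public static void Main()
    {
        Console.WriteLine(HasHairSpace("a\u200Ab"));         // True
        Console.WriteLine(HasVerticalTab("\n"));             // False
        Console.WriteLine(HasVerticalTab("\u000B"));         // True
        Console.WriteLine(OptionalParens.IsMatch("(abc)"));  // True
        Console.WriteLine(OptionalParens.IsMatch("abc"));    // True
        Console.WriteLine(OptionalParens.IsMatch("(abc"));   // False
    }
}
```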
Some elements are Perl-only:
- Possessive quantifiers (x?+, x*+, x++ etc). Use non-backtracking subexpression ((?>…)) instead.
- Named unicode escape sequence \N{LATIN SMALL LETTER X}, \N{U+200A}.
- Case folding and escaping
  - \l (lower case next char), \u (upper case next char).
  - \L (lower case), \U (upper case), \Q (quote meta characters) until \E.
- Shorthand notation for Unicode property \pL and \PL. You have to include the braces in .NET e.g. \p{L}.
- Odd things like \X, \C.
- Special character classes like \v, \V, \h, \H, \N, \R.
- Backreference to a specific or previous group \g1, \g{-1}. You can only use absolute group index in .NET.
- Named backreference \g{name}. Use \k<name> instead.
- POSIX character class [[:alpha:]].
- Branch-reset pattern (?|…).
- \K. Use look-behind ((?<=…)) instead.
- Code evaluation assertion (?{…}), postponed subexpression (??{…}).
- Subexpression reference (recursive pattern) (?0), (?R), (?1), (?-1), (?+1), (?&name).
- Some conditional expression predicates are Perl-specific:
  - code (?{…})
  - recursive (R), (R1), (R&name)
  - define (DEFINE).
- Special Backtracking Control Verbs (*VERB:ARG).
- Python syntax
  - (?P<name>…). Use (?<name>…) instead.
  - (?P=name). Use \k<name> instead.
  - (?P>name). No equivalent in .NET.
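To make a few of the substitutions in the list above concrete, here is a hedged sketch of three of them in .NET: an atomic group in place of a possessive quantifier, \k<name> in place of \g{name}, and a look-behind in place of \K. The patterns and names are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PerlOnlyWorkarounds
{
    // Perl: \d++\d (possessive)   .NET: (?>\d+)\d (atomic group)
    // The atomic group never gives a digit back, so this cannot match "1234".
    public static readonly Regex Atomic = new Regex(@"(?>\d+)\d");

    // Perl: (?<q>['"]).*?\g{q}   .NET: (?<q>['"]).*?\k<q>
    public static readonly Regex Quoted = new Regex(@"(?<q>['""]).*?\k<q>");

    // Perl: foo\Kbar   .NET: (?<=foo)bar
    public static readonly Regex AfterFoo = new Regex(@"(?<=foo)bar");

    public static void Main()
    {
        Console.WriteLine(Atomic.IsMatch("1234"));              // False
        Console.WriteLine(Quoted.Match("say 'hi' now").Value);  // 'hi'
        Console.WriteLine(AfterFoo.Match("foobar").Value);      // bar
    }
}
```

Note that the look-behind version is not always a drop-in replacement for \K: in Perl the text before \K is consumed as part of the match attempt, whereas a look-behind consumes nothing, so overlapping matches can behave differently.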
Some elements are .NET only:
- Variable length look-behind. In Perl, for positive look-behind, use \K instead.
- Arbitrary regular expression in conditional expression (?(pattern)yes|no).
- Character class subtraction [a-z-[d-w]].
- Balancing Group (?<-name>…). This could be simulated with code evaluation assertion (?{…}) followed by a (?&name).
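For a feel of the .NET-only features listed above, the following sketch exercises variable-length look-behind, character class subtraction, and a balancing group. The patterns, the group name open, and the helper KeepSubtracted are assumptions chosen for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DotNetOnlyDemo
{
    // Variable-length look-behind: digits preceded by "USD" and any amount of whitespace.
    public static readonly Regex Amount = new Regex(@"(?<=USD\s*)\d+");

    // Character class subtraction: a-z except d-w, i.e. a-c and x-z.
    public static readonly Regex Subtracted = new Regex(@"[a-z-[d-w]]");

    // Balancing group: push on "(", pop on ")", and fail at the end
    // if any "(" is still left on the stack.
    public static readonly Regex Balanced = new Regex(
        @"^(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))$");

    // Concatenate every character the subtraction class accepts.
    public static string KeepSubtracted(string s) =>
        string.Concat(Subtracted.Matches(s).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        Console.WriteLine(Amount.Match("USD   42").Value);  // 42
        Console.WriteLine(KeepSubtracted("abcdwxyz"));      // abcxyz
        Console.WriteLine(Balanced.IsMatch("(a(b)c)"));     // True
        Console.WriteLine(Balanced.IsMatch("(a(b)c"));      // False
        Console.WriteLine(Balanced.IsMatch("a)b"));         // False
    }
}
```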
References: