When I learned regular expressions, I was taught that they should start and end with a slash character (optionally followed by modifiers).

For example /dog/i

However, in many examples I see them starting and ending with other characters, such as @, #, and |.

For example |dog|

What's the difference?

+4  A: 

Some RE engines let you choose a different delimiter so that you don't have to escape the usual delimiter (typically the slash) when it appears inside the RE.

For example, with sed, you can use either of:

sed 's/\/path\/to\/directory/xx/g'
sed 's?/path/to/directory?xx?g'

The latter is often more readable. The former is sometimes called "leaning toothpicks". With Perl, you can use either of:

$x =~ /#!\/usr\/bin\/perl/;
$x =~ m!#\!/usr/bin/perl!;

Here too, I contend the latter is easier on the eyes, especially as the REs get very complex. Well, as easy on the eyes as any Perl code could be :-)
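
For comparison, here is a minimal Python sketch of the same substitution. Python has no regex-literal delimiters at all, so the slashes in the path need no escaping. The sample input string is made up purely for illustration.

```python
import re

# Hypothetical input, chosen only for illustration.
line = "cd /path/to/directory && ls /path/to/directory"

# The pattern is an ordinary raw string, so "/" is not special here
# and needs no backslash in front of it.
result = re.sub(r"/path/to/directory", "xx", line)
print(result)  # cd xx && ls xx
```

The toothpick problem only exists where the delimiter and the pattern content collide; once the pattern is a plain string, it goes away.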

paxdiablo
+5  A: 

This varies enormously from one regex flavor to the next. For example, JavaScript only lets you use the forward-slash (or solidus) as a delimiter for regex literals, but in Perl you can use just about any punctuation character--including, in more recent versions, non-ASCII characters like « and ». When you use characters that come in balanced pairs like braces, parentheses, or the double-arrow quotes above, they have to be properly balanced:

m«\d+»
s{foo}{bar}

Ruby also lets you choose different delimiters if you use the %r prefix, including balanced pairs such as %r{foo}, but I don't know if that extends to non-ASCII characters. Many languages don't support regex literals at all; you just write the regexes as string literals, for example:

r'\d+'    # Python
@"\d+"    // C#
"\\d+"    // Java

Note the double backslash in the Java version. That's necessary because the string gets processed twice: once by the Java compiler and once by the compile() method of the Pattern class. Most other languages provide a "raw" or "verbatim" form of string literal that all but eliminates such backslash-itis.
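
To make the two layers of processing concrete, here is a small Python sketch showing that a raw string and a doubly escaped ordinary string produce exactly the same three characters by the time the regex engine sees them. The test string is an arbitrary example.

```python
import re

raw = r"\d+"       # raw literal: the backslash is kept as written
escaped = "\\d+"   # ordinary literal: "\\" collapses to one backslash

# Both literals produce the same three characters: backslash, "d", "+".
print(raw == escaped)  # True
print(len(raw))        # 3

# The regex engine receives identical input either way.
matches = re.findall(escaped, "a1 b22 c333")
print(matches)  # ['1', '22', '333']
```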

And then there's PHP. Its preg regex functions are built on top of the PCRE library, which closely imitates Perl's regexes, including the wide variety of delimiters. However, PHP itself doesn't support regex literals, so you have to write them as if they were regex literals embedded in string literals, like so:

'/\d+/i'  // match modifiers go after the slash but inside the quotes (there is no /g; use preg_match_all() or preg_replace() to match globally)
"{\\d+}"  // double-quotes may or may not require double backslashes
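
As a rough illustration of what "a regex literal embedded in a string" means, here is a Python sketch that splits such a string into delimiter, pattern, and modifiers. This is not how PHP or PCRE implements it; the function name `compile_delimited` and the modifier-to-flag table are assumptions made up for this example, and only a few modifiers are covered.

```python
import re

# Bracket-style delimiters close with their partner; any other delimiter closes itself.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}

# Hypothetical mapping from single-letter modifiers to Python flags (deliberately incomplete).
FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}

def compile_delimited(text):
    """Compile a delimited pattern string such as '/dog/i' into a Python regex."""
    opener = text[0]
    closer = PAIRS.get(opener, opener)
    end = text.rindex(closer)  # modifiers are letters, so the last closer ends the pattern
    if end == 0:
        raise ValueError("missing closing delimiter")
    body, modifiers = text[1:end], text[end + 1:]
    flags = 0
    for letter in modifiers:
        flags |= FLAGS[letter]  # raises KeyError for a modifier not in the table
    return re.compile(body, flags)

print(compile_delimited(r"/\d+/i").findall("a1 b22"))  # ['1', '22']
print(compile_delimited(r"{\d+}").pattern)             # \d+
```

The point is only that the delimiters and modifiers are part of the string's contents, so the regex function has to peel them off before it can compile the pattern.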

Finally, note that even those languages which do support regex literals don't usually offer anything like Perl's s/…/…/ construct. The closest equivalent is a function call that takes a regex literal as the first argument and a string literal as the second, like so:

s = s.replace(/foo/i, 'bar')  // JavaScript
s.gsub!(/foo/i, "bar")        # Ruby
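
The same idea in Python, which has no regex literals, looks roughly like this: the pattern is a string, the replacement is a string, and the modifiers become a `flags` argument. The input string is invented for the example.

```python
import re

s = "Foo and foo"  # hypothetical input

# count=1 replaces only the first match, roughly like Perl's s/foo/bar/i
first_only = re.sub(r"foo", "bar", s, count=1, flags=re.IGNORECASE)
print(first_only)  # bar and foo

# The default count=0 replaces every match, roughly like s/foo/bar/gi
all_matches = re.sub(r"foo", "bar", s, flags=re.IGNORECASE)
print(all_matches)  # bar and bar
```

Whether a call replaces one match or all of them differs between languages, so it is worth checking the default for the function you use.
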
Alan Moore