
I'm wondering why there have to be so many regular expression dialects. Why do so many languages, rather than reusing a tried-and-true dialect, seem bent on writing their own?

Like these.

I mean, I understand that some of these do have very different backends. But shouldn't that be abstracted from the programmer?

I'm referring more to the odd but small differences, like parentheses having to be escaped in one language but being literals in another, or metacharacters meaning somewhat different things.
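To make the parentheses example concrete: in POSIX basic regular expressions (the default syntax of `grep` and `sed`), grouping is written `\(...\)` and a bare `(` is a literal, while Perl-style engines such as Python's `re` module use the opposite convention. The sketch below only exercises Python's side of that split; the patterns and strings are illustrative.

```python
import re

# In Python's re module (like most Perl-style engines), a bare "(" opens a
# capturing group.
m = re.fullmatch(r"(ab)+", "abab")
print(m.group(1))  # ab (the last repetition captured by the group)

# The POSIX BRE spelling of a group, \(ab\), means literal parentheses here.
print(re.fullmatch(r"\(ab\)", "(ab)") is not None)  # True
print(re.fullmatch(r"\(ab\)", "ab") is not None)    # False
```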

Is there any particular reason we can't have some sort of universal dialect for regular expressions? I would think it would make things much easier for programmers who have to work in multiple languages.

+3  A: 

For the same reason we have so many languages. Some people will be trying to improve their tools and at the same time others will be resistant to change. C/C++/Java/C# anyone?

Kelly French
+2  A: 

Because formal regular expressions have only three operations:

  • Concatenation
  • Union |
  • Kleene closure *
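As an illustration of how small that core is, here is a minimal sketch of a matcher that supports only those three operations, plus the base cases any formal definition needs (the empty language, the empty string, and a single character). It uses Brzozowski derivatives, a technique chosen here for brevity rather than anything the answer prescribes; all class and function names are hypothetical, and no simplification is done, so it is only suitable for short inputs.

```python
from dataclasses import dataclass

# Hypothetical AST for formal regular expressions.
@dataclass(frozen=True)
class Empty:   # matches no string at all
    pass

@dataclass(frozen=True)
class Eps:     # matches only the empty string
    pass

@dataclass(frozen=True)
class Char:    # matches exactly one given character
    c: str

@dataclass(frozen=True)
class Cat:     # concatenation: left followed by right
    left: object
    right: object

@dataclass(frozen=True)
class Alt:     # union: left | right
    left: object
    right: object

@dataclass(frozen=True)
class Star:    # Kleene closure: zero or more repetitions of inner
    inner: object

def nullable(r) -> bool:
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Char)):
        return False
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)
    return nullable(r.left) or nullable(r.right)  # Alt

def deriv(r, a: str):
    """Brzozowski derivative: matches s exactly when r matches a + s."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Char):
        return Eps() if r.c == a else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.left, a), deriv(r.right, a))
    if isinstance(r, Star):
        return Cat(deriv(r.inner, a), r)
    # Cat: consume a from the left part, or skip a nullable left part.
    first = Cat(deriv(r.left, a), r.right)
    return Alt(first, deriv(r.right, a)) if nullable(r.left) else first

def matches(r, s: str) -> bool:
    """Whole-string match: take one derivative per character, then test for ""."""
    for ch in s:
        r = deriv(r, ch)
    return nullable(r)

# (a|b)*c built from the three core operations only.
pattern = Cat(Star(Alt(Char("a"), Char("b"))), Char("c"))
print(matches(pattern, "abac"))  # True
print(matches(pattern, "abca"))  # False
```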

Everything else is an extension or syntactic sugar, so there was no common source to standardize on. Things like capturing groups, backreferences, character classes, cardinality operations, etc., are all additions to the original definition of regular expressions.

Some of these extensions make "regular expressions" no longer regular at all. Thanks to these extras they can decide non-regular languages, but we still call them regular expressions.
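Backreferences are the standard example. The language of strings made of n a's, a b, and then the same n a's again is not regular (a pumping-lemma argument rules out any finite automaton for it), yet a single backreference recognizes it. The sketch below uses Python's `re` module as one representative backtracking engine.

```python
import re

# (a+)b\1 : one or more a's, a b, then the *same* run of a's again.
pattern = re.compile(r"(a+)b\1")

print(bool(pattern.fullmatch("aaabaaa")))  # True: 3 a's on each side
print(bool(pattern.fullmatch("aaabaa")))   # False: the counts differ
```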

As people add more extensions, they will often try to borrow from other common variants of regular expressions. That's why nearly every dialect uses X+ to mean "one or more Xs", which is itself just shorthand for XX*.
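The claim that X+ is sugar for XX* can be spot-checked mechanically. This sketch (the helper name `same_language` is hypothetical) compares two patterns with Python's `re.fullmatch` on every string over a small alphabet up to a fixed length. It is a finite sanity check, not a proof of equivalence.

```python
import re
from itertools import product

def same_language(p, q, alphabet="ab", max_len=6):
    """True if p and q agree on every string over alphabet up to max_len chars."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(p.fullmatch(s)) != bool(q.fullmatch(s)):
                return False
    return True

# X = a single character.
print(same_language(re.compile(r"a+"), re.compile(r"aa*")))                # True
# X = a group; (?:...) is a non-capturing group in Python's syntax.
print(same_language(re.compile(r"(?:ab)+"), re.compile(r"ab(?:ab)*")))     # True
```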

But when new features get added, there's no basis for standardization, so someone has to make something up. If more than one group of designers comes up with similar ideas at around the same time, the result is different dialects.

Welbog
+1  A: 

The "I made it better" syndrome of programming produces all these things. It's the same with standards. People try to make the next "best" standard to replace all the others and it just becomes something else we all have to learn/design for.

wheaties
+1  A: 

I think a good part of this is the question of who would be responsible for setting and maintaining the standard syntax and ensuring compatibility across differing environments.

Also, if a regex must itself be parsed inside an interpreter/compiler with its own rules for string handling, then that can force escapes and literals to be handled differently.
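Python is one host language where this layering is easy to see: a pattern is first processed as a string literal, and only the result reaches the regex engine. In an ordinary string literal `"\b"` is the backspace character, while the regex engine reads `\b` as a word boundary. The sample text below is illustrative.

```python
import re

text = "a word here"

# Ordinary literal: the string parser turns "\b" into backspace (\x08),
# so the regex engine looks for literal backspaces and finds nothing.
print(re.search("\bword\b", text))             # None

# Doubling the backslash passes "\b" through to the regex engine,
# where it means "word boundary".
print(re.search("\\bword\\b", text).group())   # word

# A raw string skips string-level escape processing entirely.
print(re.search(r"\bword\b", text).group())    # word
```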

A good strategy is to take time to understand how regex algorithms work at a more abstract level; then implementing any particular syntax becomes much easier. It's similar to how each programming language has its own syntax for constructs like conditional statements and loops, yet they all accomplish the same abstract task.

hqrsie