I know Perl's regex flavor is something of a minor de facto standard, but why hasn't anyone come up with a universal set of standard symbols, syntax and behaviors?

+1  A: 

Because making standards is hard. It's nearly impossible to get enough people to agree on anything to make it an official standard, let alone something as complex as regex. De facto standards are much easier to come by.

Case in point: HTML 5 is not expected to become an official standard until the year 2022. But the draft specification is already available, and major features of the standard will begin appearing in browsers long before the standard is official.

Robert Harvey
Just a note re: HTML5 - while it is expected to be an official recommendation only by 2022, it's expected to become a candidate recommendation by 2012. CSS2 (not 3!) is still only in that candidate recommendation stage, but it's pretty widely implemented at this point. HTML5 will be perfectly usable LONG before 2022.
ceejayoz
I wonder if the flying cars in 2022 will support HTML5.
Chris
CSS 2 is not a candidate recommendation, it's a full recommendation, and has been since 1998. CSS 2.1 is a candidate recommendation, and has been in that status since mid-2007.
Pavel Minaev
@Chris: So long as they don't let the drivers write their own markup. I don't want someone's car dropping into my living room because the driver misplaced a `</nav>` tag.
Alan Moore
A: 

Perl was first (or damn near close to first), and while it's Perl and we all love it, it's old, and some people felt it needed more polish (i.e. features). This is where the new flavors came in.

They're starting to normalize. The regex used in .NET is very similar to the regex used in other languages. I think people are slowly starting to unify, but some are used to their Perl ways and don't want to change.

Aren
Perl was invented in 1987 according to Wikipedia. I can't find a date for grep, but I assure you it was much earlier than that. There may have been implementations in Unix that were even earlier.
Mark Ransom
Perl came pretty late in the game (http://en.wikipedia.org/wiki/Regular_expression#History). Henry Spencer wrote most of the guts in the late 80's before it was incorporated into early Perl. But Spencer's implementation was to replace an already existing proprietary implementation.
D.Shawley
Thanks for the corrections, guys. I knew Perl was old, but I wasn't sure if it was the oldest. The point still stands: it's evolving, and I think the flavors are slowly starting to converge.
Aren
Despite its age, Perl is still one of the most feature-rich, innovative flavors out there. In fact, most other flavors are still playing catch-up with Perl 5 while Perl 6 leaps ahead, making dramatic improvements to the syntax as well as the functionality.
Alan Moore
@Mark Ransom: Indeed, Perl borrows heavily from `sed` and `awk`. Those tools, along with others like `vi` and `grep`, all get their syntax from `ed`, which is where most of the regex syntax and conventions began.
tylerl
+7  A: 

There is an IEEE standard, part of the POSIX effort. The real question is "why doesn't everyone follow it?" The answer is probably that it is not as rich as PCRE with respect to greedy matching and whatnot.
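
To make the greedy-matching point concrete, here is a minimal Python sketch. Python's `re` module is a Perl-style flavor rather than a POSIX one, and the sample string is made up for illustration. It uses the lazy `*?` quantifier, which, as far as I know, the POSIX regex syntax does not define:

```python
import re

# Made-up sample input for illustration only.
text = "<b>one</b> and <b>two</b>"

# Greedy: `.*` grabs as much as it can, so both tags end up in one match.
greedy = re.findall(r"<b>.*</b>", text)

# Lazy: `.*?` stops at the first possible `</b>`, giving one match per tag.
lazy = re.findall(r"<b>.*?</b>", text)

print(greedy)
print(lazy)
```

With a flavor that only has greedy quantifiers you would typically have to rewrite the pattern (for example with a negated character class) to get the second result.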

D.Shawley
And the followup question is perhaps then: *why isn't the POSIX standard redone/extended to include more syntax?* Because then maybe people might try to follow it.
Peter Boughton
@PeterBoughton: most certainly... now all that we have to do is get anyone to agree on how far we want to go with it. I'm of the opinion that you would be better off with a full parser than most of the extended REs out there. If you need comments in your RE, then it is way too complicated for an RE.
D.Shawley
Well, yes and no. Whilst a full parser might be a better option, it's generally not as concise code (unless there's a compact/generalised DSL for generating parsers?), and - that aside - any Standard should cover what is used (even if not necessarily a sensible approach).
Peter Boughton
A: 

Just a guess: there was never a version popular enough to be considered the canonical standard, and there was no standard implementation. Everyone who came and reimplemented it had their own ideas on how to make it "better".

Mark Ransom
+5  A: 

Actually, there is a regular expression standard (POSIX), but it's crappy. So people extend their RE engine to fit the needs of their application. PCRE (Perl-compatible regular expressions) is a pseudo-standard for regular expressions that are compatible with Perl's RE engine. This is particularly relevant because PCRE is available as a library that you can embed into other applications.

tylerl
Crappy in what way?
Norman Ramsey
+1  A: 

I have researched this and could not find anything concrete. My guess is that it's because regex is so often a tool that works ON tools, and therefore it's necessarily going to have platform- and tool-specific extensions.

For example, in Visual Studio, you can use regular expressions to find and replace strings in your source code. They've added stuff like `:i` to match an identifier. On other platforms and in other tools, identifiers may not be an applicable concept. In fact, other platforms and tools might reserve the colon character for something else, such as escaping the expression.

Differences like that make this one particularly hard to standardize.
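
As a rough illustration, here is what a flavor without `:i` has to do instead. This is a Python sketch; the character class is my assumption of what `:i` approximately stands for, and the sample source line is made up:

```python
import re

# Assumption: an approximate spelled-out equivalent of Visual Studio's `:i`.
# A letter, underscore or dollar sign, followed by any number of the same
# characters or digits.
IDENTIFIER = r"[A-Za-z_$][A-Za-z0-9_$]*"

# Made-up line of source code to search.
source = "int total_count = item$1 + 42;"

identifiers = re.findall(IDENTIFIER, source)
print(identifiers)
```

The pattern is portable, but every tool that lacks the shorthand has to repeat it, and each one may draw the line around "identifier" a little differently.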

Chris
A valid point, but a standard would not standardize "here's how an identifier is matched", but instead "here's how to extend for custom matching symbols", or whatever, so that extensions could be implemented consistently/predictably across platforms.
Peter Boughton
@Peter Good point, the standard could be generalized to accommodate such things. That would make it harder to read and implement, though (to your point, scaring away more sensible people :)).
Chris
A: 

Because too many people are scared of regular expressions, so they haven't become widespread enough for enough sensible people to both think of the idea and be in a position to implement it.

Even if a standards body did form and try to unify the different flavours, too many people would argue stubbornly towards their own approach, whether better or not, because lots of programmers are annoying like that.

Peter Boughton