Everyone knows how awesome the C language is and how much it sucks at text-processing tasks. Given these facts, regex definitely should be part of ISO C. But it isn't, and I don't understand why. Are there people who think it's not essential?

+7  A: 

Regex is defined as part of IEEE Std 1003.1-2001 (POSIX).
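
As a minimal sketch of what that POSIX interface looks like, assuming a POSIX system that provides `<regex.h>` (the helper name `posix_match` is made up for illustration and is not part of any standard):

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if text matches the POSIX extended
 * regular expression in pattern, 0 if it does not, and -1 if the
 * pattern fails to compile. */
static int posix_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_EXTENDED selects ERE syntax; REG_NOSUB skips submatch tracking. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

For example, `posix_match("^[0-9]+$", "12345")` should report a match, while `posix_match("^[0-9]+$", "12a45")` should not.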

Here's a handy list of which headers are in which standard:

http://www.schweikhardt.net/identifiers.html

tovare
But that's not the standard library. That's just POSIX. What about non-POSIX systems?
claws
Good point. Regex is such a useful function, but the minor variances between regex implementations are not as useful. A lot of non-POSIX systems have significant compatibility with POSIX standards in many libraries; even my Casio calculator has some POSIX functionality in it, I think (via the Hitachi C implementation).
tovare
And those are just POSIX regexen. They're different from PCRE, or even Perl regular expressions. Don't even get started with all the different flavors out there in other languages. :)
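
A small sketch of how much the flavor matters even within POSIX itself, assuming a system with `<regex.h>` (the helper name `match_with_flags` is hypothetical): the same pattern string can mean different things depending on whether it is compiled as a basic or an extended regular expression.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compiles pattern with the given regcomp() flags
 * and returns 1 on a match, 0 on no match, -1 on a compile error. */
static int match_with_flags(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_NOSUB: only a yes/no answer is needed, not submatch offsets. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

With `REG_EXTENDED`, the pattern `a+` means "one or more `a`", so `match_with_flags("a+", "aaa", REG_EXTENDED)` should match. Compiled with flags `0` (basic syntax), POSIX treats `+` as an ordinary character, so the same pattern should only match text containing a literal `a+`.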
Robert P
+1 for the "handy list" link.
claws
+4  A: 

Because it is a library feature that would require standardizing on one of the regex languages. Standards bodies are committee-driven, so that is not an easy task.

This document explains the rationale behind the standard, which might clarify why: http://www.open-std.org/jtc1/sc22/wg14/www/docs/C99RationaleV5.10.pdf
Another reason explained in the document is to keep the language simple.

There are quite a few regex libraries available for download; just use one.

Romain Hippeau
+13  A: 

Regular expressions don't belong in the C language proper any more than a sound library, a graphics library, or an encryption library does. Including them would reduce the general-purpose nature of the language and greatly inhibit its use as a small and efficient embedded language.

The philosophy of C was to have a very small and efficient language keyword set, with standardized libraries for the next layer of functionality. Since things like regex, graphics, sound, encryption, etc. don't have a single platform or standard, they don't fit in with the standard C library.

They fit best as user libraries, which is what they currently are.

Amardeep
Nice answer. If one went even further, it could be asked why file I/O is in the standard library -- after all, it's also just one out of many possible file I/O subsystems, right? My guess would be that this is due to C's heritage as the main systems programming language for UNIX, where the file system plays a very important role for just about anything except memory management (such as data persistence, device drivers, IPC, etc.).
stakx
Also, we could cut down the library some more by removing those cumbersome console input/output functions.
BlueRaja - Danny Pflughoeft
I'm not sure this answer really flies. After all, the C Standard provides the option of a "freestanding" implementation, for which a lot of the more complicated functions are optional. At its heart, a regexp library isn't really any more platform- or domain- specific than `printf()` - I'd tend to call it an accident of history more than anything else.
caf
Seriously though, this does not answer the question. Things like a choice of regex/encryption are not platform-specific and certainly *could* be standardized by the C committee. Needing to run on small, embedded systems has nothing to do with it - there's no reason to link against code that's not being used...
BlueRaja - Danny Pflughoeft
@BlueRaja - regex and encryption are not platform specific but **are** quite vertical. I'm grateful the C standards committee is not trying to make the language all things to all people. There are plenty of other good languages that satisfy those vertical application needs.
Amardeep
claws
@claws - All but the smallest scale embedded platforms (PIC or 8051 for example) usually support the full ANSI C standard. Of course today's embedded systems often run full Linux kernels so the distinction's becoming a bit blurry.
Amardeep
+1  A: 

The point of C is to be small yet powerful. Since regular expressions are typically a large and complex topic, they belong in a library. It is too bad, though, that the C committee doesn't "sponsor" some well-written, standard C algorithm and data structure libraries. There is a plethora of them out there. I tend to stick with GNU "sponsored" libs whenever I can, since they are available for most platforms even if they aren't necessarily the easiest or most efficient to use. They do strike a nice balance.

+3  A: 

Because regexes are not essential to a programming language. Handy? Yes, very much so, when you need them. Essential? No way.

Web developers will naturally consider regexes to be an essential feature of a language because they have to validate all that HTML form data. Developers whose experience is always with one of a few big-name relational database servers will consider SQL support to be essential. Those working in the scientific domain will require support for "big numbers" or tensors. GUI developers think a built-in GUI toolkit is essential. Some folks deal with XML all day and consider XML support to be essential. And so on; you get the idea. This list of "essentials" can get pretty big, and languages like Java have certainly taken the "kitchen sink" approach to their massive standard libraries. I appreciate that C is not a kitchen sink language in that sense.

Be careful not to assume that your favorite language feature is an essential feature for everyone else.

John
One might ask why complex numbers are in standard C now then...
caf
@caf - I'm thinking low-hanging fruit? That is, much easier to implement than regex, and helps a lot of people in embedded and signal processing... whether this is misguided or not, I won't offer an opinion on :P
detly
claws
@caf - Because chips have complex arithmetic instructions. The purpose of C is to abstract the lower-level machine, not to provide rich language features or massive core libraries. We're all better off that the C standard did not include regexes; otherwise we'd currently be stuck with a crufty pre-Unicode, basic regexp interface from 20 years ago. Instead, we have third-party libraries that can be improved at a much faster pace. I'd rather upgrade libpcre every year than wait for a new C standard every ten!
John
@John: That's a good answer (abstracting the lower-level machine) - you should update your actual answer with that ;) (My point being that complex arithmetic isn't "essential" either.) I'd also note that we *are* stuck with some decidedly crufty non-essential functions, like `scanf()` - so I still say "historical accident" as to exactly *what* cruft got in.
caf