Most programming languages have APIs for regular expression searching and replacing. In my experience these APIs can be quite clunky, probably because of the number of actions available and efficiency considerations.

If you were going to implement an API, which one would you emulate?

Of particular interest are the methods and objects of the API, but also the regexp dialect and adherence to any standards.

+3  A: 

If you emulate an API, it is going to be just as clunky as the original (if not more so). I don't see what you are getting at. If you are really worried about losing 100 KB to a regex API, you should implement only a minimalistic subset, which wouldn't resemble a large one. Check whether any APIs have configuration options to disable features you don't need.

beta
Is it not clear? I am asking for examples of APIs which are not clunky...
mike g
@mike g - But your question says - "If you were going to implement an api, which one would you emulate?" Not "Which regexp APIs are not clunky?" Sorry to be a pedant.
martin clayton
There are many ways to phrase the same question. This is essentially the same question, although apparently not as clear as I would have liked!
mike g
+2  A: 
Norman Ramsey
A: 

Having actually implemented a full regular expression engine (used in-house in my company's products such as RegexBuddy) and a publicly available "API" based on PCRE (the TPerlRegEx component for Delphi), I recommend not to worry too much about emulating this or that, but instead to focus on what your regex library will be used for. Unfortunately, you don't say much about this other than mentioning efficiency. A properly developed library doesn't have to be less efficient just because it has more available features. E.g. PCRE offers a feature-rich regex flavor and excellent performance, but a limited set of library features around it (e.g. no search-and-replace). But adding more library features such as search-and-replace wouldn't make PCRE slower, because unused calls don't even have to be linked into the final .exe.

There are no regex standards, only conventions that are frequently flouted in subtle ways. If "standards" matter, simply use one of the popular regex libraries, even if it isn't perfect.

If you want something off-the-shelf minimalistic, dig up a copy of Henry Spencer's regex.c which implements POSIX regular expressions.

Jan Goyvaerts
The library would be general purpose (for a scripting language). The efficiency issue I was getting at comes from the Java API design. Essentially it takes into account that regexps are expensive to create and run, so it has a bunch of intermediate objects (Patterns, Matchers), but these complicate simple actions somewhat.
mike g
Java provides alternative methods for common actions such as String.matches("regex") that don't require the programmer to use the Pattern and Matcher classes explicitly.
Jan Goyvaerts
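
To make the trade-off discussed in the comments above concrete, here is a minimal sketch of the two styles in Java. The class name `RegexStyles` and its helper methods are made up for illustration; only `Pattern`, `Matcher`, `String.matches` and `String.replaceAll` come from the standard library. The first helper compiles the pattern once and walks the matches through the intermediate objects; the other two use the `String` convenience methods, which hide those objects from the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexStyles {
    // Explicit style: the pattern is compiled once and reused across calls.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Collects every run of digits using the Pattern/Matcher objects.
    static List<String> allNumbers(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Convenience style: no Pattern or Matcher in sight.
    // String.matches succeeds only if the whole string matches.
    static boolean isNumber(String input) {
        return input.matches("\\d+");
    }

    // Convenience style for search-and-replace.
    static String maskNumbers(String input) {
        return input.replaceAll("\\d+", "#");
    }

    public static void main(String[] args) {
        System.out.println(allNumbers("a1 b22 c333")); // [1, 22, 333]
        System.out.println(isNumber("12345"));         // true
        System.out.println(maskNumbers("a1 b22"));     // a# b#
    }
}
```

One possible takeaway for a scripting-language API is to offer both layers: short one-call functions for the common cases, and reusable compiled-pattern objects for code that runs the same regex many times.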