I am well aware of what regular expressions are, so please avoid giving me definitions. I am merely looking for an opinion, and maybe some advice. I am graduating soon with my degree in computer science, and so far the only education I have had on regular expressions is through a course on PL design and development. We have never been taught how to apply them in the programs we write, only how to use regex to actually work with the programming language.

The question I have is: am I right in assuming that regex is the most powerful tool for matching and dealing with text? If I am wrong, what else is out there that I should be teaching myself (as opposed to becoming good with regex)? Also, does anybody know any good regex plugins for the Eclipse IDE (Galileo preferably)? I'm looking for something that lets me test against a document and maybe highlight what is being matched. Thanks

+6  A: 

I would use regular expressions when I'm genuinely expressing a pattern. Some people like to use regular expressions when what they're trying to do can easily be implemented in very few "primitive" string operations instead (indexOf, substring, contains, etc.).
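A minimal Java sketch of that split: a fixed substring check and a "text after the first `=`" extraction need only `contains`, `indexOf` and `substring`, while a genuine pattern is where a regex pays off. The product-code format (`AB-1234`) and the method names are hypothetical, chosen only for illustration.

```java
import java.util.regex.Pattern;

public class PrimitiveVsRegex {
    // Fixed substring: a primitive operation is enough.
    public static boolean mentionsError(String line) {
        return line.contains("ERROR");
    }

    // Text after the first '=': indexOf + substring, no regex needed.
    public static String valueAfterEquals(String line) {
        int idx = line.indexOf('=');
        return idx < 0 ? null : line.substring(idx + 1).trim();
    }

    // A real pattern (hypothetical code format: two capitals, hyphen, four digits).
    private static final Pattern PRODUCT_CODE = Pattern.compile("[A-Z]{2}-\\d{4}");

    public static boolean containsProductCode(String line) {
        return PRODUCT_CODE.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(mentionsError("2009-10-01 ERROR disk full"));
        System.out.println(valueAfterEquals("timeout = 30"));
        System.out.println(containsProductCode("order AB-1234 shipped"));
    }
}
```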

I find it's sometimes worth implementing the same operation twice - once with regular expressions and once without. Leave the code for a day, then go back and look at it. Imagine some change you might want to make - which implementation is easier to understand? Which one is easier to change? Sometimes this will be the regular expression, sometimes it will be the primitive string operations.

I would suggest that you document your regular expressions with comments. In particular, any time you've had to look something up when building the regular expression, that's a good candidate for documentation. (There are exceptions here - I can never remember which way round $ and ^ go, but it's obvious when you're looking at a working expression.)
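In Java, one way to keep that documentation right next to the expression is the `Pattern.COMMENTS` flag, which ignores unescaped whitespace and treats `#` up to the end of the line as a comment. The sketch below assumes a simple `YYYY-MM-DD` date format purely as an illustration; `monthOf` is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentedRegex {
    // With Pattern.COMMENTS, whitespace is ignored and "#" starts a comment
    // running to the end of the line, so each piece is documented inline.
    private static final Pattern ISO_DATE = Pattern.compile(
        "^                # start of input\n" +
        "(\\d{4})         # group 1: four-digit year\n" +
        "-                # literal hyphen\n" +
        "(0[1-9]|1[0-2])  # group 2: month 01-12\n" +
        "-                # literal hyphen\n" +
        "(\\d{2})         # group 3: two-digit day (not range-checked)\n" +
        "$                # end of input\n",
        Pattern.COMMENTS);

    // Returns the month digits of a well-formed date, or null otherwise.
    public static String monthOf(String date) {
        Matcher m = ISO_DATE.matcher(date);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(monthOf("2009-10-27"));
        System.out.println(monthOf("2009-13-27"));
    }
}
```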

Jon Skeet
+2  A: 

Also, does anybody know any good regex plugins for the Eclipse IDE (Galileo preferably)?

I like the Quickrex plugin for Eclipse - it's easy to integrate it into your favorite view.
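If no plugin is at hand, a throwaway Java helper can give a rough version of "highlight what is being matched" by bracketing every match. This is only a sketch; `highlight` is a hypothetical name, not part of any plugin.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexHighlighter {
    // Wraps every match of the regex in [brackets] so you can see what it hits.
    public static String highlight(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start())          // text before this match
               .append('[').append(m.group()).append(']');
            last = m.end();
        }
        out.append(text.substring(last));              // text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("\\d+", "room 101, floor 3"));
    }
}
```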

Brian Clozel
+2  A: 

Regex is built specifically to find strings in text.

There are other ways to do this, but they are limited and language-specific.

Regex is a very powerful tool, and it is also a technology/syntax that will probably last a long time. As such, it is a very good candidate for something to learn at the start of your career.

Shiraz Bhaiji
+3  A: 

It really depends on what you mean by powerful.

In terms of complexity, regex can hardly handle recursive structures, for example. For that you need a compiler-compiler (parser generator) such as JavaCC or YACC. This is why you cannot easily build an XML parser entirely from regex. That said, most of the time regex is sophisticated enough.
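Java's `java.util.regex` has no recursion construct (unlike PCRE or Perl, as the comments below point out), so checking arbitrarily deep nesting such as balanced parentheses is usually done with a depth counter or a real parser. A minimal sketch, with `isBalanced` as a hypothetical helper:

```java
public class BalancedParens {
    // Tracks nesting depth explicitly instead of relying on a recursive pattern.
    public static boolean isBalanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) return false;  // ')' with no matching '('
            }
        }
        return depth == 0;                  // every '(' must have been closed
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("(a(b)c)"));
        System.out.println(isBalanced("(a(b c)"));
    }
}
```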

In terms of performance, regex cannot compete with direct parsing. For example, to check whether a string starts with the word "Prefix", in regex you would write '/^Prefix.*/', but in plain Java you would write 'Str.startsWith("Prefix")'. The direct method call is much faster.
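A sketch of both approaches in Java. One detail worth noting: `"^Prefix.*"` used with `matches()` rejects inputs that contain a line break, because `.` does not match line terminators unless `DOTALL` is set; `lookingAt()` with just `"Prefix"` avoids that. The method names are illustrative only.

```java
import java.util.regex.Pattern;

public class PrefixCheck {
    // Plain string method: compares the leading characters directly.
    public static boolean viaStartsWith(String s) {
        return s.startsWith("Prefix");
    }

    // Regex version: lookingAt() anchors at the start of the input but does not
    // require the rest of the string to match.
    private static final Pattern PREFIX = Pattern.compile("Prefix");

    public static boolean viaRegex(String s) {
        return PREFIX.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Prefix-123", "prefix-123", "Prefix\nsecond line"}) {
            System.out.println(s.replace("\n", "\\n") + ": " + viaStartsWith(s) + " " + viaRegex(s));
        }
    }
}
```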

However, regex makes code a lot more manageable in many cases. The easiest example is checking whether a string starts with at least 10 digits. In Java, you might write:

if (Str.length() < 10)            // too short to hold 10 digits
    return false;
for (int i = 0; i < 10; i++) {
    char c = Str.charAt(i);
    if ((c < '0') || (c > '9'))   // not an ASCII digit
        return false;
}
return true;

Compare that to the regex version:

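import java.util.regex.Pattern;

static final String  CheckRegEx   = "^[0-9]{10,}+"; // So you have it expressed in one place
static final Pattern CheckPattern = Pattern.compile(CheckRegEx);

// lookingAt() only requires a match at the start of Str, not across the whole string
if (CheckPattern.matcher(Str).lookingAt()) {
    // Match
}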

The code with RegEx is much more manageable.
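As the comment thread below shows, the two versions can easily drift apart. A small self-contained harness (class and method names are hypothetical) that runs both checks side by side on the same inputs makes any mismatch obvious:

```java
import java.util.regex.Pattern;

public class TenDigits {
    // Hand-written check: the first 10 characters must all be ASCII digits.
    public static boolean startsWithTenDigitsLoop(String str) {
        if (str.length() < 10) return false;
        for (int i = 0; i < 10; i++) {
            char c = str.charAt(i);
            if (c < '0' || c > '9') return false;
        }
        return true;
    }

    // Regex check: the whole rule lives in one pattern string.
    private static final Pattern TEN_DIGITS = Pattern.compile("[0-9]{10}");

    public static boolean startsWithTenDigitsRegex(String str) {
        return TEN_DIGITS.matcher(str).lookingAt();
    }

    public static void main(String[] args) {
        String[] samples = {"0123456789abc", "012345678", "01234x56789", "98765432101234"};
        for (String s : samples) {
            System.out.println(s + " -> " + startsWithTenDigitsLoop(s) + " / " + startsWithTenDigitsRegex(s));
        }
    }
}
```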

What I am trying to say is that each technique has trade-offs, and they must be balanced.

For most cases, regex is a very good tool for the jobs it was designed to do.

NawaMan
The regex in the post is not equivalent to the char-matching code. I think it should be: /^[0-9]{10,}+.*/
p3t0r
Some flavors of regex (eg PCRE) have no problem with recursion.
eyelidlessness
Thanks p3t0r, I changed my mind midway and forgot to update that. :-) eyelidlessness, I am aware of PCRE, but it is not available in many environments. :-)
NawaMan
Perl's implementation of regex, IMHO the most powerful out there, handles recursion (either named or relative) just fine. It is trivial to match recursive constructs such as nested parentheses, and even XML if you are feeling brave.
Eric Strom
+1  A: 

Regular expressions are the best tool for the job of matching and replacing strings unless they aren't. In a log file, or a text corpus? Awesome. In an XML or HTML document? Terrible. It really depends on the structure and meaning of the text you're trying to process.
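Log lines are the kind of flat, line-oriented text where a regex shines. The sketch below assumes a hypothetical log format (`YYYY-MM-DD HH:MM:SS LEVEL message`) and uses Java named groups (available since Java 7); anything that doesn't fit that one-line shape, such as an XML fragment, is simply rejected rather than parsed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // Assumed (hypothetical) format: "YYYY-MM-DD HH:MM:SS LEVEL message".
    private static final Pattern LINE = Pattern.compile(
        "(?<date>\\d{4}-\\d{2}-\\d{2}) (?<time>\\d{2}:\\d{2}:\\d{2}) (?<level>[A-Z]+) (?<msg>.*)");

    // Level of a well-formed line, or null if the line doesn't fit the format.
    public static String levelOf(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group("level") : null;
    }

    // Free-text message part of a well-formed line, or null.
    public static String messageOf(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group("msg") : null;
    }

    public static void main(String[] args) {
        String line = "2009-10-27 14:03:12 WARN Disk almost full";
        System.out.println(levelOf(line) + ": " + messageOf(line));
    }
}
```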

Robert Rossney
+2  A: 

Once you learn regular expressions, they're incredibly powerful. It helps if REs are given first-class citizenship in your chosen language, such as Perl, Ruby, or Python. If they are buried deep in a library, they become cumbersome to use.

I typically write programs in languages with native support for regular expressions. I can easily evaluate the tradeoff between my time and running time. Using substr() and index() might be milliseconds faster at runtime. But if it boils down to the difference between whipping out a regular expression in thirty seconds versus five minutes to program and debug a combination of string-manipulation functions, I'm going to vote for REs almost every time.
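To make the time tradeoff concrete, here is the same job, collapsing runs of whitespace into single spaces, done once with a regex and once with character operations. Even in Java, where regexes live in a library, `String.replaceAll` keeps the regex version to one line. The method names are illustrative, and the two versions can differ on unusual Unicode whitespace because `\s` and `Character.isWhitespace` are not identical character sets.

```java
public class CollapseWhitespace {
    // Regex: replace every run of whitespace with a single space.
    public static String viaRegex(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // The same job with primitive character operations.
    public static String viaLoop(String s) {
        StringBuilder out = new StringBuilder();
        boolean inSpace = false;
        for (char c : s.trim().toCharArray()) {
            if (Character.isWhitespace(c)) {
                inSpace = true;
            } else {
                if (inSpace) out.append(' ');  // one space for the whole run
                out.append(c);
                inSpace = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "  a \t b\n\nc  ";
        System.out.println("[" + viaRegex(input) + "] [" + viaLoop(input) + "]");
    }
}
```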

Barry Brown
+2  A: 

I guess most of the pros and cons of regex have already been mentioned. I would just add that it helps to understand how regexes are implemented before deciding to apply them to a particular problem. This article might help.

MAK
+2  A: 

Regular expressions are certainly a very powerful way of matching text, namely regular languages. Of course, today's regex engines handle far more than regular languages (through recursion, code insertion and other tricks).

The next step beyond regular expressions is grammar parsers (yacc, bison et al.). If you are interested in parsing, I would encourage you to take a look at the upcoming grammar parser implemented in Perl 6, which seems to provide a very powerful blend of regex- and grammar-based parsing, moving far beyond the limitations of conventional regex.
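To give a feel for grammar-based parsing without a generator tool, here is a hand-written recursive-descent parser in Java for a tiny, illustrative arithmetic grammar. Tools like yacc or bison would generate a parser from a grammar like the one in the comments instead; the class name and grammar are assumptions made for this sketch. Note how the recursion in `factor()` handles arbitrarily deep nesting, which a plain regular expression cannot.

```java
public class TinyExprParser {
    // Illustrative grammar:
    //   expr   := term ('+' term)*
    //   term   := factor ('*' factor)*
    //   factor := digits | '(' expr ')'
    private final String src;
    private int pos = 0;

    private TinyExprParser(String text) {
        this.src = text.replace(" ", "");  // spaces are not significant here
    }

    public static long evaluate(String text) {
        TinyExprParser p = new TinyExprParser(text);
        long value = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected trailing input");
        }
        return value;
    }

    private long expr() {
        long value = term();
        while (peek() == '+') { pos++; value += term(); }
        return value;
    }

    private long term() {
        long value = factor();
        while (peek() == '*') { pos++; value *= factor(); }
        return value;
    }

    private long factor() {
        if (peek() == '(') {
            pos++;                          // consume '('
            long value = expr();            // recursion handles nested parentheses
            if (peek() != ')') throw new IllegalArgumentException("Missing ')'");
            pos++;                          // consume ')'
            return value;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected a number");
        return Long.parseLong(src.substring(start, pos));
    }

    // Current character, or '\0' at end of input.
    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 * (3 + (4 * 5))"));
    }
}
```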

Eric Strom