ansaurus

Question

making a small regular expression a bit more readable

Answer 1

+1 A:

The easiest way to make a long regex more readable is to use the "free-spacing" modifier (the x flag, written /x or (?x) in most engines), which lets you write your regex just like you did in the second block of code: whitespace in the pattern is ignored. Not all engines support it, however (according to the page linked above, .NET, Java, Perl, PCRE, Python, Ruby and XPath do).

Note also that in free-spacing mode you can use [ ] instead of \s if you want to match only a space character (unless you're using Java, in which case you have to use "\ ", a backslash followed by a space).
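As a minimal sketch of free-spacing mode, here is how it looks in Python, where the flag is spelled `re.VERBOSE`. The pattern itself (uppercase letters, one space, four digits) is a hypothetical example, not taken from the question.

```python
import re

# Hypothetical pattern for illustration: letters, a single space, four digits.
pattern = re.compile(r"""
    [A-Z]+   # one or more uppercase letters
    [ ]      # a literal space (a bare space would be ignored in this mode)
    \d{4}    # exactly four digits
""", re.VERBOSE)

print(bool(pattern.fullmatch("ABC 1234")))   # True
print(bool(pattern.fullmatch("ABC\t1234")))  # False: [ ] matches a space only, not a tab
```

Using `\s` in place of `[ ]` would accept the tab as well.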

There's not really anything you can do for the second line, if you want each element to be optional independently of the other elements, but the fourth can be shortened:

\s([A-Z]+\d{4}):\s

\d is a shorthand class equivalent to [0-9], and {4} specifies that it should appear exactly four times.

The third line can be slightly shortened as well ((?:…) specifies a non-capturing group):

(informational|warning|(?:fatal )?error)?

From an efficiency standpoint, unless you actually need to capture subpatterns each time you use parentheses, you can remove all of them except the group on the third line, which is needed for the alternation, and that one can be made non-capturing. Putting this all together you'd get:

.*?
\s?:?\s?
(?:informational|warning|(?:fatal )?error)?
\s[A-Z]+\d{4}:\s
.*$
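The assembled pattern can be tried out roughly as follows in Python with `re.VERBOSE`. This is only a sketch: the sample input lines are invented for illustration (the original log format is not shown here), and the space in "fatal " is written as `[ ]` because free-spacing mode would otherwise drop it.

```python
import re

LINE_RE = re.compile(r"""
    .*?                                            # leading text, as little as possible
    \s?:?\s?                                       # optional space, colon, space
    (?:informational|warning|(?:fatal[ ])?error)?  # optional severity keyword
    \s[A-Z]+\d{4}:\s                               # code such as " C1083: "
    .*$                                            # rest of the line
""", re.VERBOSE)

# Hypothetical sample lines, assumed for this example only.
print(bool(LINE_RE.match("main.c(12) : fatal error C1083: Cannot open include file")))  # True
print(bool(LINE_RE.match("just some text")))                                            # False
```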

Daniel Vandersluis 2010-09-30 18:49:23

I'm on board with the /x... although I didn't use it here (having just found out about /x oh... yesterday :). I'm more interested in whether there is actually better regex syntax to use than what I used for lines 2 and 4

Nate 2010-09-30 18:50:56

Answer 2

+1 A:

Line 2

I think your regular expression doesn't match with the comment. You probably want this instead:

(\s:\s)?

To make it non-capturing:

(?:\s:\s)?

You should be able to use a literal space instead of \s. If that doesn't work, it is probably a restriction in the tool you are using.

Line 4

[0-9][0-9][0-9][0-9] can be replaced with [0-9]{4}.

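In many regex flavors \d is shorthand for [0-9], though in some (for example .NET and Python 3) \d also matches other Unicode digits by default.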
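Whether \d and [0-9] are truly interchangeable depends on the engine. As a small illustration in Python 3, where \d on text patterns matches any Unicode decimal digit unless `re.ASCII` is set:

```python
import re

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, a non-ASCII decimal digit

print(bool(re.fullmatch(r"[0-9]", arabic_three)))          # False
print(bool(re.fullmatch(r"\d", arabic_three)))             # True
print(bool(re.fullmatch(r"\d", arabic_three, re.ASCII)))   # False
```

If the input is known to be plain ASCII, the two forms behave the same here.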

Mark Byers 2010-09-30 18:51:29

I like the non-capturing bit... on a side note, is it possible to mandate one of those two optional groups? i.e., one or the other or both, but not neither?

Nate 2010-09-30 19:05:32

@Nate: I think this is about the best way to do that: `((informational|warning|error|fatal error)(\s:\s)?|\s:\s)`
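A quick way to check the "one or the other or both, but not neither" behaviour is to run the pattern against a few fragments. The sketch below uses Python; the test fragments are made up for illustration.

```python
import re

# Pattern copied as written above: keyword with an optional " : ", or " : " alone.
either_or_both = re.compile(
    r"((informational|warning|error|fatal error)(\s:\s)?|\s:\s)"
)

for fragment in ["warning", "fatal error : ", " : ", ""]:
    # fullmatch requires the whole fragment to be consumed by the pattern
    print(repr(fragment), bool(either_or_both.fullmatch(fragment)))
```

The first three fragments match and the empty string does not, since neither branch of the alternation can match nothing.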

Mark Byers 2010-09-30 19:08:57

Answer 3

A:

Perhaps you can build the RE from sub-expressions, so that your end RE would look something like this:

 /$preamble$possible_colon$keyword$alphanum$trailer/
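The same idea can be sketched in Python by concatenating named pieces. The variable names follow the line above; the contents assigned to each one are an assumption, borrowed from the assembled pattern in Answer 1, and the sample input is invented.

```python
import re

# Each sub-expression gets a descriptive name (contents assumed from Answer 1).
preamble = r".*?"
possible_colon = r"\s?:?\s?"
keyword = r"(?:informational|warning|(?:fatal )?error)?"
alphanum = r"\s[A-Z]+\d{4}:\s"
trailer = r".*$"

line_re = re.compile(preamble + possible_colon + keyword + alphanum + trailer)

# Hypothetical input line, for illustration only.
print(bool(line_re.match("main.c(12) : warning C4996: deprecated call")))  # True
```

Each piece can then be tested or changed on its own before being combined.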

zigdon 2010-09-30 18:53:05
