ansaurus

Question

Regex using several repeatable capture groups

Answer 1

+2 A:

You could be more specific:

(\S*)\s(\S*)\s(\S*)\s(\S*)\s(\S*)\s(\S*)\s

`\S` matches any non-whitespace character. This makes the regex more efficient by reducing backtracking, and it lets the regex fail faster if the input doesn't fit the pattern.

For example, when your regex is applied to the string `Jul 6 14:33:00 radius/10.10.100.12 radius: 07/06/2010`, the regex engine needs 2116 steps to find out that it can't match. The regex above fails in 168 steps.

Alan Moore's suggestion to use `(\S*+)\s(\S*+)\s(\S*+)\s(\S*+)\s(\S*+)\s(\S*+)\s` brings another improvement: the regex now fails within 24 steps (nearly a hundred times faster than the initial regex).

If the match succeeds, Alan's solution and mine are equivalent; your regex is about ten times slower.
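As a rough illustration of the possessive version, here is a minimal Java sketch. The class and method names (`LogFieldRegex`, `firstSixFields`) are made up for this example, and the second input line is a hypothetical extension of the sample string with trailing text so that the sixth field is followed by whitespace. Step counts depend on the engine and are not measured here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogFieldRegex {
    // Six possessive groups of non-whitespace characters,
    // each followed by exactly one whitespace character.
    public static final Pattern FIELDS = Pattern.compile(
        "(\\S*+)\\s(\\S*+)\\s(\\S*+)\\s(\\S*+)\\s(\\S*+)\\s(\\S*+)\\s");

    // Returns the six captured fields, or null if the pattern is not found.
    public static String[] firstSixFields(String line) {
        Matcher m = FIELDS.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        // The sample string contains only five whitespace characters,
        // so a pattern that requires six cannot match it.
        String sample = "Jul 6 14:33:00 radius/10.10.100.12 radius: 07/06/2010";
        System.out.println(firstSixFields(sample) == null);

        // Hypothetical longer line: the sixth field is now followed by a space.
        String longer = sample + " 14:33:00 extra";
        System.out.println(String.join("|", firstSixFields(longer)));
    }
}
```

Because every `\S*+` is possessive, the engine never gives back characters it has consumed, which is why a failed attempt is abandoned quickly.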

Tim Pietzcker 2010-07-06 19:21:27

You can take it a step further and make all quantifiers possessive, i.e., `(\S*+)`. You can't get much more efficient than that.

Alan Moore 2010-07-06 19:28:23

Answer 2

+1 A:

I just thought of something else - why not simply split the string on whitespace?

String[] splitArray = subjectString.split("\\s");
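One thing to watch with this approach: `split("\\s")` treats every single whitespace character as a separator, so a doubled space produces an empty element. The sketch below compares it with `"\\s+"`; the class name `WhitespaceSplit` and the doubled space in the input are assumptions made for illustration.

```java
import java.util.Arrays;

public class WhitespaceSplit {
    // Splits on runs of whitespace, so consecutive spaces count as one separator.
    public static String[] fields(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical variant of the sample line with two spaces after "Jul".
        String line = "Jul  6 14:33:00 radius/10.10.100.12 radius: 07/06/2010";

        // Single-character separator: yields an empty string between "Jul" and "6".
        System.out.println(Arrays.toString(line.split("\\s")));

        // Run-of-whitespace separator: yields the six fields only.
        System.out.println(Arrays.toString(fields(line)));
    }
}
```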

Tim Pietzcker 2010-07-06 19:58:42

The only reason I can't do this is that the interface I'm offered accepts only a regex and doesn't include the ability to do fun things like that. The first answer is perfect, though: the regex should be efficient enough for processing massive amounts of logs per second.

Chris 2010-07-07 16:08:36
