We have a string that we need to parse with a regex. The string can take either of two forms:

  1. There was a problem at XXXX
  2. There was a problem at XXXX, previous failures were YYY

XXXX can contain any character (e.g. ".").

How can we write a regex that will match:

  1. XXXX
  2. ", previous failures were YYY" (remember, this part is optional)

Every regex I tried either captures everything in the first group (because it is greedy) or captures too little (because it is non-greedy).

I know this is advanced, but maybe someone has already done it.

+6  A: 
^There was a problem at (.*?)(?:, previous failures were (.*))?$

(.*?) means match anything, but as little as possible for the overall regex to still match. The ^ and $ anchors force the regex to span the entire line, so the non-greedy group has to expand far enough to capture something instead of stopping at an empty match.
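
To see the two capture groups in action, here is a minimal Python sketch. The sample lines and the helper name `parse` are made up for illustration; only the pattern comes from the answer above.

```python
import re

# Pattern from the answer: lazy group for XXXX, optional non-capturing
# wrapper around the static text, capturing group for YYY.
PATTERN = re.compile(
    r"^There was a problem at (.*?)(?:, previous failures were (.*))?$"
)

def parse(line):
    """Return (XXXX, YYY); YYY is None when the optional part is absent."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2)

# Hypothetical sample inputs
print(parse("There was a problem at step 3.1"))
print(parse("There was a problem at step 3.1, previous failures were 1, 2"))
```

In the first case the second group is `None`, which is how Python's `re` module reports an optional group that did not participate in the match.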

EDIT: If you really want the surrounding error text, and not just "XXX" and "YYY", then use the following regex instead:

^There was a problem at (.*?)(, previous failures were .*)?$
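
As a rough illustration of this variant (the sample line is hypothetical), the second group now includes the static text rather than just "YYY":

```python
import re

# Variant where the optional group itself is capturing, so it returns
# the whole ", previous failures were YYY" suffix.
WITH_TEXT = re.compile(
    r"^There was a problem at (.*?)(, previous failures were .*)?$"
)

# Hypothetical sample input
m = WITH_TEXT.match("There was a problem at node-4, previous failures were 7")
location, suffix = m.group(1), m.group(2)
print(location)
print(suffix)
```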

EDIT 2: Depending on the format of XXX, you may be able to get away with the following, but only if there are no commas in "XXX". Unfortunately, apart from this case, you need at least the $ anchor to make sure the non-greedy match actually matches something. As you noted in your question, a greedy match isn't an option at all (at least while using .).

There was a problem at ([^,]*)(, previous failures were .*)?
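
The difference can be checked with a short Python sketch. The sample line and variable names are assumptions for illustration. Without anchors, the lazy `(.*?)` is satisfied with an empty match, while `[^,]*` is greedy but stops at the first comma:

```python
import re

# Unanchored lazy version: nothing forces (.*?) to expand.
lazy_unanchored = re.compile(
    r"There was a problem at (.*?)(, previous failures were .*)?"
)
# Unanchored version that relies on XXXX containing no commas.
no_comma = re.compile(
    r"There was a problem at ([^,]*)(, previous failures were .*)?"
)

# Hypothetical sample input
line = "There was a problem at disk 7, previous failures were 2, 5"

lazy_groups = lazy_unanchored.search(line).groups()
no_comma_groups = no_comma.search(line).groups()

print(lazy_groups)      # first group is empty, second did not match
print(no_comma_groups)  # both parts captured
```

Note the limitation: if XXXX itself contains a comma (say "a,b"), the `[^,]*` version captures only the text before that comma.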
Matthew Scharley
Why make the second parentheses non-capturing? He did want to match the surrounding text.
Tim Pietzcker
Because the third parens **are** capturing, but I doubt he cares about the static error text.
Matthew Scharley
OK, I just reread the question and what you wrote, and now I actually understand. Edited in, but I still think it's a miscommunication more than what he actually wants/needs.
Matthew Scharley
There was no miscommunication :) '^There was a problem at (.*?)(, previous failures were .*)?$' actually did the job. In general I am trying NOT to be strict and use ^ / $; is there another option?
+2  A: 

A regex compatible with Perl, Java, Python, .NET, JavaScript, etc. could be

^There was a problem at (.*?)(, previous failures were .*)?$

if I understand your question correctly. If you need a code sample, please provide more details.

Tim Pietzcker