J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM  J0000010: Project name: E:\foo.pf  J0000011: Job name: MBiek Direct Mail Test  J0000100: Machine name: DEV  J0000100: Project file: E:\mbiek\foo.pf  J0000100: Template file: E:\mbiek\foot.xdt  J0000100: Job name: MBiek  J0000100: Output folder: E:\foo\A0001401  J0000100: Temp folder: E:\foo\Output\A0001401  J0000100: Document 1 - Starting document  J0005000: Document 1 - Text overflowed on page 1 (warning)  J0000101: Document 1 - 1 page(s) composed  J0000102: Document 1 - 1 page(s) rendered at 500 x 647 pixels  J0000100: Document 1 - Completed successfully  J0000020:

I have this gigantic ugly string and I'm trying to extract pieces from it using regex.

In this case, I want to grab everything after "Project name:" up to the part where it says "J0000011:" (the 11 is going to be a different number every time).

Here's the regex I've been playing with:

Project name:\s+(.*)\s+J[0-9]{7}:

The problem is that it doesn't stop until it hits the J0000020: at the end.

How do I make the regex stop at the first occurrence of J[0-9]{7}?

+10  A: 

Make .* non-greedy by adding '?' after it:

Project name:\s+(.*?)\s+J[0-9]{7}:
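
A minimal Python sketch of the difference, using a shortened copy of the string from the question (the variable names `log`, `greedy`, and `lazy` are only for illustration):

```python
import re

# Shortened sample of the log string from the question.
log = (
    r"J0000010: Project name: E:\foo.pf  "
    r"J0000011: Job name: MBiek Direct Mail Test  "
    r"J0000100: Machine name: DEV  "
    r"J0000020:"
)

# Greedy: .* runs to the end, then backtracks to the LAST J-number.
greedy = re.search(r"Project name:\s+(.*)\s+J[0-9]{7}:", log)

# Lazy: .*? grows one character at a time and stops at the FIRST J-number.
lazy = re.search(r"Project name:\s+(.*?)\s+J[0-9]{7}:", log)

print(greedy.group(1))  # everything up to just before the final J0000020:
print(lazy.group(1))    # E:\foo.pf
```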
jj33
A: 

I knew it was something easy. Thanks jj33

Mark Biek
+2  A: 

Using non-greedy quantifiers here is probably the best solution, also because it is more efficient than the greedy alternative: Greedy matches generally go as far as they can (here, until the end of the text!) and then trace back character after character to try and match the part coming afterwards.

However, consider using a negated character class instead:

Project name:\s+(\S*)\s+J[0-9]{7}:

\S means "anything except whitespace", and this is exactly what you want.
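
As a rough Python illustration of how the \S version behaves (the two sample strings are made up for this sketch, and the path with a space in it is hypothetical): it captures a value without whitespace cleanly, but finds no match at all once the value itself contains a space.

```python
import re

pattern = re.compile(r"Project name:\s+(\S*)\s+J[0-9]{7}:")

# Value without whitespace, as in the question.
no_spaces = r"J0000010: Project name: E:\foo.pf  J0000011: Job name: MBiek"
# Hypothetical value that contains a space.
with_spaces = r"J0000010: Project name: E:\my project\foo.pf  J0000011: Job name: MBiek"

m1 = pattern.search(no_spaces)
m2 = pattern.search(with_spaces)

print(m1.group(1))  # E:\foo.pf
print(m2)           # None: \S* stops at the space and no J-number follows it
```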

Konrad Rudolph
A: 

I would also recommend you experiment with regular expressions using "Expresso", a great (and free) utility for regex editing and testing.

One of its upsides is that its UI exposes a lot of regex functionality that people inexperienced with regex might not be familiar with, in a way that makes these new concepts easy to learn.

For example, when building your regex using the UI, and choosing "*", you have the ability to check the checkbox "As few as possible" and see the resulting regex, as well as test its behavior, even if you were unfamiliar with non-greedy expressions before.

Available for download at their site: http://www.ultrapico.com/Expresso.htm

Express download: http://www.ultrapico.com/ExpressoDownload.htm

Hershi
A: 

Well, ".*" is a greedy selector. You make it non-greedy by using ".*?". With the latter construct, each time the regex engine is about to consume another character with the ".", it first attempts to match whatever may come after the ".*?". This means that if, for instance, nothing comes after the ".*?", then it matches nothing.
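
A small Python sketch of that last point, assuming a shortened sample string (the names `s`, `bare`, and `anchored` are illustrative only):

```python
import re

s = r"J0000010: Project name: E:\foo.pf  J0000011: Job name: MBiek"

# Nothing follows the lazy quantifier, so zero characters satisfy it.
bare = re.search(r"Project name:\s+(.*?)", s)

# With something after it, .*? grows only until that something can match.
anchored = re.search(r"Project name:\s+(.*?)\s+J", s)

print(repr(bare.group(1)))  # ''
print(anchored.group(1))    # E:\foo.pf
```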

Here's what I used. s contains your original string. This code is .NET specific, but most flavours of regex will have something similar.

string m = Regex.Match(s, @"Project name: (?<name>.*?) J\d+").Groups["name"].Value;
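
For comparison, a rough Python equivalent might look like the sketch below. Python spells named groups as (?P<name>...), and this version uses \s+ instead of a single literal space so that it tolerates the runs of whitespace in the sample string; both the sample string and that substitution are assumptions of this sketch, not part of the .NET line above.

```python
import re

s = r"J0000010: Project name: E:\foo.pf  J0000011: Job name: MBiek Direct Mail Test"

match = re.search(r"Project name:\s+(?P<name>.*?)\s+J\d+", s)

# re.search returns None when nothing matches, so guard before reading the group.
name = match.group("name") if match else ""

print(name)  # E:\foo.pf
```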

Svend
A: 

@Hershi

I'm actually using RegexBuddy, which is definitely helpful in terms of seeing what's going on. Although Expresso does look nice.

@Konrad

Thanks for the tip on \S. That's something I didn't know about, although in my case there may be spaces in the stuff I want to capture in a group.

Mark Biek