I am trying to extract some words from a string. The two cases look like this:

Case 1: "Group X - Ford Mondeo or similar"
Case 2: "Group X - Ford Mondeo"

I would like a single .NET regex that captures "Ford Mondeo" in both cases. The best I have so far is:

^Group [A-Z] - (?<VehicleModel>.+)(?: or similar)$

which returns:

Case 1: "Ford Mondeo"
Case 2: ""

I have tried this:

^Group [A-Z] - (?<VehicleModel>.+)(?: or similar)?$

which returns:

Case 1: ""
Case 2: "Ford Mondeo"

I was trying to say zero or one occurrences of " or similar". I may need an expression that says: if the string ends with "or similar", do this, else do that. I've been using Expresso for a good while now and just can't put my finger on what I need. Can you help?

A: 

Make the dot-plus non-greedy:

^Group [A-Z] - (?<VehicleModel>.+?)(?: or similar)?$

I'm not familiar with the .NET (?<xyz>...) syntax, but the group still needs the dot-plus inside it to capture anything. The .+? makes it non-greedy, so that it won't eat up the " or similar".

Kip
+1  A: 

Try this:

^Group [A-Z] - (?<VehicleModel>.+?)(?: or similar)?$
Yannick M.
+2  A: 

The problem is that the .+ in the VehicleModel group captures too much. Append a question mark to make it non-greedy: .+?
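
To make the greedy versus non-greedy difference concrete, here is a minimal C# sketch that runs both patterns against the two inputs from the question. The class name VehicleModelDemo and the Extract helper are made up for illustration; only Regex.Match, Match.Success and Match.Groups are real .NET API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VehicleModelDemo
{
    // Greedy pattern from the question and the non-greedy fix suggested in the answers.
    public const string Greedy = @"^Group [A-Z] - (?<VehicleModel>.+)(?: or similar)?$";
    public const string Lazy = @"^Group [A-Z] - (?<VehicleModel>.+?)(?: or similar)?$";

    // Hypothetical helper: returns the VehicleModel capture,
    // or an empty string when the input does not match at all.
    public static string Extract(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups["VehicleModel"].Value : string.Empty;
    }

    public static void Main()
    {
        string[] inputs =
        {
            "Group X - Ford Mondeo or similar",
            "Group X - Ford Mondeo"
        };

        foreach (string input in inputs)
        {
            // Greedy: .+ runs to the end of the string, so the optional
            // suffix group matches nothing and the suffix stays in the capture.
            Console.WriteLine($"greedy: \"{Extract(Greedy, input)}\"");

            // Lazy: .+? grows one character at a time and stops as soon as
            // the optional suffix followed by the end of the string can match.
            Console.WriteLine($"lazy:   \"{Extract(Lazy, input)}\"");
        }
    }
}
```

With the greedy pattern the first input should come back as "Ford Mondeo or similar", while the lazy pattern should give "Ford Mondeo" for both inputs.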

soulmerge
+1  A: 
^Group [A-Z] - (?<VehicleModel>.+?)(?: or similar)?$
vucetica
A: 

Depending on whether or not you want to accept any whitespace character:

^Group\s[A-Z]\s-\s(?<VehicleModel>.+?)(?:\sor\ssimilar)?$

will match text separated by any whitespace character, including tabs, and

^Group [A-Z] - (?<VehicleModel>.+?)(?: or similar)?$

will match only if the text uses spaces. Like others said, the key is the .+?, which makes the quantifier inside the capturing group non-greedy. Without it, the first group will swallow the " or similar".
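
A rough C# sketch of how the two variants differ on tab-separated input, in case it helps. The class name WhitespaceVariantDemo, the Extract helper and the tabbed sample string are assumptions for illustration and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceVariantDemo
{
    // Literal spaces only.
    public const string SpacesOnly = @"^Group [A-Z] - (?<VehicleModel>.+?)(?: or similar)?$";

    // \s accepts any single whitespace character (space, tab, ...).
    public const string AnyWhitespace = @"^Group\s[A-Z]\s-\s(?<VehicleModel>.+?)(?:\sor\ssimilar)?$";

    // Hypothetical helper: returns the VehicleModel capture,
    // or an empty string when the input does not match.
    public static string Extract(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups["VehicleModel"].Value : string.Empty;
    }

    public static void Main()
    {
        // Made-up sample: tabs around the dash and inside the suffix.
        string tabbed = "Group X\t-\tFord Mondeo\tor\tsimilar";

        // The spaces-only pattern fails on the tabs, so this prints "".
        Console.WriteLine($"spaces only:    \"{Extract(SpacesOnly, tabbed)}\"");

        // The \s pattern matches, so this prints "Ford Mondeo".
        Console.WriteLine($"any whitespace: \"{Extract(AnyWhitespace, tabbed)}\"");
    }
}
```

Note that \s matches exactly one whitespace character here; if runs of several spaces or tabs are possible, \s+ would be the usual adjustment.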

Mark Story
Awesome. Thanks.
IanT8