Hi, I have a phrase like this:

Computer, Eddie is gone to the market.

I want to extract the word Eddie and ignore everything else. The other words are constant, while the word Eddie could be anything.

How can I do this with a regular expression?

Edit:

Sorry, I'm using .NET regex :)

+6  A: 

You can use this pattern:

Computer, (\w+) is gone to the market\.

This uses parentheses to group \w+ and capture whatever it matches in group 1.

Note that the period at the end has been escaped with a \ because . is a regex metacharacter.

Given the input:

LOL! Computer, Eddie is gone to the market. Blah blah
blah. Computer, Alice is gone to the market... perhaps...

Computer, James Bond is gone to the market.

Then there are two matches (as seen on rubular.com). In the first match, group 1 captured Eddie. In the second match, group 1 captured Alice.

Note that \w+ doesn't match James Bond, because \w+ is a sequence of "one or more word characters", and a space is not a word character. If you need to match these kinds of non-"single word" names, then simply replace \w+ with a regex that matches such names.
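As a minimal sketch of that replacement, assuming a name is one or more \w+ words separated by single spaces (an assumption, since the question does not define what a name is), group 1 can be widened to \w+(?: \w+)*. The class and method names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MultiWordNames
{
    // Assumption: a name is one or more \w+ words separated by single spaces.
    // (?: ...) is a non-capturing group, so the whole name still lands in group 1.
    private static readonly Regex Pattern =
        new Regex(@"Computer, (\w+(?: \w+)*) is gone to the market\.");

    // Returns the text captured by group 1 for every match in the input.
    public static List<string> ExtractNames(string input)
    {
        var names = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            names.Add(m.Groups[1].Value);
        }
        return names;
    }

    public static void Main()
    {
        var text = "Computer, Eddie is gone to the market. "
                 + "Computer, James Bond is gone to the market.";
        foreach (var name in ExtractNames(text))
        {
            Console.WriteLine(name); // prints Eddie, then James Bond
        }
    }
}
```

The greedy (?: \w+)* first swallows " is gone to the market" as well, then backtracks until the constant tail can match, so the capture ends at the last word of the name.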

General technique

Given this test string:

i have 35 dogs, 16 cats and 10 elephants

Then (\d+) (cats|dogs) yields two match results (as seen on rubular.com):

  • Result 1: 35 dogs
    • Group 1 captures 35
    • Group 2 captures dogs
  • Result 2: 16 cats
    • Group 1 captures 16
    • Group 2 captures cats
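The same result can be reproduced in .NET. This is a small sketch using the test string above. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnimalCounts
{
    // Returns one description per match of (\d+) (cats|dogs).
    public static List<string> Describe(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, @"(\d+) (cats|dogs)"))
        {
            // m.Value is the whole match; Groups[1] and Groups[2] are the captures.
            results.Add(string.Format("{0}: group 1 = {1}, group 2 = {2}",
                m.Value, m.Groups[1].Value, m.Groups[2].Value));
        }
        return results;
    }

    public static void Main()
    {
        foreach (var line in Describe("i have 35 dogs, 16 cats and 10 elephants"))
        {
            Console.WriteLine(line);
        }
        // prints:
        // 35 dogs: group 1 = 35, group 2 = dogs
        // 16 cats: group 1 = 16, group 2 = cats
    }
}
```

"10 elephants" produces no match because elephants is not one of the alternatives in group 2.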

C# snippet

Here's a simple example of using capturing groups:

var text = @"

LOL! Computer, Eddie is gone to the market. Blah blah
blah. Computer, Alice is gone to the market... perhaps...

Computer, James Bond is gone to the market.

";

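// Requires: using System; using System.Text.RegularExpressions;
Regex r = new Regex(@"Computer, (\w+) is gone to the market\.");

foreach (Match m in r.Matches(text)) {
  // Groups[0] is the whole match; Groups[1] is the first capturing group.
  Console.WriteLine(m.Groups[1].Value);
}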

The above prints (as seen on ideone.com):

Eddie
Alice

On specification

As noted, \w+ does not match "James Bond". It does, however, match "o_o", "giggles2000", etc. (as seen on rubular.com), because \w includes digits and the underscore. As far as is reasonably practical, you should make your patterns as specific as possible.

Similarly, (\d+) (cats|dogs) will match 100 cats in $100 catsup (as seen on rubular.com).

These are issues with the patterns themselves, and are not directly related to capturing groups.
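One possible way to tighten the second pattern, assuming the intent is to match whole words only, is to add \b word boundaries. The sketch below compares the two patterns. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpecificPatterns
{
    // Original pattern: matches "100 cats" even inside "$100 catsup".
    public static bool LooseMatch(string s)
    {
        return Regex.IsMatch(s, @"(\d+) (cats|dogs)");
    }

    // Assumption: whole words are wanted, so \b is added at both ends.
    public static bool StrictMatch(string s)
    {
        return Regex.IsMatch(s, @"\b(\d+) (cats|dogs)\b");
    }

    public static void Main()
    {
        Console.WriteLine(LooseMatch("$100 catsup"));  // True
        Console.WriteLine(StrictMatch("$100 catsup")); // False
        Console.WriteLine(StrictMatch("16 cats"));     // True
    }
}
```

The trailing \b fails between the s and the u of catsup, so the stricter pattern rejects it. It still accepts $100 cats, so whether this is specific enough depends on the actual input.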

polygenelubricants
+2 for explaining further about regex, really helps a lot.. :)
rob waminal
+2  A: 
/^Computer, \b(.+)\b is gone to the market\.$/

Eddie would be in the first captured group, $1. If you specify the language, we can tell you how to extract it.

Edit: C#:

Match match = Regex.Match(input, @"^Computer, \b(.+)\b is gone to the market\.$");
if (match.Success) {
  Console.WriteLine(match.Groups[1].Value);
}

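Get rid of ^ and $ from the regex if the phrase may be part of a longer string. By default in .NET they anchor the match to the start and end of the input; with RegexOptions.Multiline they match the start and end of each line instead.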
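To illustrate the effect of the anchors, here is a small sketch. The sample strings are made up, and the two-line input uses "\n" line endings on purpose, because $ in Multiline mode matches before "\n" but not before "\r".

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    public const string Anchored = @"^Computer, \b(.+)\b is gone to the market\.$";
    public const string Unanchored = @"Computer, \b(.+)\b is gone to the market\.";

    // Counts how many times the pattern matches the input under the given options.
    public static int Count(string input, string pattern, RegexOptions options)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    public static void Main()
    {
        var embedded = "LOL! Computer, Eddie is gone to the market. Blah blah";
        var twoLines = "Computer, Eddie is gone to the market.\n"
                     + "Computer, Alice is gone to the market.";

        Console.WriteLine(Count(embedded, Anchored, RegexOptions.None));      // 0
        Console.WriteLine(Count(embedded, Unanchored, RegexOptions.None));    // 1
        Console.WriteLine(Count(twoLines, Anchored, RegexOptions.None));      // 0
        Console.WriteLine(Count(twoLines, Anchored, RegexOptions.Multiline)); // 2
    }
}
```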

Amarghosh
It should be noted that this can capture `James Bond`, which isn't a "single word". The lesson here is that OP should've been more specific about what is to be matched and captured.
polygenelubricants
@polygene OP said word Eddie could be anything; I took the liberty to assume it can be James Bond too :)
Amarghosh
Yep, it could also be James Bond. It could be a phrase, a word, a letter or a number; anything goes as long as the other words stay the same.
rob waminal
@rob: then perhaps, though not ideal, `(.+)` or `(.+?)` is acceptable (see http://stackoverflow.com/questions/3075130/difference-between-and-for-regex/3075532#3075532 -- specifically the `A-Z` and `A-ZZ` examples).
polygenelubricants
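
The difference between `(.+)` and `(.+?)` mentioned in the last comment shows up when the phrase occurs more than once on the same line. The following is only a sketch with a made-up input and hypothetical names, not code from either answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Returns group 1 of the first match, or null when there is no match.
    public static string FirstCapture(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        var text = "Computer, Eddie is gone to the market. "
                 + "Computer, Alice is gone to the market.";

        // Greedy: .+ takes as much as possible, so it runs up to the LAST constant tail.
        Console.WriteLine(FirstCapture(text, @"Computer, (.+) is gone to the market\."));
        // prints: Eddie is gone to the market. Computer, Alice

        // Lazy: .+? takes as little as possible, so it stops at the FIRST constant tail.
        Console.WriteLine(FirstCapture(text, @"Computer, (.+?) is gone to the market\."));
        // prints: Eddie
    }
}
```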