I am trying to parse an input string using a regular expression, and I am having trouble capturing a repeating group. I always seem to match only the last instance of the group. I have tried reluctant (non-greedy) quantifiers, but I seem to be missing something. Can someone help?

Regular expressions tried:

(OS)\\s((\\w{3})(([A-Za-z0-9]{2})|(\\w{3})(\\w{3}))\\/{0,1}){1,5}?\\r

(OS)\\s((\\w{3}?)(([A-Za-z0-9]{2}?)|(\\w{3}?)(\\w{3}?))\\/{0,1}?){1,5}?\\r

Input String:

OS BENKL/LHRBA/MANQFL\r\n

I always seem to capture only the last group, MANQFL (MAN QFL), but my aim is to get all three groups (there can be 1 to 5 groups):

(BEN KL), (LHR BA) and (MAN QFL).

Any help will be appreciated.

A: 

You can use strtok. Better still, use a string container and the find_first_of algorithm to find the position of each \ or /, then extract the substring. It depends on whether you are using C or C++.
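A minimal sketch of the find_first_of approach, assuming C++ and std::string. The helper name split_segments and the 3 + 2-or-3 character split of each segment are assumptions taken from the question's sample input, not part of any library:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split a line such as "OS BENKL/LHRBA/MANQFL\r\n" into
// (first part, second part) pairs without using a regex.
// Assumes each segment is 3 characters followed by 2 or 3 more, and that
// segments are separated by '/' or '\\'.
std::vector<std::pair<std::string, std::string>>
split_segments(const std::string& line) {
    std::vector<std::pair<std::string, std::string>> out;

    // Skip the leading "OS " keyword if it is present.
    std::size_t pos = line.compare(0, 3, "OS ") == 0 ? 3 : 0;

    // Stop at the line terminator, or at the end of the string.
    std::size_t end = line.find_first_of("\r\n", pos);
    if (end == std::string::npos) end = line.size();

    while (pos < end) {
        // Locate the next separator; the last segment runs up to `end`.
        std::size_t sep = line.find_first_of("/\\", pos);
        if (sep == std::string::npos || sep > end) sep = end;

        std::string seg = line.substr(pos, sep - pos);
        // Keep only segments with the expected length (5 or 6 characters).
        if (seg.size() == 5 || seg.size() == 6)
            out.emplace_back(seg.substr(0, 3), seg.substr(3));

        pos = sep + 1;
    }
    return out;
}
```

Calling split_segments on the sample input should yield three pairs, one per segment, in input order. Segments of unexpected length are skipped silently here; a real parser would probably report them.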

DumbCoder
+3  A: 

When you repeat a capturing group in a regular expression, the capturing group only stores the text matched by its last iteration. If you need to capture multiple iterations, you'll need to use more than one regex. (.NET is a notable exception to this. Its CaptureCollection provides the matches of all iterations of a capturing group.)
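The two-regex idea can be sketched as follows. This uses std::regex from C++11 rather than boost::regex, which is an assumption on my part; the patterns are simplified versions of the one in the question, and the function names are hypothetical:

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Shows the problem: group 1 sits inside a repeated group, so after a
// successful match it holds only the text of the final iteration.
std::string last_iteration_only(const std::string& line) {
    static const std::regex whole(R"(OS\s(?:(\w{3}\w{2,3})/?){1,5}\r)");
    std::smatch m;
    return std::regex_search(line, m, whole) ? m[1].str() : std::string();
}

// Two-step workaround: one regex checks the line and captures the whole
// segment list, a second unrepeated regex then visits each segment in it.
std::vector<std::pair<std::string, std::string>>
capture_all(const std::string& line) {
    std::vector<std::pair<std::string, std::string>> out;

    // Step 1: group 1 is the complete list, e.g. "BENKL/LHRBA/MANQFL".
    static const std::regex whole(R"(OS\s((?:\w{3}\w{2,3}/?){1,5})\r)");
    std::smatch m;
    if (!std::regex_search(line, m, whole)) return out;

    // Step 2: iterate over every segment inside that list.
    static const std::regex segment(R"((\w{3})(\w{2,3}))");
    for (std::sregex_iterator it(m[1].first, m[1].second, segment), last;
         it != last; ++it) {
        out.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return out;
}
```

For the sample input, last_iteration_only should return only the final segment, while capture_all should return all three pairs. Boost.Regex offers a similarly named boost::sregex_iterator, so the same structure should carry over with little change.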

Yousui
My intention was to use boost::regex_search to achieve this so that I can loop, but the loop executes only once because it always matches the last instance. Is there any way to get around this?

std::string::const_iterator start = str.begin(), end = str.end();
while (regex_search(start, end, what, expr)) {
  cout << what[0];
  cout << what[1];
  ...
  start += what.position() + what.length();
}
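One likely reason a loop like this runs only once is that the full expression, with its OS prefix and trailing \r, consumes the entire line in a single match, so nothing is left for a second search. A sketch of the same loop with a per-segment expression instead, written against std::regex as a stand-in for boost::regex (an assumption; search_loop is a hypothetical name):

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every segment as "XXX YY" or "XXX YYY".
std::vector<std::string> search_loop(const std::string& str) {
    std::vector<std::string> found;

    // Per-segment expression: no "OS" prefix, no trailing \r, no repetition.
    const std::regex expr(R"((\w{3})(\w{2,3}))");

    std::smatch what;
    std::string::const_iterator start = str.begin(), end = str.end();
    while (std::regex_search(start, end, what, expr)) {
        found.push_back(what[1].str() + " " + what[2].str());
        // Resume searching right after the text that was just matched.
        start = what[0].second;
    }
    return found;
}
```

Because the expression cannot match an empty string, start always advances and the loop terminates. This version does not check the OS prefix or the line terminator; that validation would need a separate test.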
omshanti
A: 

In Python, something like this will find them:

import re

mystr = 'OS BENKL/LHRBA/MANQFL\r\n'
myre = re.compile(r'((\w{3})(\w{2,3}))')

print(myre.findall(mystr))  # prints [('BENKL', 'BEN', 'KL'), ('LHRBA', 'LHR', 'BA'), ('MANQFL', 'MAN', 'QFL')]
g.d.d.c
I have to do this in C++. I am using boost::regex_search to achieve this, but it is not working:

std::string::const_iterator start = str.begin(), end = str.end();
while (regex_search(start, end, what, expr)) {
  cout << what[0];
  cout << what[1];
  ...
  start += what.position() + what.length();
}
omshanti
Your tags included Python. Sorry, but I don't know C++ or the Boost libraries.
g.d.d.c
Thanks for the help.
omshanti
+1  A: 

If you want a Perl solution, it might be:

use strict;
use warnings;
use 5.10.1;

my @in = ('OS BENKL/LHRBA/MANQFL\r\n', 'OS BENKL/LHRBA/MANQFL/ABCDE/XYZTTT\r\n');
foreach(@in) {
  say "in = $_";
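  # In list context with /g, the match returns every capture of every match:
  # three entries (whole segment, first part, second part) per segment.
  my @l = $_ =~ m!((\w{3})(\w{2,3}))!g;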
  for(my $i=1; $i<@l; $i+=3) {
    say "($l[$i] $l[$i+1])";
  }
}
M42