I need to extract the same type of information (e.g. first name, last name, telephone, ...) from numerous different text sources, each with a different format and a different order of the variables of interest.

I want a function that does the extraction based on a regular expression and returns the results under descriptive names. In other words, instead of returning each match result as submatch[0], submatch[1], submatch[2], ..., it should do either of the following:

1.) return a std::map so that the submatches can be accessed via: submatch["first_name"], submatch["last_name"], submatch["telephone"]

2.) return variables holding the submatches so that they can be accessed via: submatch_first_name, submatch_last_name, submatch_telephone

I can write a wrapper class around boost::regex to do #1, but I was hoping there would be a built-in or a more elegant way to do this in C++/Boost/STL/C.

A: 

You can always use enumerations or integral constants to get named indices, e.g.:

enum NamedIndices {
    FirstName = 0,
    LastName  = 1,
    // ...
};

// ...
std::string first = submatch[FirstName];
std::string last  = submatch[LastName ];
Georg Fritzsche
Thank you for the fast response. I considered enums, but the problem is that the order changes. I.e. for one regex, submatch[0] is the "First Name"; for another regex, submatch[0] is the telephone.
Michael
@Michael: Can't you provide a mapping *(InputSource,ResultEnumerator) -> SubmatchIndex* then?
Georg Fritzsche
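The mapping suggested in the comment above could be sketched roughly as follows. This is only an illustration using std::regex rather than boost::regex; the SourceFormat struct, the extract function, and both example formats (name_first, phone_first) are made-up names and patterns, not anything from the question. Each source carries its own table saying which capture group holds which field, so callers always index by the enum regardless of the order in the text. Note that group 0 is the whole match, so the captures start at 1.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <regex>
#include <string>

// The fields of interest; their order here is independent of any regex.
enum Field { FirstName = 0, LastName, Telephone, FieldCount };

// Hypothetical per-source description: a pattern plus, for each Field,
// the index of the capture group that holds it in that pattern.
struct SourceFormat {
    std::regex pattern;
    std::array<std::size_t, FieldCount> group_of;  // indexed by Field
};

using Record = std::array<std::string, FieldCount>;

// Returns the fields in enum order, or std::nullopt if the text does not match.
std::optional<Record> extract(const SourceFormat& fmt, const std::string& text) {
    std::smatch m;
    if (!std::regex_search(text, m, fmt.pattern)) return std::nullopt;
    Record rec;
    for (std::size_t f = 0; f < FieldCount; ++f)
        rec[f] = m[fmt.group_of[f]].str();
    return rec;
}

// Two made-up formats with the fields in different orders.
// "John Doe, tel: 555-1234": groups are first, last, telephone.
const SourceFormat name_first{
    std::regex(R"((\w+) (\w+), tel: (\d{3}-\d{4}))"), {1, 2, 3}};
// "555-1234;Doe,John": groups are telephone, last, first.
const SourceFormat phone_first{
    std::regex(R"((\d{3}-\d{4});(\w+),(\w+))"), {3, 2, 1}};

// Usage (C++17 for std::optional):
//   auto rec = extract(phone_first, "555-1234;Doe,John");
//   if (rec) std::string first = (*rec)[FirstName];
```

Swapping the inner std::array for a std::map<std::string, std::size_t> would give the submatch["first_name"] style of access from option 1, at the cost of string lookups.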
A: 

Can you use "named capture groups"? It seems like returning a map is exactly what you want.

RE2, for example, supports them.

Check Wikipedia to see if your favorite regex library supports named captures.

Stephen
Thank you -- "named capture groups" is exactly what I want. I was using Boost::regex but I think a switch to Boost::xpressive will do the trick.
Michael