Hi

I'm writing a web spider and want to use the Boost.Regex library instead of crafting some complicated parsing functions.

I took a look at this example:

#include <string> 
#include <map> 
#include <boost/regex.hpp> 

// purpose: 
// takes the contents of a file in the form of a string 
// and searches for all the C++ class definitions, storing 
// their locations in a map of strings/int's 
typedef std::map<std::string, int, std::less<std::string> > map_type; 

boost::regex expression(
   "^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
   "(class|struct)[[:space:]]*"
   "(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?"
   "[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*"
   "(<[^;:{]+>[[:space:]]*)?(\\{|:[^;\\{()]*\\{)"); 

void IndexClasses(map_type& m, const std::string& file) 
{ 
   std::string::const_iterator start, end; 
   start = file.begin(); 
   end = file.end(); 
   boost::match_results<std::string::const_iterator> what; 
   boost::match_flag_type flags = boost::match_default; 
   while(regex_search(start, end, what, expression, flags)) 
   { 
      // what[0] contains the whole string 
      // what[5] contains the class name. 
      // what[6] contains the template specialisation if any. 
      // add class name and position to map: 
      m[std::string(what[5].first, what[5].second) 
            + std::string(what[6].first, what[6].second)] 
         = what[5].first - file.begin(); 
      // update search position: 
      start = what[0].second; 
      // update flags: 
      flags |= boost::match_prev_avail; 
      flags |= boost::match_not_bob; 
   } 
}

But it's somewhat hard to follow (it's my first try with Boost ;)), and I can't work out where the actual locations of the matching strings are.

So my question is: how do I get the location of every match?

+2  A: 

As the comments in the code suggest, what[0] contains the entire match, so what[0].first points to the beginning of the match on every iteration of the loop (what[0].first - file.begin() gives its offset in the string). In general, to get the i'th group you can use:

string s(what[i].first, what[i].second);

To read more about the match_results class, see its page in the Boost.Regex documentation.

Idan K
I would add that string(what[i].first, what[i].second) would then give you the string for the i'th group.
TK
Thanks, edited and added.
Idan K
A: 

Am I doing something wrong then?

I'm using it to find the pattern "cde" in "abcdefg", and I get an empty string as output.

Here is the slightly modified code:

#include <iostream>
#include <string>
#include <boost/regex.hpp>
#include <map> 

using namespace std;

typedef std::map<std::string, int, std::less<std::string> > map_type;

int main() {
  string file("abcdefg");
  boost::regex expression("cde");
  map_type m;
  std::string::const_iterator start, end;

  start = file.begin();
  end = file.end();
  boost::match_results<std::string::const_iterator> what;
  boost::match_flag_type flags = boost::match_default;

  while(regex_search(start, end, what, expression, flags))
  {
    m[std::string(what[5].first, what[5].second)
        + std::string(what[6].first, what[6].second)]
      = what[5].first - file.begin();

    start = what[0].second;

    flags |= boost::match_prev_avail;
    flags |= boost::match_not_bob;
  }

  string tmp(what[0].first, what[0].second);
  cout << tmp << endl;

  return 0;
}