answers:

5

Hello everybody,

I'm trying to extract all the words from a string using Boost.Regex in C++.

Here's my input:

"Hello there | network - bla bla hoho"

And here's my code:

    regex rgx("[a-z]+", boost::regex::perl | boost::regex::icase);

    regex_search(input, result, rgx);

    for (unsigned int j = 0; j < result.size(); ++j)
    {
        cout << result[j] << endl;
    }

I only get the first word, "Hello". What's wrong with my code? result.size() returns 1.

Thank you.

A: 

You're only searching for alphabetic characters, not spaces, pipes or hyphens. regex_search() probably just returns the first match.

Skilldrick
A: 

You would need to capture any set of "[a-z]+" (or some other regex for matching "words") bound by spaces or string boundaries. You could try something like this:

^(\s*.+\s*)+$

In any event, this isn't really a boost::regex problem, it's just a regex problem. Use Perl, the bash shell, or any number of web tools to get your regex figured out, then use it in your code.

Ben Collins
A: 

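Perhaps you could try using repeated captures with the following regex: "(?:([a-z]+)\\b\\s*)+". Note that, as far as I know, Boost.Regex only keeps the last value of a repeated group unless it is built with BOOST_REGEX_MATCH_EXTRA defined and the match_extra flag is passed.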

Marcelo Cantos
+5  A: 

regex_search only finds the first match. To iterate over all matches, use regex_iterator.
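
This also explains why result.size() is 1: size() counts the whole match plus one entry per capture group, and "[a-z]+" has no groups. Here is a rough sketch of the iterator approach. To keep it free of external dependencies it uses std::regex from C++11 rather than Boost; the interface is very close, so with Boost the same shape should work with boost::regex and boost::sregex_iterator. The helper name find_words is made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of any library): collects every match of
// [a-z]+ (case-insensitive) found in `input`.
// Assumption: std::regex stands in for boost::regex here.
std::vector<std::string> find_words(const std::string& input)
{
    static const std::regex rgx("[a-z]+", std::regex::icase);

    std::vector<std::string> words;
    // A default-constructed iterator acts as the end-of-sequence marker.
    const std::sregex_iterator end;
    for (std::sregex_iterator it(input.begin(), input.end(), rgx); it != end; ++it)
    {
        // *it is a match_results object; str() returns the whole match.
        words.push_back(it->str());
    }
    return words;
}
```

Calling find_words on the input from the question should yield Hello, there, network, bla, bla and hoho, one entry per match.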

Éric Malenfant
A: 

To match words, try this regex:

regex rgx("\\<[a-z]+\\>",boost::regex::perl|boost::regex::icase);

According to the docs, \< denotes the start of a word and \> denotes the end of a word in the Perl variety of Boost regex matching.
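
To show what the word anchors buy you, here is a small sketch comparing against plain "[a-z]+". It is written with std::regex so it compiles without Boost, and that is an assumption on my part: the ECMAScript grammar used by std::regex has no \< or \>, so \b (a boundary at either end of a word) is used in their place. The function name find_whole_words is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns only runs of letters that form a whole word,
// i.e. that are not glued to digits or underscores.
// Assumption: \b in std::regex stands in for Boost's \< and \>.
std::vector<std::string> find_whole_words(const std::string& input)
{
    static const std::regex rgx("\\b[a-z]+\\b", std::regex::icase);

    std::vector<std::string> words;
    const std::sregex_iterator end;  // default-constructed means "no more matches"
    for (std::sregex_iterator it(input.begin(), input.end(), rgx); it != end; ++it)
    {
        words.push_back(it->str());
    }
    return words;
}
```

On the input from the question this gives the same six words as the unanchored pattern. The difference shows up on something like "abc123 hello": plain "[a-z]+" would pick out "abc" as well, while the anchored version only returns "hello", because there is no word boundary between "c" and "1".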

I'm afraid someone else has to explain how to iterate the matches. The Boost documentation makes my brain hurt.

Tomalak
Agreed that the Boost.Regex documentation was fairly bad.
Yacoby
ahum, it still is...
ufotds