ansaurus

Question

Using regular expressions with C++ on Unix

Answer 1

A:

You are looking for regcomp, regexec and regfree.

One thing to be careful about is that POSIX regular expressions actually implement two different languages: basic (the default) and extended (pass the flag REG_EXTENDED in the call to regcomp). If you are coming from the PHP world, the extended language is closer to what you are used to.
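As a rough sketch of how the three calls fit together, and of how subgroups can be read back, something like the following should work. The helper name posix_groups is made up for illustration and is not part of any library; it assumes the pattern is written in the extended syntax (REG_EXTENDED) and a compiler that supports C++11.

```cpp
#include <regex.h>   // POSIX: regcomp, regexec, regerror, regfree

#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): returns the whole match followed by
// each parenthesized subgroup, or an empty vector if the pattern does not match.
std::vector<std::string> posix_groups(const std::string& pattern,
                                      const std::string& text)
{
    regex_t re;
    int rc = regcomp(&re, pattern.c_str(), REG_EXTENDED);
    if (rc != 0) {
        char msg[256];
        regerror(rc, &re, msg, sizeof msg);   // turn the error code into text
        throw std::runtime_error(msg);
    }

    // re_nsub is the number of subgroups; slot 0 holds the whole match.
    std::vector<regmatch_t> m(re.re_nsub + 1);
    std::vector<std::string> groups;

    // regexec returns 0 when a match is found and fills in the offsets.
    if (regexec(&re, text.c_str(), m.size(), m.data(), 0) == 0) {
        for (std::size_t i = 0; i < m.size(); ++i) {
            if (m[i].rm_so == -1)             // subgroup did not participate
                groups.push_back(std::string());
            else
                groups.push_back(text.substr(m[i].rm_so,
                                             m[i].rm_eo - m[i].rm_so));
        }
    }

    regfree(&re);
    return groups;
}
```

Calling posix_groups("([0-9]+)-([a-z]+)", "id 42-abc!") should give back three strings: the whole match "42-abc", then "42" and "abc".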

R Samuel Klatchko 2010-02-08 20:48:23

same comment as @epatel

Stanislav Palatnik 2010-02-08 21:14:15

Answer 2

+5 A:

Look up the documentation for TR1 regexes or (almost equivalently) Boost.Regex. Both work quite nicely on various Unix systems. The TR1 regex classes have been accepted into C++0x, so though they're not exactly part of the standard yet, they will be reasonably soon.

Edit: To break a string into subgroups, you can use an sregex_token_iterator. You can specify either what you want matched as tokens, or what you want matched as separators. Here's a quickie demo of both:

#include <iterator>
#include <regex>
#include <string>
#include <iostream>

int main() { 

    std::string line;

    std::cout << "Please enter some words: " << std::flush;
    std::getline(std::cin, line);

    std::tr1::regex r("[ .,:;\\t\\n]+");
    std::tr1::regex w("[A-Za-z]+");

    std::cout << "Matching words:\n";
    std::copy(std::tr1::sregex_token_iterator(line.begin(), line.end(), w),
        std::tr1::sregex_token_iterator(), 
        std::ostream_iterator<std::string>(std::cout, "\n"));

    std::cout << "\nMatching separators:\n";
    std::copy(std::tr1::sregex_token_iterator(line.begin(), line.end(), r, -1), 
        std::tr1::sregex_token_iterator(), 
        std::ostream_iterator<std::string>(std::cout, "\n"));

    return 0;
}

If you give it input like this: "This is some 999 text", the result is like this:

Matching words:
This
is
some
text

Matching separators:
This
is
some
999
text
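If what you need is the subgroups of a match rather than tokens, a match_results object is probably the more direct route. The sketch below assumes a compiler where the classes live in namespace std (C++11 or later) instead of std::tr1; the helper names first_match_groups and all_matches are invented for this example.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: every group of the first match (index 0 is the whole
// match), or an empty vector if nothing matches.
std::vector<std::string> first_match_groups(const std::string& text,
                                            const std::regex& re)
{
    std::vector<std::string> groups;
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        for (std::size_t i = 0; i < m.size(); ++i)
            groups.push_back(m[i].str());
    }
    return groups;
}

// Hypothetical helper: the chosen group from every match in the text.
std::vector<std::string> all_matches(const std::string& text,
                                     const std::regex& re,
                                     std::size_t group = 0)
{
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it)
        out.push_back((*it)[group].str());
    return out;
}
```

With the pattern "([A-Za-z]+)=([0-9]+)" and the input "width=640 height=480", first_match_groups should return "width=640", "width" and "640", while all_matches with group 2 should return "640" and "480".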

Jerry Coffin 2010-02-08 20:49:13

He can also use Boost Xpressive (http://www.boost.org/doc/libs/1_42_0/doc/html/xpressive.html) which will get him compile-time error checking of his regular expressions. I doubt that will ever become standard though :)

Manuel 2010-02-08 21:03:57

This one is the most ideal imo. But I actually ran into it before, and the server that I need to deploy to doesn't support it. :/

Stanislav Palatnik 2010-02-08 21:05:09

@Manuel: Comment markdown syntax sucks sometimes, doesn't it? Also you're using 1.38?! Use `/release/` in boost URLs for the latest release version.

Roger Pate 2010-02-08 21:05:45

@Roger - Thanks, fixed. Strangely, the link to 1.38 was the first result on Google.

Manuel 2010-02-08 21:08:36

What I got out of regcomp and regexec is that they return 0 if a match is found. I need to return all the subgroups as well.

Stanislav Palatnik 2010-02-08 21:13:18

@Manuel: Yes, Xpressive may be able to do the job. Depending on the details of what he wants, Boost Spirit::lex might also. Unless his needs are fairly specialized, however, the normal RE package is probably the first choice.

Jerry Coffin 2010-02-08 21:14:47

@Jerry Coffin - agreed, I just thought that Xpressive deserved at least a mention :)

Manuel 2010-02-08 21:19:17

@Manuel: I tend to agree -- I probably should have mentioned at least a few more of the (admittedly many) possibilities.

Jerry Coffin 2010-02-08 21:57:33

Answer 3

A:

For perl-compatible regular expressions (pcre/preg), I'd suggest boost.regex.

Nicolás 2010-02-08 20:49:19

Answer 4

A:

My best bet would be boost::regex.

Nikolai N Fetissov 2010-02-08 20:49:51

Answer 5

A:

Try pcre. And pcrepp.

Michael Krelin - hacker 2010-02-08 20:50:02

Answer 6

+10 A:

Consider using Boost.Regex.

An example (from the website):

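#include <boost/regex.hpp>
#include <string>

// Returns true if s consists of four groups of four digits,
// each of the first three followed by a dash or a space.
bool validate_card_format(const std::string& s)
{
   static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
   return regex_match(s, e);
}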

Another example:

// match any format with the regular expression:
const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
const std::string machine_format("\\1\\2\\3\\4");
const std::string human_format("\\1-\\2-\\3-\\4");

std::string machine_readable_card_number(const std::string s)
{
   return regex_replace(s, e, machine_format, boost::match_default | boost::format_sed);
}

std::string human_readable_card_number(const std::string s)
{
   return regex_replace(s, e, human_format, boost::match_default | boost::format_sed);
}

0xfe 2010-02-08 20:51:23

Answer 7

A:

Feel free to have a look at this small color grep tool I wrote.

At github

It uses regcomp, regexec and regfree that R Samuel Klatchko refers to.
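To give an idea of the general approach (this is not the code from the tool, just a minimal sketch with a made-up helper name, highlight), a line can be scanned repeatedly with regexec and each match wrapped in ANSI escape codes:

```cpp
#include <regex.h>   // POSIX: regcomp, regexec, regfree

#include <cstddef>
#include <string>

// Hypothetical helper (illustration only): wraps every match of an extended
// POSIX pattern in the ANSI codes for red text.
std::string highlight(const std::string& pattern, const std::string& line)
{
    regex_t re;
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED) != 0)
        return line;                      // invalid pattern: leave line as-is

    std::string out;
    const char* p = line.c_str();
    regmatch_t m;
    int flags = 0;

    while (regexec(&re, p, 1, &m, flags) == 0) {
        // Copy the text in front of the match unchanged.
        out.append(p, static_cast<std::size_t>(m.rm_so));
        if (m.rm_eo > m.rm_so) {
            out += "\033[31m";            // start red
            out.append(p + m.rm_so,
                       static_cast<std::size_t>(m.rm_eo - m.rm_so));
            out += "\033[0m";             // reset colour
            p += m.rm_eo;
        } else {
            // Empty match: step over one character to avoid looping forever.
            p += m.rm_so;
            if (*p == '\0')
                break;
            out.push_back(*p++);
        }
        flags = REG_NOTBOL;               // p no longer points at line start
    }

    out += p;                             // whatever is left after last match
    regfree(&re);
    return out;
}
```

Printing highlight("[0-9]+", "a12b345") to a terminal that understands ANSI colours should show the two runs of digits in red.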

epatel 2010-02-08 20:53:15

Do you have any examples of returning the subgroups and manipulating them?

Stanislav Palatnik 2010-02-08 21:13:55

@Stanislav Palatnik Think that is handled on (around) line 95

epatel 2010-02-08 21:38:21

Answer 8

A:

I use "GNU regex": http://www.gnu.org/s/libc/manual/html_node/Regular-Expressions.html

It works well, but I can't find a clear solution for UTF-8 regular expressions.

Regards

opal 2010-02-08 21:48:42
