I'm just getting my head around regular expressions, and I'm using the Boost Regex library.

I need to use a regex that includes a specific URL, and it chokes because the URL contains characters that are reserved in regex syntax and need to be escaped.

Is there any function or method in the Boost library to escape a string for this kind of usage? I know there are such methods in most other regex implementations, but I don't see one in Boost.

Alternatively, is there a list of all characters that would need to be escaped?

+5  A: 
^ . $ | ( ) [ ] * + ? \ /

Ironically, you could use a regex to escape your URL so that it can be inserted into a regex.

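// Matches any single character from the list above.
const boost::regex esc("[\\^\\.\\$\\|\\(\\)\\[\\]\\*\\+\\?\\/\\\\]");
// In sed format, "\\" is a literal backslash and "&" is the whole match.
const std::string rep("\\\\&");
std::string result = boost::regex_replace(url_to_escape, esc, rep, boost::match_default | boost::format_sed);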

(Waaaaaay too many backslashes... escaping escape characters tends to do that, though.)

Amber
I tried using a regex to do it, but I'm still fairly incompetent, and strange things were occurring :p I've ordered a couple of books on regex today, so hopefully my ignorance will be short-lived! In the meantime, using a regular string replacement to escape these characters worked for my immediate needs, thanks.
Gerald
I added some code to my answer that I *think* should work to add a backslash before any character that needs to be escaped. I haven't used boost in a while though so no guarantees.
Amber
Gerald
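For reference, the plain string replacement mentioned in the comment above could look roughly like the sketch below. It is only an illustration, not the code actually used: the name escape_regex_chars is made up, and the character set is the list from the answer plus '{' and '}', which Boost's default Perl syntax also treats as special.

```cpp
#include <string>

// Hypothetical helper (the name is illustrative): returns a copy of `text`
// with a backslash inserted before every character found in `special`.
std::string escape_regex_chars(const std::string& text) {
    // Characters from the list in the answer, plus '{' and '}' (an addition
    // made here on the assumption that the default Perl syntax is in use).
    static const std::string special = "^.$|()[]{}*+?\\/";
    std::string out;
    out.reserve(text.size() * 2);  // worst case: every character is escaped
    for (char c : text) {
        if (special.find(c) != std::string::npos) {
            out += '\\';
        }
        out += c;
    }
    return out;
}
```

Calling escape_regex_chars(url) before splicing the URL into a larger pattern avoids the regex-on-regex backslash pile-up, at the cost of maintaining the character list by hand.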
+2  A: 

Same with boost::xpressive:

const boost::xpressive::sregex re_escape_text = boost::xpressive::sregex::compile("([\\^\\.\\$\\|\\(\\)\\[\\]\\*\\+\\?\\/\\\\])");

std::string regex_escape(std::string text){
    text = boost::xpressive::regex_replace( text, re_escape_text, std::string("\\$1") );
    return text;
}
Roman
+2  A: 

Using code from Dav (+ a fix from comments), I created an ASCII/Unicode regex_escape() function:

std::wstring regex_escape(const std::wstring& string_to_escape) {
    // Wide literals (L"...") are used because _T() only yields wide strings in UNICODE builds.
    static const boost::wregex re_boostRegexEscape( L"[\\^\\.\\$\\|\\(\\)\\[\\]\\*\\+\\?\\/\\\\]" );
    // Sed format: "\\" is a literal backslash, "&" is the whole match.
    const std::wstring rep( L"\\\\&" );
    std::wstring result = boost::regex_replace(string_to_escape, re_boostRegexEscape, rep, boost::match_default | boost::format_sed);
    return result;
}

For an ASCII version, use std::string / boost::regex instead of std::wstring / boost::wregex, and drop the L prefix from the string literals.

Nishi
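To check that an escaped URL really behaves as literal text, a round trip like the sketch below may help. Note that it swaps in std::regex from the standard library instead of Boost so that it builds without extra dependencies. The names regex_escape_std and contains_literal are hypothetical, and '/' is left out of the character class because it has no special meaning in std::regex's ECMAScript syntax.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: escapes regex metacharacters with std::regex (not
// Boost), using the default ECMAScript replacement format.
std::string regex_escape_std(const std::string& text) {
    static const std::regex special(R"([\\^$.|?*+()\[\]{}])");
    // "\\$&" is a literal backslash followed by $&, the whole match.
    return std::regex_replace(text, special, "\\$&");
}

// Returns true if `haystack` contains `url` as literal text, by searching
// with a pattern built from the escaped URL.
bool contains_literal(const std::string& haystack, const std::string& url) {
    const std::regex pattern(regex_escape_std(url));
    return std::regex_search(haystack, pattern);
}
```

Without the escaping step, the '.' in a URL such as http://example.com/a?b=1 would match any character and the '?' would make the preceding 'a' optional, so the pattern could match strings that do not contain the URL at all.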