Hello,

This seems like such a basic question, so I apologize if it's already been answered somewhere (my searching didn't turn up anything).

I just want to filter a string object so that it contains only alphanumeric and space characters.

Here's what I tried:

#include "boost/algorithm/string/erase.hpp"
#include "boost/algorithm/string/classification.hpp"

std::wstring oldStr = L"Bla=bla =&*\nSampleSampleSample ";
std::wstring newStr = boost::erase_all_copy(oldStr, !(boost::is_alnum() || 
                                                      boost::is_space()));

But the compiler is not at all happy with that -- it seems that I can only put a string in the second argument of erase_all_copy and not this is_alnum() stuff.

Is there some obvious solution I'm missing here?

A: 

It's been years since I've used boost, but perhaps you could use erase_all_regex_copy() instead of erase_all_copy()? It might be a bit of a performance hit, but it may be your only choice aside from iterating over each element and checking manually. If you're not familiar with regular expressions, the expression you'd use in this case would be something like "[^a-zA-Z0-9 ]+".

For completeness' sake, some sample code:

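#include <string>
#include "boost/regex.hpp"
#include "boost/algorithm/string/regex.hpp"

// The input is a std::wstring, so both the literals and the regex must be wide:
// boost::wregex (i.e. boost::basic_regex<wchar_t>) rather than boost::regex.
std::wstring oldStr = L"Bla=bla =&*\nSampleSampleSample ";
std::wstring newStr = boost::erase_all_regex_copy(oldStr, boost::wregex(L"[^a-zA-Z0-9 ]+"));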
Faisal
I'm getting errors complaining about char and wchar_t conversion. Perhaps this regex is implicitly assuming char instead of wchar_t? I put an L in front of the regex string, but it didn't like that either. The beginning of the long, long error message: C:\boost_1_40_0\boost/regex/v4/perl_matcher_common.hpp(802) : warning C4244: 'argument' : conversion from 'const wchar_t' to 'char', possible loss of data
jjiffer
Try changing boost::regex("etc.") to boost::wregex(L"etc.") (that is, boost::basic_regex<wchar_t>), perhaps? Then again, Eric up there has a good solution that probably won't give you the same trouble. :) (Also, I'd test this code myself, but I don't have boost installed on my dev machine... building/installing it now.)
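For comparison, here is a minimal sketch of the same wide-character idea using the standard library's <regex> (C++11) instead of Boost.Regex. That is a substitution, not what the answer above uses. The helper name strip_non_alnum_space is hypothetical and only there for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the thread): strips every character that is
// not an ASCII letter, digit, or space, using wide-character types throughout.
std::wstring strip_non_alnum_space(const std::wstring& input)
{
    // std::wregex is std::basic_regex<wchar_t>; the pattern must be a wide
    // literal (L"...") so that it matches the character type of the input.
    static const std::wregex unwanted(L"[^a-zA-Z0-9 ]+");
    return std::regex_replace(input, unwanted, L"");
}
```

Note that this pattern keeps only the literal space character, so a newline in the input is removed as well. A locale-based isspace check would keep it.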
Faisal
A: 

With the std algorithms and Boost.Bind:

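#include <algorithm>   // std::remove_copy_if
#include <iterator>    // std::back_inserter
#include <locale>      // locale-aware std::isalnum / std::isspace
#include <string>
#include <boost/bind.hpp>

std::wstring s = ...
std::wstring new_s;
std::locale loc;
// Copy s into new_s, skipping every character that is neither alphanumeric nor whitespace.
std::remove_copy_if(s.begin(), s.end(), std::back_inserter(new_s), 
    !(boost::bind(&std::isalnum<wchar_t>, _1, loc)||
      boost::bind(&std::isspace<wchar_t>, _1, loc)
));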
Éric Malenfant
That worked, thank you! At this point, it all just looks like black magic to me, but it'll give me a place to start understanding iterators and the C++ way of things better.
jjiffer
@jjiffer: remove_copy_if takes an input range (the "s.begin(), s.end()" part) and an output iterator to which it writes the extracted characters (the "back_inserter(new_s)" part). The fourth argument is a function object that takes an element as input (in this case a wchar_t) and returns bool. If this function returns true, the element is skipped. (to be continued...)
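To make the four arguments described above concrete, here is a minimal sketch that writes the predicate as a C++11 lambda instead of a Boost.Bind expression. That is an assumption about the available compiler, not what the answer uses. The helper name keep_alnum_and_space is hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper (not from the thread): returns a copy of `input`
// containing only alphanumeric and whitespace characters.
std::wstring keep_alnum_and_space(const std::wstring& input,
                                  const std::locale& loc = std::locale())
{
    std::wstring output;
    // Arguments 1 and 2: the input range. Argument 3: the output iterator,
    // which appends to `output`. Argument 4: the predicate; every character
    // for which it returns true is skipped, all others are copied.
    std::remove_copy_if(input.begin(), input.end(),
                        std::back_inserter(output),
                        [&loc](wchar_t c) {
                            return !(std::isalnum(c, loc) || std::isspace(c, loc));
                        });
    return output;
}
```

Calling keep_alnum_and_space(L"a-b c!") should yield L"ab c". Because std::isspace also accepts newlines and tabs, those survive the filter along with plain spaces.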
Éric Malenfant
Ah, I see. I had no clue C++ had anything like bind. Thanks for the explanation!
jjiffer