ansaurus

Question

string analysis

Answer 1

+3 A:

If you want to get fancy, there is Boost.Regex; otherwise, you can use the STL replace function in combination with the strchr function.
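Here is a minimal sketch of the non-regex route. It assumes the characters to drop are @, #, $ and %, which is the set the other answers use. It swaps in std::remove_if for std::replace, because std::replace only substitutes one character for another and cannot shorten the string. strchr supplies the membership test. The helper name strip_chars is hypothetical.

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// Hypothetical helper: removes every character listed in `bad` from `s`.
std::string strip_chars(std::string s, const char* bad)
{
    s.erase(std::remove_if(s.begin(), s.end(),
                           [bad](char c) {
                               // strchr also "finds" the terminating '\0',
                               // so exclude c == '\0' explicitly.
                               return c != '\0' && std::strchr(bad, c) != nullptr;
                           }),
            s.end());
    return s;
}
```

For example, strip_chars("a #test #@string", "@#$%") gives "a test string".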

0A0D 2010-08-20 13:21:53

Answer 2

A:

A character literal is written in C/C++ with single quotes, e.g. '@', '#', etc. (a few, such as '\'' and '\\', need to be escaped).

To search for a character in a string, use strchr(). Here is a link to some sample code:

http://www.cplusplus.com/reference/clibrary/cstring/strchr/
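strchr returns a pointer to the first match, or a null pointer if there is none. Calling it again from just past the previous match therefore walks through every occurrence. The sketch below counts occurrences that way. count_char is a hypothetical name, not a library function.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts how often `c` occurs in the C string `s`
// by calling strchr repeatedly, restarting just past each match.
std::size_t count_char(const char* s, char c)
{
    if (c == '\0')
        return 0;  // strchr would match the terminator, which is not a real occurrence
    std::size_t n = 0;
    for (const char* p = std::strchr(s, c); p != nullptr; p = std::strchr(p + 1, c))
        ++n;
    return n;
}
```

For example, count_char("a@b@@c", '@') yields 3.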

Android Eve 2010-08-20 13:23:44

C/C++ is not a language, and they're not the same thing either.

rubenvb 2010-08-20 13:40:40

A solution that works in C, works (usually) in C++ as well. The opposite is not true. The answer I provided suggests a solution that's **portable** across C *and* C++. Please avoid pettiness when the meaning of what's been written is clear.

Android Eve 2010-10-23 23:41:40

Answer 3

+2 A:

Is this C or C++? (You've tagged it both ways.)

In pure C, you pretty much have to loop through character by character and delete the unwanted ones. For example:

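#include <string.h>

/* Remove every '@', '#', '$' and '%' from buf, in place.
   buf must point to a writable, null-terminated string. */
void remove_bad(char *buf)
{
    int len = strlen(buf);
    int i, j;

    for (i = 0; i < len; i++)
    {
        if (buf[i] == '@' || buf[i] == '#' || buf[i] == '$' || buf[i] == '%' /* etc */)
        {
            /* shift the rest of the string, including the '\0', left by one */
            for (j = i; j < len; j++)
            { 
                buf[j] = buf[j+1];
            }
            len--;   /* the string is now one character shorter */
            i--;     /* re-check the character that moved into position i */
        }
    }
}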

This isn't very efficient. It checks each character in turn and, whenever it finds one you don't want, shifts all the following characters down by one. You have to decrement the index afterwards so that the character that has just moved into that position also gets checked.
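A common single-pass alternative keeps a read pointer and a write pointer, so each kept character is copied exactly once. That makes it O(n) instead of O(n²) in the worst case. This is sketched here as an assumption and is not part of the answer above. It is written as C++ to match the other examples, although apart from std:: and nullptr the body is plain C. remove_chars_inplace is a hypothetical name.

```cpp
#include <cstring>

// Hypothetical helper: removes every character listed in `bad` from the
// writable C string `buf` in one pass. `src` visits every character;
// `dst` only advances when a character is kept, so nothing is shifted twice.
void remove_chars_inplace(char* buf, const char* bad)
{
    char* dst = buf;
    for (const char* src = buf; *src != '\0'; ++src)
    {
        if (std::strchr(bad, *src) == nullptr)
            *dst++ = *src;   // keep this character
    }
    *dst = '\0';             // terminate the shortened string
}
```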

Vicky 2010-08-20 13:24:33

Answer 4

+3 A:

And if you, for some reason, have to do it yourself C-style, something like this would work:

#include <cstring>

char* oldstr = ... something something dark side ...

int oldstrlen = strlen(oldstr);
char* newstr = new char[oldstrlen + 1]; // allocate memory for the new nicer string (+1 for the '\0')
char* p = newstr; // get a pointer to the beginning of the new string

for ( int i=0; i<oldstrlen; i++ ) // iterate over the original string, not including its '\0'
    if (oldstr[i] != '@' && oldstr[i] != '#' && oldstr[i] != '$' && oldstr[i] != '%') // check that the current character is not a bad one
      *p++ = oldstr[i]; // append it to the new string
*p = 0; // don't forget the null-termination
// ... and delete[] newstr once you no longer need it

Jakob 2010-08-20 13:25:18

Answer 5

+2 A:

General algorithm:

Build a string that contains the characters you want purged: "@#$%"
Iterate character by character over the subject string.
Search if each character is found in the purge set.
If a character matches, discard it.
If a character doesn't match, append it to a result string.

Depending on the string library you are using, there are functions/methods that implement one or more of the above steps, such as strchr() or find() to determine if a character is in a string.
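The steps above map directly onto std::string. This sketch assumes the purge set "@#$%" and uses std::string::find as the membership test. purge is a hypothetical name.

```cpp
#include <string>

// Hypothetical helper following the listed steps: walk the subject string,
// look each character up in the purge set, and append only the characters
// that are not found to a new result string.
std::string purge(const std::string& subject, const std::string& purge_set)
{
    std::string result;
    result.reserve(subject.size());
    for (char c : subject)
    {
        if (purge_set.find(c) == std::string::npos)
            result += c;   // not in the purge set: keep it
        // otherwise the character is simply discarded
    }
    return result;
}
```

With that, purge("a#b@c", "@#$%") returns "abc".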

Amardeep 2010-08-20 13:26:08

Answer 6

+1 A:

Use the characterizer operator, i.e. a would be 'a'. You haven't said whether you're using C++ strings (in which case you can use the find and replace methods) or C strings, in which case you'd use something like this (this is by no means the best way, but it's a simple way):

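#include <string.h>

void RemoveChar(char* szString, char c)
{
    while(*szString != '\0')
    {
        if(*szString == c)
            /* memmove, not memcpy: the source and destination overlap */
            memmove(szString,szString+1,strlen(szString+1)+1);
        else
            szString++; /* advance only if nothing was removed, so "##" loses both */
    }
}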
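For the C++-string case mentioned above, here is a sketch using find and erase. Note that s.erase(pos, 1) has the same effect as s.replace(pos, 1, ""). remove_char is a hypothetical helper. Restarting each search at pos is what keeps consecutive matches from being skipped.

```cpp
#include <string>

// Hypothetical helper: repeatedly find `c` and erase it.
void remove_char(std::string& s, char c)
{
    std::string::size_type pos = s.find(c);
    while (pos != std::string::npos)
    {
        s.erase(pos, 1);
        pos = s.find(c, pos);   // resume at the same index: the next character moved here
    }
}
```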

Necrolis 2010-08-20 13:26:09

"Characterizer operator?"

Dennis Zickefoose 2010-08-20 15:09:45

Ah, good catch; I was thinking of MSVC's preprocessor charizing operator (#@).

Necrolis 2010-08-23 10:01:35

Answer 7

+1 A:

You can use a loop and call find_last_of (http://www.cplusplus.com/reference/string/string/find_last_of/) repeatedly to find the last character that you want to remove, erase it (i.e. replace it with an empty string), and then continue working backwards through the string.
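Here is a sketch of that backward scan. It assumes "removing" means erasing the character outright, and strip_backwards is a hypothetical name. Working backwards means each erase only shifts characters that have already been examined.

```cpp
#include <string>

// Hypothetical helper: find_last_of locates the right-most unwanted
// character, erase removes it, and the next search starts just before it.
void strip_backwards(std::string& s, const std::string& bad)
{
    std::string::size_type pos = s.find_last_of(bad);
    while (pos != std::string::npos)
    {
        s.erase(pos, 1);
        if (pos == 0)
            break;                           // nothing left to the left
        pos = s.find_last_of(bad, pos - 1);  // search at or before pos - 1
    }
}
```

For example, "@a#b$c%" becomes "abc".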

Mark B 2010-08-20 13:28:10

Answer 8

+11 A:

The usual standard C++ approach would be the erase/remove idiom:

#include <string>
#include <algorithm>
#include <iostream>
struct OneOf {
        std::string chars;
        OneOf(const std::string& s) : chars(s) {}
        bool operator()(char c) const {
                return chars.find_first_of(c) != std::string::npos;
        }
};
int main()
{
    std::string s = "string with @, #, $, %";
    s.erase(std::remove_if(s.begin(), s.end(), OneOf("@#$%")), s.end());
    std::cout << s << '\n';
}

And yes, Boost offers some neat ways to write it more concisely, for example boost::erase_all_regex:

#include <string>
#include <iostream>
#include <boost/algorithm/string/regex.hpp>
int main()
{
    std::string s = "string with @, #, $, %";
    erase_all_regex(s, boost::regex("[@#$%]"));
    std::cout << s << '\n';
}
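If Boost isn't available, C++11's <regex> offers a comparable one-liner through std::regex_replace. This is a standard-library swap-in, not boost::erase_all_regex itself, and erase_all_of is a hypothetical name. Some older standard libraries shipped incomplete <regex> support, so this assumes a conforming C++11 implementation.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns a copy of s with every match of the
// character class [@#$%] replaced by the empty string.
std::string erase_all_of(const std::string& s)
{
    static const std::regex bad("[@#$%]");
    return std::regex_replace(s, bad, "");
}
```

Applied to the string above, erase_all_of("string with @, #, $, %") returns "string with , , , ".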

Cubbi 2010-08-20 13:32:30

+1 for STL and <algorithm>

rubenvb 2010-08-20 13:39:09

Answer 9

+1 A:

Something like this would do:

#include <algorithm>
#include <string>

bool is_bad(char c)
{
  return c == '@' || c == '#' || c == '$' || c == '%';
}

int main(int argc, char **argv)
{
  std::string str = "a #test #@string";
  str.erase(std::remove_if(str.begin(), str.end(), is_bad), str.end() );
}

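If your compiler supports lambdas (or if you can use Boost), it can be made even shorter. Example using boost::lambda (this assumes #include <boost/lambda/lambda.hpp> and using namespace boost::lambda; for the _1 placeholder):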

  string str = "a #test #@string";
  str.erase(std::remove_if(str.begin(), str.end(), (_1 == '@' || _1 == '#' || _1 == '$' || _1 == '%')), str.end() );

(yay two lines!)
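With C++11 lambdas, Boost is not needed. Here is a sketch of the same erase/remove call with a lambda in place of is_bad. without_bad is a hypothetical name.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: erase/remove idiom with an inline C++11 lambda.
std::string without_bad(std::string str)
{
    str.erase(std::remove_if(str.begin(), str.end(),
                             [](char c) { return c == '@' || c == '#' || c == '$' || c == '%'; }),
              str.end());
    return str;
}
```

without_bad("a #test #@string") returns "a test string".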

fingerprint211b 2010-08-20 13:40:14

Answer 10

+3 A:

I think for this I'd use std::remove_copy_if:

#include <string>
#include <algorithm>
#include <iostream>
#include <iterator>

struct bad_char { 
    bool operator()(char ch) const { 
        return ch == '@' || ch == '#' || ch == '$' || ch == '%';
    }
};

int main() { 
    std::string in("This@is#a$string%with@extra#stuff$to%ignore");
    std::string out;
    std::remove_copy_if(in.begin(), in.end(), std::back_inserter(out), bad_char());
    std::cout << out << "\n";
    return 0;
}

Result:

Thisisastringwithextrastufftoignore

Since the data containing these unwanted characters will normally come from a file of some sort, it's also worth considering getting rid of them as you read the data from the file, instead of reading the unwanted data into a string and then filtering it out. To do this, you could create a facet that classifies the unwanted characters (and only those) as white space:

#include <locale>
#include <vector>

struct filter: std::ctype<char> 
{
    filter(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask> 
            rc(std::ctype<char>::table_size,std::ctype_base::mask());

        rc['@'] = std::ctype_base::space;
        rc['#'] = std::ctype_base::space;
        rc['$'] = std::ctype_base::space;
        rc['%'] = std::ctype_base::space;
        return &rc[0];
    }
};

To use this, you imbue the input stream with a locale using this facet, and then read normally. For the moment I'll use an istringstream, though you'd normally read from something like std::cin or an ifstream:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <locale>
#include <sstream>

int main() { 
    std::istringstream in("This@is#a$string%with@extra#stuff$to%ignore");
    in.imbue(std::locale(std::locale(), new filter));

    std::copy(std::istream_iterator<char>(in), 
        std::istream_iterator<char>(), 
        std::ostream_iterator<char>(std::cout));

    return 0;
}

Jerry Coffin 2010-08-20 14:01:49

Your examples are making facets less frightening... slowly.

Cubbi 2010-08-20 14:09:11
