ansaurus

Question

Answer 1

+8 A:

You need to do a resize instead at the end.

Contrary to popular belief an std::string CAN contain binary data including 0s. An std::string 's .size() is not related to the string containing a NULL termination.

std::string s("\0\0", 2);
assert(s.size() == 2);

Brian R. Bondy 2010-04-21 15:59:41

Answer 2

+9 A:

Let the standard library do the work for you:

#include <algorithm>

str1.erase(std::remove(str1.begin(), str1.end(), ','), str1.end());

If you don't want to modify the original string, that's easy too:

std::string str2(str1.size(), '0');
str2.erase(std::remove_copy(str1.begin(), str1.end(), str2.begin(), ','), str2.end());

Fred Larson 2010-04-21 16:01:16

`remove_copy` should be much faster than anything that uses `erase`.

Ben Voigt 2010-04-21 16:17:19

@Ben Voigt: Possibly, but I didn't have that right. I'll fix it.

Fred Larson 2010-04-21 16:18:34

I still needed `erase` with `remove_copy` unless I used `back_inserter`. But I suspect `erase` is faster than having to grow the string multiple times.

Fred Larson 2010-04-21 16:29:23

Depends on the number of erasures. If you are removing M elements from a list of N items, `erase` is O(MN), while `back_inserter` is O(N) with a slightly greater constant term. And calling `reserve` before `back_inserter` gets you the best of both worlds.

Ben Voigt 2010-04-21 17:01:50

Sorry, it's `remove`, not `erase`, that ends up moving items around. But just calling `resize` should be faster than `erase`.

Ben Voigt 2010-04-21 17:06:00

Answer 3

+4 A:

The answer is probably that std::strings aren't NUL-terminated. Instead of setting the end+1'th character to '\0', you should use str.resize(new_length);.

Edit: Also consider that, if your source string has no commas in it, then your '\0' will be written one past the end of the string (which will probably just happen to work, but is incorrect).

Kaz Dragon 2010-04-21 16:03:09

Answer 4

+1 A:

The std::srting does not terminate with \0, you are mixing this with char* in C. So you should use resize.

Draco Ater 2010-04-21 16:06:21

Answer 5

+1 A:

The solution has already been posted by Fred L.

In a "procedural fashion" (without "algorithm") your program would look like:

 void removeStuff(string& str, char character)
{
 size_t pos;
 while( (pos=str.find(character)) != string::npos )
      str.erase(pos, 1);
}

 void main()
{
 string str1;
 getline(cin, str1);
 removeStuff(str1, ',');
 cout<<"the new string "<<str1<<endl;
}

then.

Regards

rbo

EDIT / Addendum:

In order to adress some efficiency concerns of readers, I tried to come up with the fastest solution possible. Of course, this should kick in on string sizes over about 10^5 characters with some characters to-be-removed included:

 void fastRemoveStuff(string& str, char character)
{
 size_t len = str.length();
 char *t, *buffer = new char[len];
 const char *p, *q;

 t = buffer, p = q = str.data();
 while( p=(const char*)memchr(q, character, len-(p-q)) ) {
     memcpy(t, q, p-q);
     t += p-q, q = p+1;
 }
 if( q-str.data() != len ) {
     size_t tail = len - (q-str.data());
     memcpy(t, q, tail);
     t += tail;
 }
 str.assign(buffer, t-buffer);
 delete [] buffer;
}

 void main()
{
 string str1 = "56,4,44,55,5,55"; // should be large, 10^6 is good
 // getline(cin, str1);
 cout<<"the old string " << str1 << endl;
 fastRemoveStuff(str1, ',');
 cout<<"the new string " << str1 << endl;
}

Regards

rbo

rubber boots 2010-04-21 16:13:51

This is much less efficient than the original solution.

Ben Voigt 2010-04-21 16:15:54

@Ben: What does it matter how efficient the original solution was? It didn't produce correct results. I can make very efficient algorithms too if I don't have to care about getting the answer right.

Bill 2010-04-21 16:21:51

The original algorithm is easily fixed by setting the end of the string with `resize` instead of appending NUL, and retains its good performance.

Ben Voigt 2010-04-21 17:03:11

Your new code is quite nice, taking advantage of memcpy and all its optimizations. I think you can get rid of the temporary buffer by using memmove, but even that has questionable benefit as memmove may not be as well tuned as memcpy.

Ben Voigt 2010-04-21 20:35:14

@Ben, from my investigations - it appears to me that library architects have, at least in gcc variants and Visual C++, included some considerable optimization into memchr(). From the Visual C++(9) disassembly (iirc) they even aligned memory accesses to 16 byte (or so) boundaries. So the major 'trick' here seems to be the inconspicuous use of memchr() ;-)

rubber boots 2010-04-21 21:38:47

Answer 6

A:

My own procedural version:

#include <string>
#include <cassert>

using namespace std;

string Remove( const string & s, char c  ) {

    string r;
    r.reserve( s.size() );

    for ( unsigned int i = 0; i < s.size(); i++ ) {
        if ( s[i] != c ) {
            r += s[i];
        }
    }

    return r;
}

int main() {
    assert( Remove( "Foo,Bar,Zod", ',' ) == "FooBarZod" );
}

anon 2010-04-21 16:31:19

ansaurus

tags:

views:

answers:

string in c++,question

related questions