views:

194

answers:

6

Hi, I created a program in C++ that remove commas (') from a given integer. i.e. 2,00,00 would return 20000. I am not using any new space. Here is the program i created

void removeCommas(string& str1,int len)
{
int j=0;
for(int i=0;i<len;i++)
{
    if(str1[i] == ',')
        continue;
    else
    {

        str1[j] =str1[i];
        j++;

    }

}
str1[j] = '\0';

}

void main()
{

string str1;
getline(cin,str1);
int i = str1.length();
removeCommas(str1,i);
cout<<"the new string "<<str1<<endl;


}

Here is the result i get : Input : 2,000,00 String length =8 Output = 200000 0 Length = 8

My question is that why does it show the length has 8 in output and shows the rest of string when i did put a null character. It should show output as 200000 and length has 6.

+8  A: 

You need to do a resize instead at the end.

Contrary to popular belief an std::string CAN contain binary data including 0s. An std::string 's .size() is not related to the string containing a NULL termination.

std::string s("\0\0", 2);
assert(s.size() == 2);
Brian R. Bondy
+9  A: 

Let the standard library do the work for you:

#include <algorithm>

str1.erase(std::remove(str1.begin(), str1.end(), ','), str1.end());

If you don't want to modify the original string, that's easy too:

std::string str2(str1.size(), '0');
str2.erase(std::remove_copy(str1.begin(), str1.end(), str2.begin(), ','), str2.end());
Fred Larson
`remove_copy` should be much faster than anything that uses `erase`.
Ben Voigt
@Ben Voigt: Possibly, but I didn't have that right. I'll fix it.
Fred Larson
I still needed `erase` with `remove_copy` unless I used `back_inserter`. But I suspect `erase` is faster than having to grow the string multiple times.
Fred Larson
Depends on the number of erasures. If you are removing M elements from a list of N items, `erase` is O(MN), while `back_inserter` is O(N) with a slightly greater constant term. And calling `reserve` before `back_inserter` gets you the best of both worlds.
Ben Voigt
Sorry, it's `remove`, not `erase`, that ends up moving items around. But just calling `resize` should be faster than `erase`.
Ben Voigt
+4  A: 

The answer is probably that std::strings aren't NUL-terminated. Instead of setting the end+1'th character to '\0', you should use str.resize(new_length);.

Edit: Also consider that, if your source string has no commas in it, then your '\0' will be written one past the end of the string (which will probably just happen to work, but is incorrect).

Kaz Dragon
+1  A: 

The std::srting does not terminate with \0, you are mixing this with char* in C. So you should use resize.

Draco Ater
+1  A: 

The solution has already been posted by Fred L.

In a "procedural fashion" (without "algorithm") your program would look like:

 void removeStuff(string& str, char character)
{
 size_t pos;
 while( (pos=str.find(character)) != string::npos )
      str.erase(pos, 1);
}

 void main()
{
 string str1;
 getline(cin, str1);
 removeStuff(str1, ',');
 cout<<"the new string "<<str1<<endl;
}

then.

Regards

rbo

EDIT / Addendum:

In order to adress some efficiency concerns of readers, I tried to come up with the fastest solution possible. Of course, this should kick in on string sizes over about 10^5 characters with some characters to-be-removed included:

 void fastRemoveStuff(string& str, char character)
{
 size_t len = str.length();
 char *t, *buffer = new char[len];
 const char *p, *q;

 t = buffer, p = q = str.data();
 while( p=(const char*)memchr(q, character, len-(p-q)) ) {
     memcpy(t, q, p-q);
     t += p-q, q = p+1;
 }
 if( q-str.data() != len ) {
     size_t tail = len - (q-str.data());
     memcpy(t, q, tail);
     t += tail;
 }
 str.assign(buffer, t-buffer);
 delete [] buffer;
}

 void main()
{
 string str1 = "56,4,44,55,5,55"; // should be large, 10^6 is good
 // getline(cin, str1);
 cout<<"the old string " << str1 << endl;
 fastRemoveStuff(str1, ',');
 cout<<"the new string " << str1 << endl;
}

Regards

rbo

rubber boots
This is much less efficient than the original solution.
Ben Voigt
@Ben: What does it matter how efficient the original solution was? It didn't produce correct results. I can make very efficient algorithms too if I don't have to care about getting the answer right.
Bill
The original algorithm is easily fixed by setting the end of the string with `resize` instead of appending NUL, and retains its good performance.
Ben Voigt
Your new code is quite nice, taking advantage of memcpy and all its optimizations. I think you can get rid of the temporary buffer by using memmove, but even that has questionable benefit as memmove may not be as well tuned as memcpy.
Ben Voigt
@Ben, from my investigations - it appears to me that library architects have, at least in gcc variants and Visual C++, included some considerable optimization into memchr(). From the Visual C++(9) disassembly (iirc) they even aligned memory accesses to 16 byte (or so) boundaries. So the major 'trick' here seems to be the inconspicuous use of memchr() ;-)
rubber boots
A: 

My own procedural version:

#include <string>
#include <cassert>

using namespace std;

string Remove( const string & s, char c  ) {

    string r;
    r.reserve( s.size() );

    for ( unsigned int i = 0; i < s.size(); i++ ) {
        if ( s[i] != c ) {
            r += s[i];
        }
    }

    return r;
}

int main() {
    assert( Remove( "Foo,Bar,Zod", ',' ) == "FooBarZod" );
}
anon