views:

2074

answers:

4

I'm trying to change user input in wildcard form ("*word*") to a regular expression format. To that end, I'm using the code below to strip off the '*' at the beginning and end of the input so that I can add the regular expression characters on either end:

string::iterator    iter_begin = expressionBuilder.begin();
string::iterator    iter_end = expressionBuilder.end();
iter_end--;
if ((char)*iter_begin == '*' && (char)*iter_end == '*')
{
    expressionBuilder.erase(iter_begin);
    expressionBuilder.erase(iter_end);
    expressionBuilder = "\\b\\w*" + expressionBuilder + "\\w*\\b";
}

However, the call to "expressionBuilder.erase(iter_end)" does not erase the trailing '*' from the input string so I wind up with an incorrect regular expression. What am I doing wrong here? "(char)*iter_end == '*'" must be true for the code inside the if statment to run (which it does), so why doesn't the same iterator work when passed to erase()?

+3  A: 

Try erasing them in the opposite order:

expressionBuilder.erase(iter_end);
expressionBuilder.erase(iter_begin);

After erasing the first *, iter_end refers to one character past the end of the string in your example. The STL documentation indicates that iterators are invalidated by erase(), so technically my example is wrong too but I believe it will work in practice.

Greg Hewgill
Luckily with strings you don't need to use iterators, most functions have a form that takes an index instead. Still, like you say, even with indexed erasure it should still be done "back to front".
Chris Jester-Young
P4tXrx5jrMlbhyludk9pxHBT30kGHo9n: you're right about end(), but there is an iter_end-- in there that looks at the actual last character of the string.
Greg Hewgill
This makes perfect sense, and reversing the order did solve the problem. Thanks!
jeffm
Sorry, I missed the line with "--".
Max Lybbert
@Greg: some other post concluded that iterators _after_ the erased one are invalidated. (http://stackoverflow.com/questions/62340/is-a-popback-in-a-stdvector-really-invalidates-all-iterators-on-the-vector)
xtofl
xtofl: good to know, thanks.
Greg Hewgill
@xtoff - this doesn't apply to string. In order to allow support for reference counted strings, calling any non-const member function (except a few that do nothing but return references or iterators) may invalidate any existing iterators.
Michael Burr
+1  A: 

(Revised, as I missed the iter_end-- line).

You probably want an if statement that only checks if *iter_begin == '*', and then calls find() to get the other '*'. Or you could use rbegin() to get the "beginning iterator of of the sequence in reverse," advance it one and then call base() to turn it to a regular iterator. That will get you the last character in the sequence.


Even better, std::string has rfind() and find_last_of() methods. They will get you the last '*'. You can also simply call replace() instead of stripping out the '*'s and then adding the new stuff back in.

Max Lybbert
Notice there is an iter_end-- in there that backs up one character.
Greg Hewgill
Did you miss the "iter_end--;" line, which moves the iterator back to the last item? I'm sure Greg's answer is right, because string iterators are basically just indexes, so the end index is invalidated by the first erase.
Roddy
I was trying to avoid "find_last_of" because I already know where the character is, but maybe I was overthinking it.
jeffm
I missed the "--" line, thanks.
Max Lybbert
+7  A: 

Your original code and the proposed solutions so far have a couple of problems in addition to the obvious problem you posted about:

  • use of invalidated iterators after the string is modified
  • dereferencing possibly invalid iterators even before the string is modified (if the string is empty, for example)
  • a bug if the expressionBuilder string contains only a single '*' character

Now, the last two items might not really be a problem if the code that uses the snippet/routine is already validating that the string has at least 2 characters, but in case that's not the situation, I believe the following to be more robust in the face of arbitrary values for expressionBuilder:

// using the reverse iterator rbegin() is a nice easy way 
//     to get the last character of a string

if ( (expressionBuilder.size() >= 2) &&
    (*expressionBuilder.begin()  == '*') &&
    (*expressionBuilder.rbegin() == '*') ) {

    expressionBuilder.erase(expressionBuilder.begin());

    // can't nicely use rbegin() here because erase() wont take a reverse
    //  iterator, and converting reverse iterators to regular iterators
    //  results in rather ugly, non-intuitive code
    expressionBuilder.erase(expressionBuilder.end() - 1); // note - not invalid since we're getting it anew

    expressionBuilder = "\\b\\w*" + expressionBuilder + "\\w*\\b";
}

Note that this code will work when expressionBuilder is "", "*", or "**" in that it does not perform any undefined actions. However, it might not produce the results you want in those cases (that's because I don't know exactly what you do want in those cases). Modify to suit your needs.

Michael Burr
Thanks. I pretty much know at this point that the string is not empty or "*", but I agree that it would be better to code it that way just in case something changes later.
jeffm
very nice - just used this in some of my code too
danio
A: 

Minus the error handling, you could probably just do it like this:

#include <iostream>
#include <string>
using namespace std;

string stripStar(const string& s) {
    return string(s.begin() + 1, s.end() - 1);
}

int main() {
   cout << stripStar("*word*") << "\n";
}
Shadow2531