views:

680

answers:

8

I have a string read from another source such as "\b\bfoo\bx". In this case, it would translate to the word "fox" as the first 2 \b's are ignored, and the last 'o' is erased, and then replaced with 'x'. Also another case would be "patt\b\b\b\b\b\b\b\b\b\bfoo" should be translated to "foo"

I have come up with something using String.Replace, but it is complex and I am worried it is not working correctly, also it is creating a lot of new string objects which I would like to avoid.

Any ideas?

A: 

Create a StringBuilder and copy over everything but backspace chars.

mP
I need to remove the characters from the string as well if and only if there is a corresponding backspace, not just the backspace characters.
esac
+2  A: 

You could iterate through the string backward, making a character array as you go. Every time you hit a backspace, increment a counter, and every time you hit a normal character, skip it if your counter is non-zero and decrement the counter.

I'm not sure what the best C# data structure is to manage this and then be able to get the string in the right order afterward quickly. StringBuilder has an Insert method but I don't know if it will be performant to keep inserting characters at the start or not. You could put the characters in a stack and hit ToArray() at the end -- that might or might not be faster.

mquander
+5  A: 

The way I would do it is low-tech, but easy to understand.

Create a stack of characters. Then iterate through the string from beginning to end. If the character is a normal character (non-slash), push it onto the stack. If it is a slash, and the next character is a 'b', pop the top of the stack. If the stack is empty, ignore it.

At the end, pop each character in turn, add it to a StringBuilder, and reverse the result.

Andrei Krotkov
This is cleaner than my method. +1.
mquander
That's nice, although i meant the literal escaped character '\b' so i wouldn't need to do the comparison of the next character being a 'b', but it still works. Looking at this method, the only 'problem' I have with it is that I have to do an Array.Reverse at the end of the method .. not an expensive operation, but wish I could do it without having to reverse :)
esac
You could pop them off of the stack into a character array in reversed order; i.e. char[] letters = new char[stack.Count]; for(int i = stack.Count - 1; i >= 0; i--) letters[i] = stack.Pop(); string result = new string(letters);
mquander
+8  A: 

Probably the easiest is to just iterate over the entire string. Given your inputs, the following code does the trick in 1-pass

public string ReplaceBackspace(string hasBackspace)
{
    if( string.IsNullOrEmpty(hasBackspace) )
        return hasBackspace;

    StringBuilder result = new StringBuilder(hasBackspace.Length);
    foreach (char c in hasBackspace)
    {
        if (c == '\b')
        {
            if (result.Length > 0)
                result.Length--;
        }
        else
        {
            result.Append(c);
        }
    }
    return result.ToString();
}
Robert Paulson
Simple. Straightforward. Easy to understand.
Michael Burr
I didn't know about the Length-- trick, that is neat. I was worried about sb.Remove() being expensive.
esac
Length-- is clever.
mquander
A: 
String myString = "patt\b\b\b\b\b\b\b\b\b\bfoo";
      List<char> chars = myString.ToCharArray().ToList();
      int delCount = 0;

      for (int i = chars.Count -1; i >= 0; i--)
      {
        if (chars[i] == '\b')
        {
          delCount++;
          chars.RemoveAt(i);
        } else {
          if (delCount > 0 && chars[i] != null) {
            chars.RemoveAt(i);
            delCount--;
          }
        }
      }
Markus Nigbur
A: 

i'd go like this: code is not tested

char[] result = new char[input.Length()];
int r =0;
for (i=0; i<input.Length(); i++){
if (input[i] == '\b'  && r>0) r--;
 else result[r]=input[i];

}

string resultsring = result.take(r);
Alexander Taran
+3  A: 

Regular expressions version:

var data = @"patt\b\b\b\b\b\b\b\b\b\bfoo";
var regex = new Regex(@"(^|[^\\b])\\b");

while (regex.IsMatch(data))
{
    data = regex.Replace(data, "");
}


Optimized version (and this one works with backspace '\b' and not with string "\b"):

var data = "patt\b\b\b\b\b\b\b\b\b\bfoo";
var regex = new Regex(@"[^\x08]\x08", RegexOptions.Compiled);

while (data.Contains('\b'))
{
    data = regex.Replace(data.TrimStart('\b'), "");
}
Kamarey
+2  A: 
public static string ProcessBackspaces(string source)
{
    char[] buffer = new char[source.Length];
    int idx = 0;

    foreach (char c in source)
    {
        if (c != '\b')
        {
            buffer[idx] = c;
            idx++;
        }
        else if (idx > 0)
        {
            idx--;
        }
    }

    return new string(buffer, 0, idx);
}

EDIT

I've done a quick, rough benchmark of the code posted in answers so far (processing the two example strings from the question, one million times each):

 ANSWER                 | TIME (ms)
------------------------|-----------
 Luke (this one)        |       318
 Alexander Taran        |       567
 Robert Paulson         |       683
 Markus Nigbur          |      2100
 Kamarey (new version)  |      7075
 Kamarey (old version)  |     30902
LukeH
Your code is fast, but slightly incorrect. It fails for the test case 'fox\b\b\b\bfor' which should yield "for" (thank goodness for unit tests :)) because idx is = 0 on the last \b, so it puts it into the char buffer. Here is the fixed portion: if (c == '\b') { if (idx > 0) { idx--; } } else { buffer[idx] = c; idx++; }
esac
Robert Paulson
@esac, @Robert, Well spotted! I've updated to fix that bug.
LukeH
Please update your benchmark with my newer version. When testing, move the creation of regex out of the test loop, it needless to create it each time.
Kamarey
@Kamarey, I've added your new version to the table. (I'm on a different machine this morning, so I've had to re-run and update all the benchmarks to keep everything consistent.)
LukeH