views:

1169

answers:

7

I had an interview question that asked me for my 'feedback' on a piece of code a junior programmer wrote. They hinted there may be a problem and said it will be used heavily on large strings.

public string ReverseString(string sz)
{
    string result = string.Empty;
    for(int i = sz.Length-1; i>=0; i--)
    {
      result += sz[i]
    }
    return result;
}

I couldn't spot it. I saw no problems whatsoever. In hindsight I could have said the user should resize but it looks like C# doesn't have a resize (i am a C++ guy).

I ended up writing things like use an iterator if its possible, [x] in containers could not be random access so it may be slow. and misc things. But I definitely said I never had to optimize C# code so my thinking may have not failed me on the interview.

I wanted to know, what is the problem with this code, do you guys see it?

-edit-

I changed this into a wiki because there can be several right answers. Also i am so glad i explicitly said i never had to optimize a C# program and mentioned the misc other things. Oops. I always thought C# didnt have any performance problems with these type of things. oops.

+7  A: 

Since strings are immutable, each += statement will create a new string by copying the string in the last step, along with the single character to form a new string. Effectively, this will be an O(n2) algorithm instead of O(n).

A faster way would be (O(n)):

// pseudocode:
static string ReverseString(string input) {
    char[] buf = new char[input.Length];
    for(int i = 0; i < buf.Length; ++i)
       buf[i] = input[input.Length - i - 1];
    return new string(buf);
}
Mehrdad Afshari
n² will be especially significant on "large strings".
Dour High Arch
This is the most common .NET gotcha that I've seen. string allocation can be a bottleneck because the temp strings can hamper GC performance. It's a particularly good interview question to test .NET experience vs "I'm a C++ programmer who read a C# book last week"
Jimmy
As a side note, a generational GC (like .NET GC) is pretty good at allocating and deallocating short lived object.
Mehrdad Afshari
+34  A: 

Most importantly? That will suck performance wise - it has to create lots of strings (one per character). The simplest way is something like:

public static string Reverse(string sz) // ideal for an extension method
{
    if (string.IsNullOrEmpty(sz) || sz.Length == 1) return sz;
    char[] chars = sz.ToCharArray();
    Array.Reverse(chars);
    return new string(chars);
}
Marc Gravell
+1 - see my answer for why.
Jon Skeet
+23  A: 

The problem is that string concatenations are expensive to do as strings are immutable in C#. The example given will create a new string one character longer each iteration which is very inefficient. To avoid this you should use the StringBuilder class instead like so:

public string ReverseString(string sz)
{
    var builder = new StringBuilder(sz.Length);
    for(int i = sz.Length-1; i>=0; i--)
    {
      builder.Append(sz[i]);
    }
    return builder.ToString();
}

The StringBuilder is written specifically for scenarios like this as it gives you the ability to concatenate strings without the drawback of excessive memory allocation.

You will notice I have provided the StringBuilder with an initial capacity which you don't often see. As you know the length of the result to begin with, this removes needless memory allocations.

What normally happens is it allocates an amount of memory to the StringBuilder (default 16 characters). Once the contents attempts to exceed that capacity it doubles (I think) its own capactity and carries on. This is much better than allocating memory each time as would happen with normal strings, but if you can avoid this as well it's even better.

Garry Shutler
Not being funny but how can someone down vote this answer?
Garry Shutler
That may sound smug but my recent activity showed a down vote and I was just confused.
Garry Shutler
I have nothing to do with it but consider if the person accidentally hit down then hit up. Logically it shouldnt show in recent activities but it is possible. I only thought of this because when i first got on this site (5mo ago) I tested up then down then up voting. Just to see if i was able to do it.
acidzombie24
Garry: Get used to it. Many times people downvote correct answers without commenting.
Mehrdad Afshari
+1  A: 

Better way to tackle it would be to use a StringBuilder, since it is not immutable you won't get the terrible object generation behavior that you would get above. In .net all strings are immutable, which means that the += operator there will create a new object each time it is hit. StringBuilder uses an internal buffer, so the reversal could be done in the buffer w/ no extra object allocations.

chris.w.mclean
ahh, += makes a new object! That is insane. I always thought the '=' forces this to be an internal operations. Why is string allowed to update itself to point to a new string!?!
acidzombie24
String isn't allowed to update itself - a string *variable* is allowed to be reassigned to refer to a different string though.
Jon Skeet
+1  A: 

You should use the StringBuilder class to create your resulting string. A string is immutable so when you append a string in each interation of the loop, a new string has to be created, which isn't very efficient.

Charlie
Don't immediately rush to StringBuilder automatically any time there's a string issue. There may be other, simpler solutions - Marc's code is nice and elegant.
Jon Skeet
+13  A: 

A few comments on the answers given so far:

  • Every single one of them (so far!) will fail on surrogate pairs and combining characters. Oh the joys of Unicode. Reversing a string isn't the same as reversing a sequence of chars.
  • I like Marc's optimisation for null, empty, and single character inputs. In particular, not only does this get the right answer quickly, but it also handles null (which none of the other answers do)
  • I originally thought that ToCharArray followed by Array.Reverse would be the fastest, but it does create one "garbage" copy.
  • The StringBuilder solution creates a single string (not char array) and manipulates that until you call ToString. There's no extra copying involved... but there's a lot more work maintaining lengths etc.

Which is the more efficient solution? Well, I'd have to benchmark it to have any idea at all - but even so that's not going to tell the whole story. Are you using this in a situation with high memory pressure, where extra garbage is a real pain? How fast is your memory vs your CPU, etc?

As ever, readability is usually king - and it doesn't get much better than Marc's answer on that front. In particular, there's no room for an off-by-one error, whereas I'd have to actually put some thought into validating the other answers. I don't like thinking. It hurts my brain, so I try not to do it very often. Using the built-in Array.Reverse sounds much better to me. (Okay, so it still fails on surrogates etc, but hey...)

Jon Skeet
If I ever write a language, I'm going to implement string.Reverse() just to avoid silly interview questions like this!
Garry Shutler
If you did that, they'd have to think up even sillier questions to ask people.
Matt Boehm
+2  A: 

You can do this in .NET 3.5 instead:

    public static string Reverse(this string s)
    {
        return new String((s.ToCharArray().Reverse()).ToArray());
    }
jasonh
Did you try compiling it?
Jon Skeet
(Even if it worked, it wouldn't be ideal. Enumerable.Reverse() has to build up a buffer of elements, which needs to be resized periodically. There's then the matter of iterating over it, etc. Using Array.Reverse is much more efficient. Yes, it takes a couple more lines of code - but it's better, IMO.)
Jon Skeet
Did you call ToArray on the result of Reverse, perhaps? return new String(s.ToCharArray().Reverse().ToArray());
Jon Skeet
I figured out my error and corrected it. Thanks!
jasonh