views:

219

answers:

2

I'm using the .Net micro framework so the StringBuilder is not available.

I have seen some code from apt professionals to use an Arraylist of chars to concat and build strings, as opposed to the + operator. They essentially build a managed code StringBuilder.

Is there a performance advantage to this? Assume the number of concatenations are higher than 10 and string lengths are also larger than 10.

+11  A: 

No, don't use an ArrayList of char values. That will box every char - performance will be horrible, as will memory usage. (Size of a reference + size of a boxed char for each character... yikes!)

Use a char[] internally and "resize" it (create a new array and copy the contents in) when you need to, perhaps doubling in size each time. (EDIT: You don't resize it to the exact size you need - you would start off with, say, 16 chars and keep doubling - so most Append operations don't need to "resize" the array.)

That's similar to how StringBuilder works anyway. (It's even closer to how Java's StringBuilder works.)

I suggest you actually build your own StringBuilder type with the most important members. Unit test the heck out of it, and profile where appropriate.

Let me know if you want a short example.

Jon Skeet
Wouldn't a List<T> more or less cover the resizing pro bono? I was under the impression it was just a vector under the hood.
mquander
Is there a reason why you suggested a char[] instead of list<char>, or were you assuming that he probably is unable to use them?
Kevin
For one thing, you can construct a string directly from a `char[]`. For another, I'm assuming that a framework which doesn't even have `StringBuilder` is unlikely to have generics... the micro framework doesn't, for example.
Jon Skeet
I'm completely not sure, so this is actually for my benefit (clarification), but why is it faster to create a char[], create a new char[] and copy the previous contents to it, than it would to create a string, and have it copy to a new string?
Sivvy
@Sivvy: Because you wouldn't create a new char[] each time. You'd keep one with a certain amount of "slack", and only create a new one when you had to.
Jon Skeet
Thanks! I've always been a little confused on that.
Sivvy
Ahh I didn't think about that. Thanks!
Kevin
mquander, if StringBuilder is unavailable, then generics are probably off the table as well. I'm guessing this is the microFx.
Henk Holterman
Jon's right, generics aren't available either (sorry, it's the second time I get caught thinking my question has nothing to do with generics).
MandoMando
Thanks you! My plan was/is to use the re-doubling storage scheme (like a hashtable impl.) to build my own StringBuilder. Judging by the responses then, the + operator is a total minus when it comes to this then.
MandoMando
+1  A: 

The only reason that using an ArrayList of chars to build a string would be considered performant is if you compare it to something that has really bad performance. Concatenating a huge string using += would be an example of something that would have such bad performance.

You can make the string concatenation a lot more efficient if you just concatenate into several shorter strings instead of one large string.

This code, for example:

string[] parts = new string[1000];
for (int i = 0; i < parts.Length; i++) {
  string part = String.Empty;
  for (int j=0; j < 100; j++) {
    part += "*";
  }
  parts[i] = part;
}
string result = String.Concat(parts);

Is about 450 times faster than this code:

string result = string.Empty;
for (int i = 0; i < 100000; i++) {
  result += "*";
}

A StringBuilder is still faster, but it's only about four times faster than the first example. So by using shorter strings you can cut the time by 99.78%, and using a StringBuilder would only cut another 0.16%.

Guffa
Yes. If I'm not mistaken, every concat destroys the old objects and creates a new one. At the tail end of the 100000 loop, we'd be throwing around a lot of data and with smaller strings, we wouldn't.
MandoMando
@MandoMando: Yes, the first example will be moving 10 MB of data to create the 0.2 MB string, while the second example will be moving 10 GB (!) of data to create the same string.
Guffa