views:

527

answers:

10

Hi, How can I optimize the following code so that it executes faster:

static void Main(string[] args) 
{
    String a = "Hello ";
    String b = " World! ";
    for (int i=0; i<20000; i++) 
    {
        a = a + b;
    }
    Console.WriteLine(a);
}
+3  A: 

Since its output is predetermined, it would run faster if you just hardcoded the literal value that is built by the loop.

JoshJordan
Hello world! world! world! world! world! .. gets repetitive after the first 5, let alone the other 19995
Robert Paulson
Make a Python/Perl/<your language of choice> script to generate the literal ;)
Matthew Iselin
Uh, run the code, copy, and paste? Heh.
JoshJordan
I forgot that the program still actually works. Looks like I need more coffee :)
Matthew Iselin
Haha :) I could use some caffeine myself.
JoshJordan
Sounds like something I would do. Only the teacher would give me an F for no good reason.
Steve Wortham
A: 

Parallelize it.

Stefano Borini
If it were 20000000 concatenations, I'd say this was the way to go. As it is, I doubt the speed boost is enough to outweigh the overhead.
Michael Myers
There is no optimization performed if you parallelize it. The same speed problems would occur as input size swells.
San Jacinto
@unknown (google): Oh, okay. What were your numbers?
Michael Myers
You can rework it as independent concatenations of the b string. works for numbers, I agree that for strings it can be stupid though.
Stefano Borini
@mmyers my "numbers?" I can't tell if you're being sarcastic or not, so I'll offer a short defense of my comment, although I'm pretty sure you understand this already. Optimization implies changing things at an algorithmic level, doing less work or using less memory to achieve the result. "parallelize it" implies: "do the same thing, except split it up and run parts concurrently." If this is what you do, all that happens is as the input size grows is a repeat of the same problems. Additionally, you incurr a lot of provlems with memory consumption if you "split" the problem incorrectly.
San Jacinto
The question was how to make the code execute faster. If it's parallelized and there is more than one processor available, then it will execute faster. It won't change the algorithmic complexity, and it will definitely increase the implementation complexity, but it should be faster just the same. I'm just saying that if one thread does 10000000 concatenations and another thread does 10000000 concatenations at the same time (in StringBuilders of course), that will be faster than one thread doing 20000000 concatenations. Is that fair to say?
Michael Myers
Yes, absolutely (assuming you've found a good mechanism to ensure mutual exclusion). Similar to your "all i'm saying is.." all I'm saying is that you hit the same wall, just later on down the line. Additionally, something I failed to mention earlier, the "stride" of this algorithm is 1. I'm not sure how you'd parallize this, as each iteration of the loop requires the answer from the prior iteration... unless you "seed" some of the threads with a starting values, but then how are these values constructed? The only optimization we've seen so far is to allocate all the necessary memory at once.
San Jacinto
Nope. Each iteration loops needs the previous only if you build things as the OP specified. If you build the World concatanations independently and only at the end prepend it with Hello, then you don't incur in the problem.However, I really think you are taking my answer too seriously. The question as it was presented didn't leave much space to optimization, and it appeared as a homework. My answer was more like of a joke.
Stefano Borini
@stefano borini yeah, i went back and read the OP after your last comment and saw that i read it way too fast. you are correct that there is parallelization potential here, so the "stride" comment was wrong.
San Jacinto
i ran some numbers for fun. please see my answer.
San Jacinto
+32  A: 

From the StringBuilder documentation:

Performance Considerations

The Concat and AppendFormat methods both concatenate new data to an existing String or StringBuilder object. A String object concatenation operation always creates a new object from the existing string and the new data. A StringBuilder object maintains a buffer to accommodate the concatenation of new data. New data is appended to the end of the buffer if room is available; otherwise, a new, larger buffer is allocated, data from the original buffer is copied to the new buffer, then the new data is appended to the new buffer.

The performance of a concatenation operation for a String or StringBuilder object depends on how often a memory allocation occurs. A String concatenation operation always allocates memory, whereas a StringBuilder concatenation operation only allocates memory if the StringBuilder object buffer is too small to accommodate the new data. Consequently, the String class is preferable for a concatenation operation if a fixed number of String objects are concatenated. In that case, the individual concatenation operations might even be combined into a single operation by the compiler. A StringBuilder object is preferable for a concatenation operation if an arbitrary number of strings are concatenated; for example, if a loop concatenates a random number of strings of user input.

static void Main(string[] args) {
    String a = "Hello ";
    String b = " World! ";
    StringBuilder result = new StringBuilder(a.Length + b.Length * 20000);
    result.Append(a);
    for (int i=0; i<20000; i++) {
        result.Append(b);
    }
    Console.WriteLine(result.ToString());
}
280Z28
+1 for a correct implementation that actually produces the same output as the original question.
Grant Wagner
I thought this was C++ for some reason when I first read it. +1
Ed Swangren
This is homework, but the key issue here is string immutability. You cannot change strings, and appending 20000 will have horrendous performance. So the answer the teacher is obviously after is to use the string builder to do the concatenation.
Spence
Someone voted me down without an explanation. :fail:
280Z28
Wouldn't it be awesome if I could BSD-license my answers to HW problems lol
280Z28
@Spence - you give the teacher *way* too much credit...
Marc Gravell
+2 for passing the estimated capacity in the constructor.
David Grant
+1 because this is probably the answer the teacher is looking for. It's still a stupid question for the teacher to be asking.
Imagist
+1 for a real answer, and for teaching me something. Until now, I was always curious as to why people used string builder over concating strings.
Matt Grande
+1  A: 

It's likely to be IO dominated ( writing the output to the console or a file will be the slowest part ), so probably won't benefit from a high degree of optimisation. Simply removing obvious pessimisations should suffice.

As a general rule, don't create temporary objects. Each iteration of your loop creates a temporary string, coping the entire previous string in a and the value of the string in b, so has to do up to 20000 times the length of b operations each time through the loop. Even so, that's only 3 billion bytes to copy, and so should complete in less than a second on a modern machine ( assuming the runtime uses the right operations for the target hardware ). Dumping 160,008 characters to the console may well take longer.

One technique is to use a builder or buffer to create fewer temporary objects, instead creating a long string in memory using a StringBuilder then copying that to a string, then outputting that string.

However, you can go one stage further and achieve the same functionality by writing the output directly, rather than creating any temporary strings or buffers, using Console.Write in the loop instead. That will remove two of copying operations ( the string b is copied to the buffer then the buffer is copied to a string object then the string's data to the output buffer; the final copy operation is the one internal to Console.Write so is not avoidable in C# ), but require more operating system calls, so may or may not be faster.

Another common optimisation is to unroll the loop. So instead of having a loop which has one line which writes one " World! " and is looped 20,000 times, you can have (say) five lines which write one " World! " each and loop them 4,000 times. That's normally only worth doing in itself if the cost of incrementing and testing the loop variable is high compared to what you're doing in the loop, but it can lead to other optimisations.

Having partially unrolled the loop, you can combine the code in the loop and write five or ten " World! "s with one call to Console.Write, which should save some time in that you're only making one fifth the number of system calls.


Writing to console, in cmd window, it appears limited by speed of console window:

( times in seconds for 100 runs )

     724.5312500 - concat
      53.2187500 - direct
      30.3906250 - direct writing b x10
      30.3750000 - direct writing b x100
      30.3750000 - builder
      30.3750000 - builder writing b x100

writing to file, the times for the different techniques differ:

     205.0000000 - concat
       9.7031250 - direct
       1.0781250 - direct writing b x10
       0.5000000 - builder
       0.4843750 - direct writing b x100
       0.4531250 - builder writing b x100

From this it's possible to draw two conclusions:

Most of the improvements don't matter if you're writing to the console in a cmd.exe window. You do have to profile the system as a whole, and (unless you're trying to reduce the energy use of the CPU ) there's no point optimising one component beyond the capbilities of the rest of the system.

Although apparently doing more work - copying the data more and calling the same number of functions, the StringBuilder approach is faster. This implies that there's quite a high overhead in each call to Console.Write, compared with the equivalent in non-managed languages.

writing to file, using gcc C99 on Win XP:

    0.375 - direct ( fputs ( b, stdout ) 20000 times )
    0.171 - direct unrolled ( fputs ( b x 100, stdout ) 200 times )
    0.171 - copy to b to a buffer 20000 times then puts once

The lower cost of the system call in C allows it to get towards being IO bound, rather than limited by the .net runtime boundaries. So when optimizing .net, managed/unmanaged boundaries become important.

Pete Kirkham
+2  A: 
static void Main(string[] args) 
{
    const String a = "Hello " +
        /* insert string literal here that contains " World! " 20000 times. */ ;
    Console.WriteLine(a);
}

I can't believe that they teach nonsense like this in schools. There isn't a real-world example of why you would ever do this, let alone optimize it. All this teaches is how to micro-optimize a program that does nothing useful, and that's counterproductive to the student's health as a programmer/developer.

Imagist
Agreed. Any example that is not a real world example (outside of basic explanations of what things do) is pretty much useless. Math is also notorious for this. I think textbook writers do it because they get bored.
Sneakyness
Actually, learning that strings are immutable and that you shouldn't concatenate when you have a large number of iterations is useful and relevant to real-world C# programming.
Dan Diplo
@Dan-Diplo I agree, but this situation is entirely artificial and it is doubtful that the students would make any connection between this situation and a real-world situation where that information would be important. For a similar situation, have them optimize a Markov Text Generator.
Imagist
A: 

I wonder, would this be any faster?

static void Main(string[] args) {
    String a = "Hello ";
    String b = " World! ";
    int worldCount = 20000;
    StringBuilder worldList = new StringBuilder(b.Length * worldCount);
    worldList.append(b);
    StringBuilder result = new StringBuilder(a.Length + b.Length * worldCount);
    result.Append(a);

    while (worldCount > 0) {

       if ((worldCount & 0x1) > 0) {  // Fewer appends, more ToStrings.
          result.Append(worldList);   // would the ToString here kill performance?
       }
       worldCount >>= 1;
       if (worldCount > 0) {
          worldList.Append(worldList);
       }
    }

    Console.WriteLine(result.ToString());
}
GenericKen
Quit wondering and wrap the code with calls to System.Diagnostics.StopWatch.
ebpower
A: 

Depends on what's going in the String object I guess. If internally all they have is a null-terminated string, then you could optimize by storing the length of the string somewhere. Also, if you're just outputting to stdout, it would make more sense to move your output call inside the loop (less memory overhead), and it should also be faster.

+1  A: 

MemoryStream is slightly faster than using StringBuilder:

    static void Main(string[] args)
    {
        String a = "Hello ";
        String b = " World! ";

        System.IO.MemoryStream ms = new System.IO.MemoryStream(20000 * b.Length + a.Length);
        System.IO.StreamWriter sw = new System.IO.StreamWriter(ms);

        sw.Write(a);
        for (int i = 0; i < 20000; i++)
        {
            sw.Write(b);
        }

        ms.Seek(0,System.IO.SeekOrigin.Begin);
        System.IO.StreamReader sr = new System.IO.StreamReader(ms);
        Console.WriteLine(sr.ReadToEnd());
    }
ebpower
+3  A: 

Perform output in the loop (5x faster, same result):

static void Main(string[] args) 
{
    Console.Write("Hello ");
    for (int i=0; i<20000; i++)
       Console.Write(" World! ");
    Console.Write(Environment.NewLine);
}

Or allocate the memory on forehand and fill it up (4x faster, same result):

static void Main(string[] args) 
{
   String a = "Hello "; 
   String b = " World! ";

   int it = 20000;
   char[] result = new char[a.Length + it*b.Length];

   a.ToCharArray().CopyTo(result, 0);

   for (int i = 0; i < it; i++) 
      b.ToCharArray().CopyTo(result, a.Length + i * b.Length);

   Console.WriteLine(result);    
}
Aert
How is I/O performed in .NET? Depending on that answer, your first solution may actually be much slower because of your I/O.
San Jacinto
Interesting point; I timed the Write / WriteLine parts of both method, giving 860ms for the writes in the for loop and 970ms for the single writeline. So the seperate writing is actually alot faster and elimates the time-consuming creation of the large string. Perhaps the small writes dump a whole line in one go, whereas the large writeline has to operate sequentially per character?
Aert
I am curious what happens if you swell the number of iterations to 10 times the original number, do another timing, and then time it again at 100 times the size of the original input. If I get the time throughout the day, I'll post my numbers.
San Jacinto
numbers posted. please see my answer.
San Jacinto
A: 

Here are some timing results. Each test was conducted starting with 20000 iterations. Every test includes output in the timings, unless stated otherwise. Each number for a group means number of iterations was 10 times greater than the previous. If there are less than 4 numbers, the test was taking too long so i killed it. "Parallize it" means i split the number of concatenations evenly over 4 threads and appended the results when all had finished (could probable have saved a little time here and put them into a queue and appended them as they finished, but didn't think of that until now). All times are in milliseconds.

656 6658 66999 370717 output hello loop output world. no concatenation.

658 6641 65807 554546 build with stringbuilder then output

664 6571 65676 314140 build with stringbuilder with large initial size no output

2761 367042 OP, strings only (killed test while concatenating; nothing printed to screen)

167 43227 parallize it OP no output

27 40 323 1758 parallize it stringbuilder no output

San Jacinto