I am facing a performance problem while using the FileStream.Write method.

I have a console application that I use to read a Base64 string from a file (~400 KB in size) using a StreamReader object. I convert this string to a byte array using Convert.FromBase64String, and I then write this byte array to a file using a FileStream object. The byte array length obtained here was 334991.

I measured the time taken to write the byte array - and it comes out to be approximately 0.116 seconds.

Just for fun, I got an array of bytes from the same Base64-encoded string using the ASCIIEncoding.GetBytes method (even though I knew this would not give the correct DECODED output; I just wanted to try it out). I wrote this byte array to the file using a FileStream object. The byte array length obtained here was 458414.

I measured the time taken to write the byte array using this methodology - and it comes out to be approximately 0.008 seconds.

Here is the sample code:

using System;
using System.Diagnostics;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        Stopwatch stopWatch = new Stopwatch();
        TimeSpan executionTime;

        // Read the Base64-encoded text from the input file.
        StreamReader sr = new StreamReader("foo.txt");
        string sampleString = sr.ReadToEnd();
        sr.Close();

        ////1. Convert to bytes using the Base64 decoder (the real output!)
        //byte[] binaryData = Convert.FromBase64String(sampleString);

        //2. Convert to bytes using ASCIIEncoding (just for fun!)
        byte[] binaryData = new System.Text.ASCIIEncoding().GetBytes(sampleString);
        Console.WriteLine("Byte Length: " + binaryData.Length);

        // Only the write to disk is timed; the conversion above is not included.
        stopWatch.Start();
        FileStream fs = new FileStream("bar.txt", FileMode.Create, FileAccess.Write);
        fs.Write(binaryData, 0, binaryData.Length);
        fs.Flush();
        fs.Close();
        stopWatch.Stop();

        executionTime = stopWatch.Elapsed;
        Console.WriteLine("FileStream Write - Total Execution Time: " + executionTime.TotalSeconds);
        Console.Read();
    }
}

I ran tests on approximately 5000 files containing Base64-encoded strings, and the time taken to write these two types of byte array differs by almost a factor of 10, with the genuinely decoded byte array consistently taking longer to write.

The byte array obtained from Convert.FromBase64String is shorter than the one from ASCIIEncoding.GetBytes. That is expected: Base64 represents every 3 bytes of data as 4 characters, so decoding yields roughly 3/4 of the character count (458414 × 3/4 ≈ 343810; FromBase64String also skips whitespace such as line breaks, which accounts for the remaining difference down to 334991).

But at the end of the day, all I am doing in both cases is writing a bunch of bytes using a FileStream object. So why should there be such a drastic performance difference (w.r.t. time taken) when writing a byte array to disk?

Or am I doing something terribly wrong? Please advise.

+1  A: 

I gave some similar advice on another question; check out these tools and references from MS Research.

They'll help you track down any potential I/O problems, or at least understand them.

Also, you should be on the lookout for issues around the CLR large object heap, particularly when using arrays: anything over ~85 KB is allocated on the large object heap, which has sub-optimal managed-heap interactions, in case you ran this 5000 times in the same process.
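As a quick illustration (this snippet is mine, not from the MS references), arrays above that ~85 KB threshold land on the large object heap, which the runtime reports as generation 2 even for a freshly allocated object:

using System;

class LohDemo
{
    static void Main()
    {
        byte[] small = new byte[80 * 1024]; // below the 85,000-byte LOH threshold
        byte[] large = new byte[90 * 1024]; // above the threshold: allocated on the LOH

        Console.WriteLine(GC.GetGeneration(small)); // 0 - ordinary generational heap
        Console.WriteLine(GC.GetGeneration(large)); // 2 - LOH objects are treated as gen 2
    }
}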

However, after looking again, I do not think these are that closely related to your problem. I ran this code in a profiler, and it simply shows that Convert.FromBase64String is consuming all your cycles.

Some other things about your test code: you should always run your test two or more times in a row, so that the JIT compiler has a chance to compile the code before you measure it; JIT compilation alone can cause an amazing variation in execution time. Right now I think you should re-evaluate your test harness, trying to take into account JIT warm-up and possible large object heap effects (put one of these routines in front of the other...).
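For instance, here is a minimal sketch of a fairer harness (the file names, data size, and iteration count are made up for illustration): one warm-up run so the JIT compiles the write path, then several timed iterations averaged together.

using System;
using System.Diagnostics;
using System.IO;

class WriteBenchmark
{
    static void WriteFile(string path, byte[] data)
    {
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            fs.Write(data, 0, data.Length);
        }
    }

    static void Main()
    {
        byte[] data = new byte[334991];
        new Random(42).NextBytes(data);

        // Warm-up run: the JIT compiles WriteFile before any measurement.
        WriteFile("warmup.bin", data);

        const int iterations = 10;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            WriteFile("timed.bin", data);
        }
        sw.Stop();

        Console.WriteLine("Average write: {0:F3} ms",
            sw.Elapsed.TotalMilliseconds / iterations);
    }
}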

RandomNickName42
A: 

You might want to take a look at a series of articles (and the accompanying source project) that Jon Skeet wrote recently on the issue

here and here

Specifically, he was comparing buffering vs. streaming, but there were also interesting results with different file sizes and thread counts.
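If you want to experiment along those lines yourself, here is a minimal sketch (the buffer sizes and file name are arbitrary choices of mine, not Skeet's) using the FileStream constructor overload that takes an explicit buffer size:

using System;
using System.Diagnostics;
using System.IO;

class BufferSizeDemo
{
    static void Main()
    {
        byte[] data = new byte[334991];
        new Random(42).NextBytes(data);

        foreach (int bufferSize in new int[] { 4096, 65536 })
        {
            Stopwatch sw = Stopwatch.StartNew();
            using (FileStream fs = new FileStream("buffered.bin", FileMode.Create,
                FileAccess.Write, FileShare.None, bufferSize))
            {
                fs.Write(data, 0, data.Length);
            }
            sw.Stop();
            Console.WriteLine("Buffer {0}: {1} ms", bufferSize, sw.ElapsedMilliseconds);
        }
    }
}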

SnOrfus
+1  A: 

I think the main problem in your code is that you are trying to compare cabbages with carrots (a French expression):

Convert.FromBase64String and ASCIIEncoding().GetBytes don't do the same thing at all.

Just try using any plain text file as the input to your program: it will fail in FromBase64String while running fine with ASCIIEncoding.

Now for the explanation on the performance hit:

  • ASCIIEncoding().GetBytes simply takes each character from your file and converts it into a byte, which is rather straightforward: there is almost nothing to do. For instance, it translates 'A' into 0x41 and 'Z' into 0x5A...

  • For Convert.FromBase64String, it is another story. It really translates a "base64-encoded string" into an array of bytes. A base64 string is, let's say, a "printable" representation of binary data; better, it is a "text" representation of binary data that allows it to be, for instance, sent over an internet wire. Images in mails are base64-encoded because the mail protocols are text-based. So the process of converting base64 back and forth to bytes is not straightforward at all; hence the performance hit.

FYI, a base64 string looks something like this:

SABlAGwAbABvACAAVwBvAHIAbABkACEA

which decodes to the UTF-16 bytes of "Hello World!". Not immediately obvious, is it?
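To see the two conversions side by side, here is a small snippet of mine that runs that same string through both methods:

using System;
using System.Text;

class DecodeDemo
{
    static void Main()
    {
        string base64 = "SABlAGwAbABvACAAVwBvAHIAbABkACEA";

        // One byte per character: just the character codes of the base64 text.
        byte[] ascii = Encoding.ASCII.GetBytes(base64);

        // The actual binary data that the base64 text represents.
        byte[] decoded = Convert.FromBase64String(base64);

        Console.WriteLine(ascii.Length);                         // 32
        Console.WriteLine(decoded.Length);                       // 24
        Console.WriteLine(Encoding.Unicode.GetString(decoded));  // Hello World!
    }
}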

Here is some info about the base64 format: http://en.wikipedia.org/wiki/Base64

Hope this helps

odalet
Hi odalet, if you notice above, the time measurement does not include the conversion of the Base64 string to a byte array, so I am not sure your explanation of the performance hit stands. Also, I mentioned that I am comparing cabbages and carrots precisely to highlight a performance-related issue.
amit-agrawal
My mistake entirely: forget about it, I indeed didn't read your question carefully enough...
odalet
A: 

I am having the same problem. Did you find any solution or reason for the poor write performance?

fr73