I have a web server which will read large binary files (several megabytes) into byte arrays. The server could be reading several files at the same time (different page requests), so I am looking for the most optimized way of doing this without taxing the CPU too much. Is the code below good enough?

public byte[] FileToByteArray(string fileName)
{
    byte[] buff = null;
    FileStream fs = new FileStream(fileName,
                                   FileMode.Open,
                                   FileAccess.Read);
    BinaryReader br = new BinaryReader(fs);
    long numBytes = new FileInfo(fileName).Length;
    buff = br.ReadBytes((int)numBytes);
    return buff;
}
+18  A: 

Simply replace the whole thing with:

return File.ReadAllBytes(fileName);

However, if you are concerned about memory consumption, you should not read the whole file into memory at once; read and process it in chunks instead.

Mehrdad Afshari
+1, don't reinvent the wheel ;)
Thomas Levesque
Cool, nice and simple
Rihan Meij
+4  A: 

I would think this:

byte[] file = System.IO.File.ReadAllBytes(fileName);
R. Bemrose
A: 

Your code can be factored to this (in lieu of File.ReadAllBytes):

public byte[] ReadAllBytes(string fileName)
{
    byte[] buffer = null;
    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        buffer = new byte[fs.Length];
        int bytesRead = 0;
        // Read is not guaranteed to fill the buffer in one call, so loop until it is full.
        while (bytesRead < buffer.Length)
        {
            int read = fs.Read(buffer, bytesRead, buffer.Length - bytesRead);
            if (read == 0)
                break; // end of file reached unexpectedly
            bytesRead += read;
        }
    }
    return buffer;
}

Note the Int32.MaxValue file-size limitation imposed by the Read method; in other words, you can only read a 2 GB chunk at once.

Also note that the FileStream constructor has overloads whose last argument is a buffer size.
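
For example, a minimal sketch of requesting a larger internal buffer (the 64 KB figure is just an illustrative choice):

using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read,
                                      FileShare.Read, 64 * 1024))
{
    // read as above, but with a 64 KB internal buffer
}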

I would also suggest reading about FileStream and BufferedStream.

As always, a simple sample program that profiles which approach is fastest will be most beneficial.

Also, your underlying hardware will have a large effect on performance. Are you using server-based hard disk drives with large caches and a RAID card with onboard memory cache, or a standard drive connected to the IDE port?

Why would the type of hardware make a difference? So if it's IDE you use one .NET method, and if it's RAID you use another?
Tony_Henrich
@Tony_Henrich - It has nothing to do with what calls you make from your programming language. There are different types of hard disk drives. For example, Seagate drives are classified as "AS" or "NS", with NS being the server-based, large-cache drive, whereas the "AS" drive is the consumer, home-computer drive. Seek speeds and internal transfer rates also affect how fast you can read something from disk. RAID arrays can vastly improve read/write performance through caching. So you might be able to read the file all at once, but the underlying hardware is still the deciding factor.
+2  A: 

Use the BufferedStream class in C# to improve performance. A buffer is a block of bytes in memory used to cache data, thereby reducing the number of calls to the operating system. Buffers improve read and write performance.

See the following for a code example and additional explanation: http://msdn.microsoft.com/en-us/library/system.io.bufferedstream.aspx
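
For illustration, a rough sketch of wrapping a FileStream in a BufferedStream (the buffer and chunk sizes are arbitrary choices for this example):

using (FileStream fs = File.OpenRead(fileName))
using (BufferedStream bs = new BufferedStream(fs, 64 * 1024))
{
    byte[] chunk = new byte[4096];
    int bytesRead;
    while ((bytesRead = bs.Read(chunk, 0, chunk.Length)) > 0)
    {
        // process only the first bytesRead bytes of chunk here
    }
}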

Todd Moses
What's the point of using a `BufferedStream` when you're reading the whole thing at once?
Mehrdad Afshari
He asked for the best performance, not to read the file all at once.
Todd Moses
Performance is measurable in the context of an operation. A stream that you're reading sequentially, all at once, into memory is not likely to benefit from an extra buffer.
Mehrdad Afshari
Do you have any sample code from a real application, not just the MSDN reference, Mr. Moses?
alhambraeidos
+3  A: 

I might argue that the answer here generally is "don't". Unless you absolutely need all the data at once, consider using a Stream-based API (or some variant of reader / iterator). That is especially important when you have multiple parallel operations (as suggested by the question) to minimise system load and maximise throughput.

For example, if you are streaming data to a caller:

Stream dest = ...
using(Stream source = File.OpenRead(path)) {
    byte[] buffer = new byte[2048];
    int bytesRead;
    while((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0) {
        dest.Write(buffer, 0, bytesRead);
    }
}
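
As an aside (and assuming you can target .NET 4 or later), Stream.CopyTo performs the same copy loop for you:

source.CopyTo(dest);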
Marc Gravell
To add to your statement, I even suggest considering async ASP.NET handlers if you have an I/O bound operation like streaming a file to the client. However, if you *have to* read the whole file to a `byte[]` for some reason, I suggest avoiding streams or anything else and just using the system-provided API.
Mehrdad Afshari
@Mehrdad - agreed; but the full context isn't clear. Likewise MVC has action-results for this.
Marc Gravell
Yes I need all the data at once. It's going to a third party webservice.
Tony_Henrich
What is the system provided API?
Tony_Henrich
@Tony: I stated in my answer: `File.ReadAllBytes`.
Mehrdad Afshari
A: 

Depending on the frequency of operations, the size of the files, and the number of files you're looking at, there are other performance issues to take into consideration. One thing to remember is that each of your byte arrays will be released at the mercy of the garbage collector. If you're not caching any of that data, you could end up creating a lot of garbage and losing most of your performance to % Time in GC. If the chunks are larger than 85 KB, you'll be allocating on the Large Object Heap (LOH), which requires a collection of all generations to free up (this is very expensive, and on a server it will stop all execution while it's going on). Additionally, if you have a ton of objects on the LOH, you can end up with LOH fragmentation (the LOH is never compacted), which leads to poor performance and out-of-memory exceptions. You can recycle the process once you hit a certain point, but I don't know if that's a best practice.

The point is, you should consider the full life cycle of your app before simply reading all the bytes into memory the fastest way possible, or you might be trading short-term performance for overall performance.

Joel
A: 

I would recommend trying the Response.TransmitFile() method, followed by Response.Flush() and Response.End(), for serving your large files.
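
As a rough sketch (assuming classic ASP.NET, e.g. inside an IHttpHandler; the path and content type are placeholders):

public void ProcessRequest(HttpContext context)
{
    string path = context.Server.MapPath("~/files/large.bin"); // hypothetical file
    context.Response.ContentType = "application/octet-stream";
    // TransmitFile streams the file to the client without buffering it in server memory.
    context.Response.TransmitFile(path);
    context.Response.Flush();
    context.Response.End();
}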

Dave