I have a question about using streams in .NET to load files from disk. I am trying to pinpoint a performance problem and want to be sure it's where I think it is.

Dim provider1 As New MD5CryptoServiceProvider
Dim stream1 As FileStream

stream1 = New FileStream(FileName, FileMode.Open, FileAccess.Read, FileShare.Read)
provider1.ComputeHash(stream1)

Q: Are the bytes read from disk when I create the FileStream object, or when the object consuming the stream, in this case an MD5 Hash algorithm, actually reads it?

I see significant performance problems on my web host when using the ComputeHash method, compared to my local test environment. I'm just trying to make sure that the performance problem is in the hashing and not in the disk access.

A: 

The contents of the file are read when you run the ComputeHash method, not when you open the FileStream.

The best way to find out where the performance problem lies is to read the data from the file into a memory stream, hash it, and measure each of these steps separately. You can use the System.Diagnostics.Stopwatch class for this.
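As a rough sketch of that measurement (the module name, the variable names, and the use of a command-line argument for the file path are assumptions made for illustration, not part of the original code):

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO
Imports System.Security.Cryptography

Module HashTiming
    Sub Main(args As String())
        ' Assumption: the file to hash is passed as the first command-line argument.
        Dim fileName As String = args(0)

        ' Step 1: time the disk read on its own.
        Dim readTimer As Stopwatch = Stopwatch.StartNew()
        Dim data As Byte() = File.ReadAllBytes(fileName)
        readTimer.Stop()

        ' Step 2: time the hash on its own, over bytes that are already in memory.
        Dim hash As Byte() = Nothing
        Dim hashTimer As Stopwatch = Stopwatch.StartNew()
        Using hasher As New MD5CryptoServiceProvider()
            Using memory As New MemoryStream(data)
                hash = hasher.ComputeHash(memory)
            End Using
        End Using
        hashTimer.Stop()

        Console.WriteLine("Read: {0} ms", readTimer.ElapsedMilliseconds)
        Console.WriteLine("Hash: {0} ms", hashTimer.ElapsedMilliseconds)
        Console.WriteLine("MD5:  {0}", BitConverter.ToString(hash))
    End Sub
End Module
```

Keep in mind that the operating system caches recently read files, so a second run over the same file may report a much shorter read time than the first.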

MichaelT
A: 

Bytes are read from disk when the caller requests them by invoking Read or a similar method. That said, both the hard disk and the operating system perform some read-ahead to speed up sequential reads, so the exact moment of physical disk access is hard to predict.

You could also try to play with the buffer size parameter that some constructor overloads provide for FileStream.
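For instance, a small helper along these lines would let you try different buffer sizes. The module name, the function name HashWithBuffer, and its parameters are hypothetical; the FileStream constructor overload that takes a buffer size as its fifth argument is part of the framework.

```vbnet
Imports System.IO
Imports System.Security.Cryptography

Module BufferedHash
    ' Hypothetical helper: hashes a file through a FileStream
    ' opened with an explicit buffer size, given in bytes.
    Function HashWithBuffer(fileName As String, bufferSize As Integer) As Byte()
        Using source As New FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize)
            Using hasher As New MD5CryptoServiceProvider()
                Return hasher.ComputeHash(source)
            End Using
        End Using
    End Function
End Module
```

Calling HashWithBuffer(FileName, 64 * 1024) would use a 64 KB buffer. That figure is only a starting point for experiments; whether it helps depends on the disk and the host, so it is worth timing a few sizes.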

Dario Solera
A: 

FileStream simply exposes an IO.Stream around a file object, and uses buffers. It doesn't read the entire file in the constructor (the file could be larger than RAM).

The performance issue is most likely in the hashing, and you can perform some simple benchmarks to prove whether it's because of file IO or the algorithm itself.

But one of the first things you might try is:

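provider1.ComputeHash(File.ReadAllBytes(FileName))

FileStream has no ToArray() method (that belongs to MemoryStream), so use File.ReadAllBytes from System.IO, which reads the entire file and returns an array of bytes. Reading everything up front may be faster than the repeated Read() calls that ComputeHash makes on a stream, and it separates the disk access from the hashing.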

C. Lawrence Wenham