tags:

views:

131

answers:

5

I want to replace a binary file if the contents are different.

So I need to be able to compare the binary file (without having to deserialize it).

Is this possible?

I used binary formatter to save the file.

+5  A: 

Yes, you can generate the MD5 or SHA1 hash for each set of file data and then compare them.

Sample code (error checking removed for clarity):

// Requires: using System.IO; using System.Linq; using System.Security.Cryptography;
public bool CompareFiles(string filePath1, string filePath2)
{

  // File.ReadAllBytes reads each file completely into memory
  // (a single FileStream.Read call is not guaranteed to fill the buffer).
  byte[] data1 = File.ReadAllBytes(filePath1);
  byte[] data2 = File.ReadAllBytes(filePath2);

  using (SHA1 sha = SHA1.Create())
  {
    byte[] hash1 = sha.ComputeHash(data1);
    byte[] hash2 = sha.ComputeHash(data2);

    // .NET 2.0 or earlier: you need to compare the hash bytes yourself

    // .NET 3.5 or later: SequenceEqual comes from System.Linq
    bool result = hash1.SequenceEqual(hash2);

    return result;
  }
}
mjmarsh
If it's just two files, it would be better to compare the bytes directly. That way you can exit early at the first difference, whereas computing hashes forces you to read both files in full, which might be very expensive.
luke
@mjmarsh: Don't both of them read the whole file? What is the point then? Of course calculating the hash will be useful if you compare one file multiple times, but that does not seem to be the case.
Moron
Calculating hashes is actually worse: when comparing byte-by-byte you can stop after the first few bytes (if they differ).
Tomas Petricek
@luke, @Moron, @Thomas: No question byte-by-byte is better for random files. But if the fileset was under his control he could calculate the hash on serialization and then use the pre-calculated hash for future comparisons.
mjmarsh
@mjmarsh: Well, you can do a random byte compare and gain reasonable confidence that they are the same or different without reading the whole file, but you are right. A true compare needs to check each and every byte (assuming the lengths are the same). What I meant was, if we just do it once, there is no point calculating a hash first and then doing the compare. Just comparing the bytes would be much more efficient and error-free. Anyway, as I commented on driis' answer, I think the OP wants something completely different from what was written in the question.
Moron
+11  A: 

Yes it is possible.

You need to read the files in order to compare them, if that is what you are asking.

The pseudo-code would be:

  • Open file1 and file2 as streams.
  • Start by comparing length; if the length is not equal, the files are not equal.
  • Read a chunk of each file into a buffer, and compare the buffers. Repeat until you encounter differences or reach the end of the file.
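
The steps above could be sketched roughly like this. It is only an illustration, not a definitive implementation: the names `FileComparer`, `FilesAreEqual` and `FillBuffer` and the 4096-byte chunk size are assumptions made for the example.

```csharp
using System;
using System.IO;

// Hypothetical helper class; the names are made up for illustration.
public static class FileComparer
{
    // Returns true when both files have the same length and the same bytes.
    public static bool FilesAreEqual(string path1, string path2)
    {
        // Different lengths mean different content, so nothing has to be read.
        if (new FileInfo(path1).Length != new FileInfo(path2).Length)
            return false;

        const int BufferSize = 4096; // arbitrary chunk size
        byte[] buffer1 = new byte[BufferSize];
        byte[] buffer2 = new byte[BufferSize];

        using (var fs1 = new FileStream(path1, FileMode.Open, FileAccess.Read))
        using (var fs2 = new FileStream(path2, FileMode.Open, FileAccess.Read))
        {
            while (true)
            {
                int count1 = FillBuffer(fs1, buffer1);
                int count2 = FillBuffer(fs2, buffer2);

                if (count1 != count2)
                    return false; // one stream ended before the other
                if (count1 == 0)
                    return true;  // both streams ended without a difference

                for (int i = 0; i < count1; i++)
                {
                    if (buffer1[i] != buffer2[i])
                        return false; // stop at the first differing byte
                }
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream ends.
    private static int FillBuffer(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0)
                break;
            total += read;
        }
        return total;
    }
}
```

Calling `FileComparer.FilesAreEqual(pathA, pathB)` reads no further than the first chunk that contains a difference.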

If you need to compare the same file to a bunch of other files, it can be useful to calculate the hash of the first file. Then just calculate the hash of each of the other files, and compare the hashes.
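
A rough sketch of that one-against-many case, assuming SHA-1 is an acceptable hash here. `HashComparer`, `HashFile` and `FindMatches` are hypothetical names used only for this example. Equal hashes make equal content extremely likely but do not strictly prove it, so a byte-by-byte check on the matches may still be worthwhile.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper class; the names are made up for illustration.
public static class HashComparer
{
    // Computes the SHA-1 hash of a file by streaming its content.
    public static byte[] HashFile(string path)
    {
        using (var sha = SHA1.Create())
        using (var stream = File.OpenRead(path))
        {
            return sha.ComputeHash(stream);
        }
    }

    // Returns the candidate paths whose hash equals that of the reference file.
    public static List<string> FindMatches(string referencePath, IEnumerable<string> candidatePaths)
    {
        // The reference file is hashed only once, not once per comparison.
        byte[] referenceHash = HashFile(referencePath);

        var matches = new List<string>();
        foreach (string candidate in candidatePaths)
        {
            if (HashFile(candidate).SequenceEqual(referenceHash))
                matches.Add(candidate);
        }
        return matches;
    }
}
```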

driis
I would modify that last step slightly: "repeat until they are different or end of file reached". You wouldn't want to keep going to the end if the first bytes are different.
Neil N
@driis: I suspect the OP wants to detect if a class has changed based on the serialized versions. I don't think comparing files will work as BinaryFormatter does not guarantee that the 'same' class will have the exact same bytes.
Moron
@Neil, yes of course; I thought that was obvious - corrected the answer now :-)
driis
@Moron If he has 2 serialized files that represent different objects, they will be different. If he has 2 serialized files that represent 2 different versions of a class, it is likely that they will be different (but of course not if the number and type of fields haven't changed). Anyway, the OP should rephrase the question if my answer is not the kind of comparison he needs.
driis
@driis: I agree that the question is badly phrased. Anyway, what I am saying is that if files are different, it does not imply the classes are! And if the classes have some custom equality checks, then we have to deserialize anyway.
Moron
A: 
byte[] myFile = File.ReadAllBytes(pathToFile);

Then loop through it. Might be slow if the file is large.
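
Something along these lines, as a minimal sketch for files small enough to hold in memory. `InMemoryComparer` and `SameBytes` are hypothetical names picked for the example.

```csharp
using System.IO;

// Hypothetical helper class; the names are made up for illustration.
public static class InMemoryComparer
{
    // Loads both files completely, then compares them byte by byte.
    public static bool SameBytes(string path1, string path2)
    {
        byte[] first = File.ReadAllBytes(path1);
        byte[] second = File.ReadAllBytes(path2);

        // Arrays of different length can never be equal.
        if (first.Length != second.Length)
            return false;

        for (int i = 0; i < first.Length; i++)
        {
            if (first[i] != second[i])
                return false; // first differing byte found
        }
        return true;
    }
}
```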

Perhaps you should look into computing an MD5 hash of each file instead.

Neil N
A: 

You can read the binary content of the files and compare the bytes you get. To read a file you can either use ReadAllBytes (if the file is reasonably sized and will fit in memory comfortably) or you can use FileStream and read chunks of data from both files.

The structure of the approach using buffers might look like this:

byte[] buffer1 = new byte[1024], buffer2 = new byte[1024];
using (var fs1 = new FileStream(firstFile, FileMode.Open, FileAccess.Read))
using (var fs2 = new FileStream(secondFile, FileMode.Open, FileAccess.Read))
{
  // Use fs1.Read(buffer1, 0, 1024) and fs2.Read(buffer2, 0, 1024) to repeatedly
  // read up to 1 KB of data from each file and compare the content of buffer1
  // and buffer2. Read returns the number of bytes actually read, which may be
  // less than requested; 0 means the end of the file was reached.
}

Some people recommended using hashes, but that's not a good idea: if the files are the same, you'll need to read all the data from both files either way, so calculating hashes isn't more efficient than simply reading and comparing all the data. However, if the files differ in the first few bytes, you'll need to read only the first few bytes (if comparing byte-by-byte)!

Hashes would be useful if you wanted to compare multiple files (e.g. each with each).
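
For that each-with-each case, one possible sketch is to hash every file once and group the paths by hash, instead of comparing every pair. `DuplicateFinder` and `GroupByHash` are hypothetical names, and SHA-1 is just one choice of hash; files that land in the same group are only very likely identical, so they can still be confirmed byte-by-byte.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper class; the names are made up for illustration.
public static class DuplicateFinder
{
    // Groups file paths by the SHA-1 hash of their content.
    // Each file is read exactly once, however many files there are.
    public static Dictionary<string, List<string>> GroupByHash(IEnumerable<string> paths)
    {
        var groups = new Dictionary<string, List<string>>();
        using (var sha = SHA1.Create())
        {
            foreach (string path in paths)
            {
                string key;
                using (var stream = File.OpenRead(path))
                {
                    // Base64 text makes the hash usable as a dictionary key.
                    key = Convert.ToBase64String(sha.ComputeHash(stream));
                }

                List<string> group;
                if (!groups.TryGetValue(key, out group))
                {
                    group = new List<string>();
                    groups[key] = group;
                }
                group.Add(path);
            }
        }
        return groups;
    }
}
```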

Tomas Petricek
A: 

Here is a function to do it, unless someone else can provide a better way to compare byte arrays.

private static bool CompareFiles(string file1, string file2)
{
    // The using blocks make sure both streams and the hash object are disposed.
    using (var fsFile1 = new System.IO.FileStream(file1, System.IO.FileMode.Open, System.IO.FileAccess.Read))
    using (var fsFile2 = new System.IO.FileStream(file2, System.IO.FileMode.Open, System.IO.FileAccess.Read))
    using (var md5 = System.Security.Cryptography.MD5.Create())
    {
        var md5File1 = md5.ComputeHash(fsFile1);
        var md5File2 = md5.ComputeHash(fsFile2);
        // Compare the two hashes byte by byte.
        for (int i = 0; i < md5File1.Length; ++i)
        {
            if (md5File1[i] != md5File2[i])
                return false;
        }
        return true;
    }
}
Scott Chamberlain