Is there any built-in class/method for comparing the content of two audio/video files? Or is there any built-in class/method for converting an audio/video file to a bit stream?

+1  A: 

You could do a byte-wise comparison of the two files. System.IO.File.ReadAllBytes(...) would be useful for that.

Chris
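A minimal sketch of that byte-wise comparison in C#, assuming both files are small enough to load fully into memory with File.ReadAllBytes. The class and method names (`ByteCompare.SameBytes`) are illustrative, not part of .NET.

```csharp
using System;
using System.IO;

public static class ByteCompare
{
    // Returns true if the two files contain exactly the same bytes.
    // Both files are loaded fully into memory, so this suits small files only.
    public static bool SameBytes(string pathA, string pathB)
    {
        byte[] a = File.ReadAllBytes(pathA);
        byte[] b = File.ReadAllBytes(pathB);

        // Files of different length can never be identical
        if (a.Length != b.Length)
        {
            return false;
        }

        // Compare byte by byte, stopping at the first difference
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

For large audio/video files, a buffered stream comparison avoids holding both files in memory at once.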
Right, you could read all the bytes and just start comparing. Obviously, start by comparing the number of bytes (the file lengths), then compare each byte. This is one method that is simple to implement.
BobbyShaftoe
That would work if you were trying to identify how identical the files are. We have no idea at this point what this guy is trying to do, however. Software is about solving problems. Real problems. You solve someone's problem, you get an intellectual hard-on, and hopefully a paycheck. Otherwise, too bad.
Chris
+2  A: 

You could use the hash functions in System.Security.Cryptography on two file streams and compare them. This is easy to do and works well for small files. If your files are big, which they probably are if you're dealing with audio/video, then reading in the file and generating the hash can take a bit of time.

sipwiz
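A sketch of the hashing approach that feeds a FileStream to the hash, so the file is processed in chunks rather than loaded whole. SHA-256 is an assumption here; any `HashAlgorithm` from System.Security.Cryptography (such as MD5 or SHA1) could be swapped in. The `HashCompare` names are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class HashCompare
{
    // Computes a SHA-256 digest by streaming the file through the hash,
    // so the whole file never has to be held in memory at once.
    public static byte[] Sha256OfFile(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        using (SHA256 sha = SHA256.Create())
        {
            return sha.ComputeHash(stream);
        }
    }

    // Different digests mean the files are definitely different;
    // equal digests mean they are almost certainly identical.
    public static bool SameHash(string pathA, string pathB)
    {
        return Sha256OfFile(pathA).SequenceEqual(Sha256OfFile(pathB));
    }
}
```

For a single pair of files this still reads every byte of both, so it is not cheaper than a direct byte comparison; the payoff comes when a digest can be computed once and reused.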
Yeah, this is a good first thought. Actually, you could use MD5 and get fairly decent performance for small files. It depends on the requirement. It's possible for two different files to have the same hash. Just an aside.
BobbyShaftoe
It does, however, give you a quick guaranteed NO. You could probably take the first X bytes and hash those, then go from there.
BobbyShaftoe
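A rough sketch of hashing only the first X bytes as a quick "NO" test. The names and the choice of SHA-256 are assumptions. A mismatch proves the files differ; a match says nothing about the rest of the file. For a single pair of files, comparing the prefix bytes directly would be just as cheap, so hashing the prefix mainly pays off when prefix digests are stored and compared against many files.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class PrefixHash
{
    // Hashes at most the first prefixLength bytes of a file.
    public static byte[] HashPrefix(string path, int prefixLength)
    {
        byte[] buffer = new byte[prefixLength];
        int total = 0;
        using (FileStream stream = File.OpenRead(path))
        {
            // Stream.Read may return fewer bytes than requested,
            // so keep reading until the buffer is full or the file ends.
            while (total < prefixLength)
            {
                int n = stream.Read(buffer, total, prefixLength - total);
                if (n == 0)
                {
                    break; // end of file reached before prefixLength bytes
                }
                total += n;
            }
        }
        using (SHA256 sha = SHA256.Create())
        {
            return sha.ComputeHash(buffer, 0, total);
        }
    }

    // true  => the files are definitely different
    // false => the prefixes match; the rest still needs checking
    public static bool PrefixesDiffer(string pathA, string pathB, int prefixLength)
    {
        return !HashPrefix(pathA, prefixLength).SequenceEqual(HashPrefix(pathB, prefixLength));
    }
}
```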
Given the number of operations in a hash, would just comparing the file bytes be faster?
Mitch Wheat
@Mitch Wheat, yes, to an extent. However, in some cases you might keep more in memory than you need. Say you have a source and you want to compare it against 10 files: you could compute the hash of the source once, then compute hashes for the other files, without storing both the source and the second file in memory.
BobbyShaftoe
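A sketch of the one-source, many-candidates case: the source is hashed once and each candidate is streamed through the hash in turn, so only one file is open at a time and only the digests are kept. The names, SHA-256, and the length pre-check are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class OneToMany
{
    static byte[] Sha256OfFile(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        using (SHA256 sha = SHA256.Create())
        {
            return sha.ComputeHash(stream);
        }
    }

    // Hashes the source once, then compares each candidate's digest to it.
    // Only the 32-byte digests are kept in memory, never two whole files.
    public static List<string> FindMatches(string sourcePath, IEnumerable<string> candidatePaths)
    {
        byte[] sourceHash = Sha256OfFile(sourcePath);
        long sourceLength = new FileInfo(sourcePath).Length;
        var matches = new List<string>();
        foreach (string candidate in candidatePaths)
        {
            // Cheap length check first; only hash files of the same size
            if (new FileInfo(candidate).Length != sourceLength)
            {
                continue;
            }
            if (Sha256OfFile(candidate).SequenceEqual(sourceHash))
            {
                matches.Add(candidate);
            }
        }
        return matches;
    }
}
```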
Okay, given two hashes that are different, what the hell does one hash value have to do with the other? What can you do with two hashes to compare them relatively, i.e. to tell how similar the files are? If they're actual hashes, nothing. No? :)
Chris
A: 

The other answers are good - either hashing (if you are comparing the file to multiple candidates) or a byte-wise comparison (if comparing two single files).

Here are a couple of additional thoughts:

First, check the file sizes - if they are different, then don't waste time comparing bytes. These are quick to check.
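The size check reads only file-system metadata, not file contents. A minimal sketch (the `SizeCheck.SizesDiffer` name is hypothetical):

```csharp
using System;
using System.IO;

public static class SizeCheck
{
    // FileInfo.Length comes from file-system metadata, so no content is read.
    // true  => sizes differ, so the files cannot be identical
    // false => sizes match; the contents still need to be compared
    public static bool SizesDiffer(string pathA, string pathB)
    {
        return new FileInfo(pathA).Length != new FileInfo(pathB).Length;
    }
}
```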

Second, try searching from the end or the middle of the file using a binary chop approach.

E.g., suppose you have a file like this:

ABCDEFGHIJKLMNOP

Then it is modified to this:

ABCDEF11GHIJKLMN

If the file size stays the same and content has been inserted, the other bytes must be "knocked out" (shifted along, with the tail cut off). So a binary chop approach might pick this up with fewer reads (e.g., seek to and read bytes SIZE/2-10 to SIZE/2+10 in both files, and compare).
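A sketch of the midpoint spot check, reading a window of `radius` bytes on either side of SIZE/2 from both files (half-open range, clamped to the file bounds). The names and the default radius of 10 are illustrative. Applied to the 16-byte example above with a radius of 2, the window covers offsets 6 to 9 (`GHIJ` versus `11GH`), so the check reports a difference.

```csharp
using System;
using System.IO;

public static class SpotCheck
{
    // Reads exactly count bytes starting at offset, looping because
    // Stream.Read may return fewer bytes than requested.
    static byte[] ReadAt(FileStream stream, long offset, int count)
    {
        byte[] buffer = new byte[count];
        stream.Seek(offset, SeekOrigin.Begin);
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, total, count - total);
            if (n == 0)
            {
                throw new EndOfStreamException();
            }
            total += n;
        }
        return buffer;
    }

    // Compares bytes [SIZE/2 - radius, SIZE/2 + radius) in both files.
    // false => definitely different; true => this window matches (not proof of equality).
    public static bool MiddleWindowMatches(string pathA, string pathB, int radius = 10)
    {
        using (FileStream a = File.OpenRead(pathA))
        using (FileStream b = File.OpenRead(pathB))
        {
            if (a.Length != b.Length)
            {
                return false;
            }
            long size = a.Length;
            long start = Math.Max(0, size / 2 - radius);
            long end = Math.Min(size, size / 2 + radius);
            int count = (int)(end - start);
            byte[] windowA = ReadAt(a, start, count);
            byte[] windowB = ReadAt(b, start, count);
            for (int i = 0; i < count; i++)
            {
                if (windowA[i] != windowB[i])
                {
                    return false;
                }
            }
            return true;
        }
    }
}
```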

You could try to combine the techniques. If you do it over a good enough sample of the data you deal with, you might find that of all the different files you compare (example):

  • 80% were found because the file size was different (10ms per file)
  • 10% were found due to binary chop (50ms per file)
  • 10% were found due to linear byte comparisons (2000ms per file)

Doing a binary chop over the whole file wouldn't be so smart, since I expect the hard disk will be faster reading linearly than seeking to random spots. But if you check SIZE/2, then SIZE/4 and SIZE/4x3, then the SIZE/8 points, for say 5 iterations, you might find most of the differences without having to do a byte-wise comparison. Just some ideas.
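The probe positions in this scheme (SIZE/2, then SIZE/4 and SIZE/4x3, then the odd multiples of SIZE/8, and so on) could be generated as below. This is a hypothetical helper; each offset would then be checked with a small window comparison like the SIZE/2 one described earlier. For a 16-byte file and 3 levels it yields 8, 4, 12, 2, 6, 10, 14, and 5 levels give 31 probes.

```csharp
using System;
using System.Collections.Generic;

public static class ProbePlan
{
    // Level 1: SIZE/2; level 2: SIZE/4, 3*SIZE/4; level 3: odd multiples of SIZE/8; ...
    // Uses integer arithmetic and assumes size * 2^levels fits in a long.
    public static List<long> ProbeOffsets(long size, int levels)
    {
        var offsets = new List<long>();
        for (int level = 1; level <= levels; level++)
        {
            long denominator = 1L << level;           // 2, 4, 8, ...
            for (long k = 1; k < denominator; k += 2) // odd numerators only, so no offset repeats
            {
                offsets.Add(size * k / denominator);
            }
        }
        return offsets;
    }
}
```

Sorting the offsets before reading would keep the seeks moving in one direction, which may help given the concern above about seeking to random spots.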

Also, instead of reading from the front of the file, perhaps try reading from the end of the file backwards. Again you might be trading off seek time for probability, but in the "insert" scenario, assuming a change is made halfway into the file, you'll probably find this faster by starting from the end than from the start.
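A sketch of scanning backwards from the end, one block at a time, and returning the offset of the last differing byte (or -1 if the files are identical). The names and the 4096-byte default block size are assumptions. On the 16-byte example above it returns 15, because the final bytes (`P` versus `N`) already differ.

```csharp
using System;
using System.IO;

public static class BackwardScan
{
    // Reads exactly count bytes at offset into buffer.
    static void ReadAt(FileStream stream, long offset, byte[] buffer, int count)
    {
        stream.Seek(offset, SeekOrigin.Begin);
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, total, count - total);
            if (n == 0)
            {
                throw new EndOfStreamException();
            }
            total += n;
        }
    }

    // Scans two equal-sized files block by block from the end towards the start.
    // Returns the offset of the last differing byte, or -1 if the files are identical.
    public static long LastDifference(string pathA, string pathB, int blockSize = 4096)
    {
        using (FileStream a = File.OpenRead(pathA))
        using (FileStream b = File.OpenRead(pathB))
        {
            if (a.Length != b.Length)
            {
                throw new ArgumentException("Files must have the same size.");
            }
            byte[] bufferA = new byte[blockSize];
            byte[] bufferB = new byte[blockSize];
            long end = a.Length;
            while (end > 0)
            {
                int count = (int)Math.Min(blockSize, end);
                long start = end - count;
                ReadAt(a, start, bufferA, count);
                ReadAt(b, start, bufferB, count);
                // Walk the block backwards so the first hit is the last difference
                for (int i = count - 1; i >= 0; i--)
                {
                    if (bufferA[i] != bufferB[i])
                    {
                        return start + i;
                    }
                }
                end = start;
            }
            return -1;
        }
    }
}
```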

Paul Stovell
A: 

There is no direct way to compare files. And since you're dealing with audio/video files, which will be relatively large, I don't know whether a bitwise comparison will work or not.

Anuraj
A: 

Example: Generating SHA1 and MD5 hashes in .NET (C#)

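using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class HashHelper
{
  public static string GenerateHash(string filePathAndName)
  {
    // Stream the file instead of loading it all with File.ReadAllBytes,
    // which matters for large audio/video files; both objects are disposed
    using (FileStream stream = File.OpenRead(filePathAndName))
    using (HashAlgorithm hasher = SHA1.Create()) // SHA1 or MD5
    {
      byte[] hashData = hasher.ComputeHash(stream);

      // "x2" formats each byte as two lowercase hex digits, so no manual
      // zero-padding is needed (lowercase for case-sensitive systems)
      StringBuilder hashText = new StringBuilder(hashData.Length * 2);
      foreach (byte b in hashData)
      {
        hashText.Append(b.ToString("x2"));
      }
      return hashText.ToString();
    }
  }
}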
Free User
A: 

Example: Binary Comparison of 2 Files

using System;
using System.IO;

public static class FileComparison
{
    /// <summary>
    /// Performs a binary comparison of two files and
    /// returns the result of the comparison.
    /// </summary>
    /// <param name="p_FileA">Fully qualified path to the first file.</param>
    /// <param name="p_FileB">Fully qualified path to the second file.</param>
    /// <returns>True if the files are binary-identical, otherwise False
    /// (despite the method name, True means "no difference").</returns>
    public static bool FileDiffer(string p_FileA, string p_FileB)
    {
        bool retVal = true;
        FileInfo infoA = null;
        FileInfo infoB = null;
        byte[] bufferA = new byte[4096];
        byte[] bufferB = new byte[4096];
        int bufferRead = 0;

        // Check that both files exist
        if (!File.Exists(p_FileA))
        {
            throw new ArgumentException(String.Format("The file '{0}' could not be found", p_FileA), "p_FileA");
        }
        if (!File.Exists(p_FileB))
        {
            throw new ArgumentException(String.Format("The file '{0}' could not be found", p_FileB), "p_FileB");
        }

        // Create FileInfo objects to get the file sizes
        infoA = new FileInfo(p_FileA);
        infoB = new FileInfo(p_FileB);

        // Only start a byte comparison if the file sizes are equal
        if (infoA.Length == infoB.Length)
        {
            // Binary comparison
            using (FileStream streamA = File.OpenRead(p_FileA))
            using (FileStream streamB = File.OpenRead(p_FileB))
            {
                // Read file A block by block through the buffer
                while ((bufferRead = streamA.Read(bufferA, 0, bufferA.Length)) > 0)
                {
                    // Read exactly as many bytes from file B. Stream.Read may
                    // return fewer bytes than requested, so loop until filled.
                    int readB = 0;
                    while (readB < bufferRead)
                    {
                        int n = streamB.Read(bufferB, readB, bufferRead - readB);
                        if (n == 0)
                        {
                            break; // file B ended early (e.g. changed after the size check)
                        }
                        readB += n;
                    }
                    if (readB != bufferRead)
                    {
                        retVal = false;
                        break;
                    }

                    // Compare the bytes within the buffer
                    for (int i = 0; i < bufferRead; i++)
                    {
                        if (bufferA[i] != bufferB[i])
                        {
                            retVal = false;
                            break;
                        }
                    }

                    // If the comparison has already failed, stop here
                    if (!retVal)
                    {
                        break;
                    }
                }
            }
        }
        else
        {
            // The file sizes already differ
            retVal = false;
        }

        return retVal;
    }
}
Free User