Is there any built-in class/method for comparing the content of two audio/video files? Or is there any built-in class/method for converting an audio/video file to a bit stream?
You could do a byte-wise comparison of the two files. System.IO.File.ReadAllBytes(...) would be useful for that.
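As a rough sketch of that idea (the class and method names here are made up for illustration), both files can be loaded and compared in one go. This assumes each file fits comfortably in memory; File.ReadAllBytes is not a good fit for multi-gigabyte media files.

```csharp
using System;
using System.IO;
using System.Linq;

public static class WholeFileCompare
{
    // Hypothetical helper: loads both files fully into memory and compares them.
    public static bool BytesEqual(string pathA, string pathB)
    {
        byte[] a = File.ReadAllBytes(pathA);
        byte[] b = File.ReadAllBytes(pathB);
        // SequenceEqual is false if the lengths differ or any byte differs
        return a.SequenceEqual(b);
    }
}
```

For anything large, a buffered, block-by-block comparison is the safer choice.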
You could run one of the hash functions in System.Security.Cryptography over the two file streams and compare the resulting hashes. This is easy to do and works well for small files. If your files are big, which they probably are if you're dealing with audio/video, then reading in each file and generating the hash can take a bit of time.
The other answers are good - either hashing (if you are comparing the file to multiple candidates) or a byte-wise comparison (if comparing two single files).
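To illustrate why hashing pays off with multiple candidates, here is a minimal sketch that hashes each file once and groups paths by hash. HashIndex, HashFile and GroupByHash are names invented for this example, and SHA-256 is just one possible choice of algorithm.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class HashIndex
{
    // Hypothetical helper: SHA-256 of a file, streamed from disk, as a hex string.
    public static string HashFile(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        using (SHA256 sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(stream));
        }
    }

    // Groups file paths by content hash; each file is read exactly once.
    public static Dictionary<string, List<string>> GroupByHash(IEnumerable<string> paths)
    {
        var groups = new Dictionary<string, List<string>>();
        foreach (string path in paths)
        {
            string hash = HashFile(path);
            List<string> group;
            if (!groups.TryGetValue(hash, out group))
            {
                group = new List<string>();
                groups[hash] = group;
            }
            group.Add(path);
        }
        return groups;
    }
}
```

Any group with more than one entry holds files that are very likely identical; a byte-wise check on just those files can confirm it.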
Here are a couple of additional thoughts:
First, check the file sizes - if they are different, then don't waste time comparing bytes. These are quick to check.
Second, try searching from the end or the middle of the file using a binary chop approach.
E.g., suppose you have a file like this:
ABCDEFGHIJKLMNOP
Then it is modified to this:
ABCDEF11GHIJKLMN
For the file size to remain the same after content has been inserted, other bytes must have been "knocked out" (here, the trailing OP). So a binary chop approach might pick this up with fewer reads (e.g., seek to and read bytes SIZE/2-10 to SIZE/2+10 from both files, and compare).
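A minimal sketch of that midpoint probe might look like the following. MiddleSample, ReadAt and MiddleMatches are hypothetical names, and the default radius of 10 bytes simply mirrors the SIZE/2-10 to SIZE/2+10 example.

```csharp
using System;
using System.IO;

public static class MiddleSample
{
    // Hypothetical helper: reads up to `count` bytes starting at `offset`.
    static byte[] ReadAt(FileStream stream, long offset, int count)
    {
        byte[] buffer = new byte[count];
        stream.Seek(offset, SeekOrigin.Begin);
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, total, count - total);
            if (n == 0) break; // reached end of file
            total += n;
        }
        Array.Resize(ref buffer, total);
        return buffer;
    }

    // Returns false if the bytes around the midpoint differ.
    // A true result only means "no difference found in this window".
    public static bool MiddleMatches(string pathA, string pathB, int radius = 10)
    {
        using (FileStream a = File.OpenRead(pathA))
        using (FileStream b = File.OpenRead(pathB))
        {
            if (a.Length != b.Length) return false;
            long start = Math.Max(0, a.Length / 2 - radius);
            byte[] sampleA = ReadAt(a, start, 2 * radius);
            byte[] sampleB = ReadAt(b, start, 2 * radius);
            if (sampleA.Length != sampleB.Length) return false;
            for (int i = 0; i < sampleA.Length; i++)
            {
                if (sampleA[i] != sampleB[i]) return false;
            }
            return true;
        }
    }
}
```

With the two sample files above, the window around the midpoint already differs, so the probe reports a mismatch without reading the rest of either file.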
You could try to combine the techniques. If you do it over a good enough sample of the data you deal with, you might find that of all the different files you compare (example):
- 80% were found because the file size was different (10ms per file)
- 10% were found due to binary chop (50ms per file)
- 10% were found due to linear byte comparisons (2000ms per file)
Doing a binary chop over the whole file wouldn't be so smart, since I expect the hard disk will be faster when reading linearly than when seeking to random spots. But if you check SIZE/2, then SIZE/4 and SIZE*3/4, then SIZE/8 and so on, for say 5 iterations, you might find most of the differences without having to do a byte-wise comparison. Just some ideas.
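The probe positions for that scheme could be generated along these lines. This is only a sketch of one way to enumerate them; ProbePlan and ProbeOffsets are invented names, and "levels" corresponds to the iterations mentioned above.

```csharp
using System;
using System.Collections.Generic;

public static class ProbePlan
{
    // Returns SIZE/2, then SIZE/4 and 3*SIZE/4, then the odd eighths, and so on.
    public static List<long> ProbeOffsets(long size, int levels)
    {
        var offsets = new List<long>();
        for (int level = 1; level <= levels; level++)
        {
            long parts = 1L << level; // 2, 4, 8, ...
            // Only odd multiples are new; even ones were visited at an earlier level
            for (long k = 1; k < parts; k += 2)
            {
                offsets.Add(size * k / parts);
            }
        }
        return offsets;
    }
}
```

For a 16-byte file and 3 levels this yields 8, 4, 12, 2, 6, 10, 14; with 5 levels there are 31 probe positions. Each offset can then serve as the seek position for a small windowed read in both files.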
Also, instead of reading from the front of the file, perhaps try reading from the end of the file backwards. Again you might be trading off seek time for probability, but in the "insert" scenario, assuming a change is made halfway into the file, you'll probably find this faster by starting from the end than from the start.
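Reading backwards could be sketched roughly as follows, comparing one block at a time from the end of both files toward the start. BackwardCompare, EqualFromEnd and the 4096-byte default block size are assumptions made for this example, not anything prescribed by .NET.

```csharp
using System;
using System.IO;

public static class BackwardCompare
{
    // Compares two files block by block, starting at the end and moving toward the start.
    public static bool EqualFromEnd(string pathA, string pathB, int blockSize = 4096)
    {
        using (FileStream a = File.OpenRead(pathA))
        using (FileStream b = File.OpenRead(pathB))
        {
            if (a.Length != b.Length) return false;
            byte[] bufA = new byte[blockSize];
            byte[] bufB = new byte[blockSize];
            long end = a.Length;
            while (end > 0)
            {
                int count = (int)Math.Min(blockSize, end);
                long start = end - count;
                Fill(a, start, bufA, count);
                Fill(b, start, bufB, count);
                for (int i = count - 1; i >= 0; i--)
                {
                    if (bufA[i] != bufB[i]) return false;
                }
                end = start;
            }
            return true;
        }
    }

    // Hypothetical helper: seeks to `offset` and reads exactly `count` bytes.
    static void Fill(FileStream stream, long offset, byte[] buffer, int count)
    {
        stream.Seek(offset, SeekOrigin.Begin);
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, total, count - total);
            if (n == 0) throw new EndOfStreamException();
            total += n;
        }
    }
}
```

Whether this beats a front-to-back scan depends on where changes tend to occur in your files and on how your storage handles backward seeks, so it is worth measuring on real data.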
There is no direct, built-in way to compare files. And since you have to deal with audio/video files, which will be relatively big, I don't know whether a bitwise comparison will work well or not.
Example: Generating SHA1 and MD5 hashes in .NET (C#)
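    using System;
    using System.IO;
    using System.Security.Cryptography;
    using System.Text;

    public static string GenerateHash(string filePathAndName)
    {
        // Stream the file through the hash so that large audio/video files
        // are not loaded into memory all at once
        using (FileStream stream = File.OpenRead(filePathAndName))
        using (HashAlgorithm algorithm = SHA1.Create()) // or MD5.Create()
        {
            byte[] hashData = algorithm.ComputeHash(stream);

            // Format each byte as two lowercase hex digits
            // (lowercase for compatibility on case-sensitive systems)
            StringBuilder hashText = new StringBuilder(hashData.Length * 2);
            foreach (byte b in hashData)
            {
                hashText.Append(b.ToString("x2"));
            }
            return hashText.ToString();
        }
    }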
Example: Binary Comparison of 2 Files
    /// <summary>
    /// Performs a binary comparison of two files and returns the result.
    /// Note: despite its name, this method returns true when the files are identical.
    /// </summary>
    /// <param name="p_FileA">Fully qualified path to the first file.</param>
    /// <param name="p_FileB">Fully qualified path to the second file.</param>
    /// <returns>True if the files are binary-identical, otherwise false.</returns>
    private static bool FileDiffer(string p_FileA, string p_FileB)
    {
        // Check that both files exist
        if (!File.Exists(p_FileA))
        {
            throw new ArgumentException(String.Format("The file '{0}' could not be found", p_FileA), "p_FileA");
        }
        if (!File.Exists(p_FileB))
        {
            throw new ArgumentException(String.Format("The file '{0}' could not be found", p_FileB), "p_FileB");
        }

        // If the file sizes already differ, there is no need to read any bytes
        if (new FileInfo(p_FileA).Length != new FileInfo(p_FileB).Length)
        {
            return false;
        }

        byte[] bufferA = new byte[4096];
        byte[] bufferB = new byte[4096];

        // Binary comparison
        using (FileStream streamA = File.OpenRead(p_FileA))
        using (FileStream streamB = File.OpenRead(p_FileB))
        {
            int readA;
            // Read the first file block by block through the buffer
            while ((readA = streamA.Read(bufferA, 0, bufferA.Length)) > 0)
            {
                // Read may return fewer bytes than requested, so keep reading
                // the second file until the same number of bytes is available
                int readB = 0;
                while (readB < readA)
                {
                    int n = streamB.Read(bufferB, readB, readA - readB);
                    if (n == 0)
                    {
                        return false; // second file ended early (changed during the comparison)
                    }
                    readB += n;
                }

                // Compare the bytes within the buffers, stopping at the first mismatch
                for (int i = 0; i < readA; i++)
                {
                    if (bufferA[i] != bufferB[i])
                    {
                        return false;
                    }
                }
            }
        }
        return true;
    }