ansaurus

Question

Answer 1

A:

If I understand what you are trying to do correctly:

Create a class containing the file path and MD5 hash, and make it implement the IComparable interface so that its CompareTo method compares the MD5 hashes.

Iterate through the files, creating a new object for each one and adding it to an ArrayList. Then sort the ArrayList. All files with the same MD5 hash will now be adjacent, so you can easily see which files are duplicates.
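A minimal C# sketch of this sort-then-scan approach, assuming the MD5 hashes have already been computed as strings. It uses the generic IComparable<T> and List<T> in place of the non-generic IComparable and ArrayList named above. The names FileHashEntry, DuplicateFinder and FindDuplicateGroups, and the sample paths and hash values, are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data: real code would hash each file's contents
// (for example with System.Security.Cryptography.MD5) instead of using literals.
var entries = new List<FileHashEntry>
{
    new FileHashEntry(@"C:\data\a.txt", "hash-1"),
    new FileHashEntry(@"C:\data\b.txt", "hash-2"),
    new FileHashEntry(@"C:\data\c.txt", "hash-1"),
};

foreach (List<string> paths in DuplicateFinder.FindDuplicateGroups(entries))
{
    Console.WriteLine("Duplicates: " + string.Join(", ", paths));
}

// A file path paired with its MD5 hash; instances compare (and therefore sort) by hash.
sealed class FileHashEntry : IComparable<FileHashEntry>
{
    public string Path { get; }
    public string Hash { get; }

    public FileHashEntry(string path, string hash)
    {
        Path = path;
        Hash = hash;
    }

    // Ordinal comparison of the hash strings; a non-null instance sorts after null.
    public int CompareTo(FileHashEntry? other) =>
        other is null ? 1 : string.CompareOrdinal(Hash, other.Hash);
}

static class DuplicateFinder
{
    // Sorts a copy of the entries by hash, then scans it once: entries with equal
    // hashes are now adjacent, so each run of two or more is a group of duplicates.
    public static List<List<string>> FindDuplicateGroups(IEnumerable<FileHashEntry> source)
    {
        var sorted = new List<FileHashEntry>(source);
        sorted.Sort(); // Comparer<T>.Default uses FileHashEntry.CompareTo

        var groups = new List<List<string>>();
        int start = 0;
        while (start < sorted.Count)
        {
            // Advance 'end' past every entry whose hash equals the one at 'start'.
            int end = start + 1;
            while (end < sorted.Count && sorted[end].CompareTo(sorted[start]) == 0)
            {
                end++;
            }

            if (end - start > 1)
            {
                groups.Add(sorted.GetRange(start, end - start).ConvertAll(e => e.Path));
            }
            start = end;
        }
        return groups;
    }
}
```

List<T>.Sort is not stable, so the order of paths inside a group is not guaranteed; only the grouping itself is.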

Phil 2009-08-13 14:53:00

How do I implement the IComparable interface?

Crash893 2009-08-13 15:43:19

Answer 2

+2 A:

If I understand you correctly, you are using the hash to decide whether two files are identical, and you are using the hash as the dictionary key. You can't have duplicate keys in a dictionary, so you'd want a Dictionary<Hash, IList<string>> and add each file's path to the list stored under its hash value.
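A minimal C# sketch of this idea, assuming the hashes are already available as strings (standing in for the answer's Hash type) and using List<string> as the concrete IList<string>. The GroupPathsByHash helper and the sample paths and hash values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input: (path, hash) pairs; real code would hash each file's contents.
var files = new (string Path, string Hash)[]
{
    (@"C:\data\a.txt", "hash-1"),
    (@"C:\data\b.txt", "hash-2"),
    (@"C:\data\c.txt", "hash-1"),
};

var pathsByHash = GroupPathsByHash(files);

// Every hash whose list holds more than one path marks a set of duplicate files.
foreach (var (hash, paths) in pathsByHash.Where(p => p.Value.Count > 1))
{
    Console.WriteLine($"{hash}: {string.Join(", ", paths)}");
}

// Builds hash -> list of paths. Files with the same hash land in the same list,
// so each list's Count is the number of copies once the dictionary is populated.
static Dictionary<string, List<string>> GroupPathsByHash(IEnumerable<(string Path, string Hash)> items)
{
    var result = new Dictionary<string, List<string>>();
    foreach (var (path, hash) in items)
    {
        if (!result.TryGetValue(hash, out var list))
        {
            list = new List<string>();
            result[hash] = list;
        }
        list.Add(path);
    }
    return result;
}
```

Because the grouping happens while the dictionary is filled, no separate counting pass over the files is needed.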

Lee 2009-08-13 14:54:48

He's using the path as the key, but you've hit on a better way of counting the duplicates here.

grenade 2009-08-13 15:10:18

If you use Lee's suggestion of hashes as keys and paths as values, the counting will already be done for you when the dictionary is populated.

grenade 2009-08-13 15:14:18

That is a good idea.

Crash893 2009-08-13 15:48:20

Answer 3

A:

It really depends on whether you want to keep the 'duplicate' data and just not print it, or whether you truly do not want that data in the dictionary at all. That's a decision only you can make for your program.

cyberconte 2009-08-13 14:59:32

Answer 4

A:

When you read the files and create their hashes, you could simply keep a second list that you add your hash values to. Before inserting, you would check whether the list already contains an item with the new value.

This approach has a small memory overhead but saves some loop iterations.
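A minimal sketch of this single-pass check, assuming the hashes are already computed as strings. It swaps the second list for a HashSet<string>: HashSet<T>.Add returns false when the value is already present, which replaces the separate Contains check and avoids the linear scan that List<T>.Contains performs. FindLaterDuplicates and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical input: (path, hash) pairs in the order the files are read.
var files = new (string Path, string Hash)[]
{
    (@"C:\data\a.txt", "hash-1"),
    (@"C:\data\b.txt", "hash-2"),
    (@"C:\data\c.txt", "hash-1"),
    (@"C:\data\d.txt", "hash-1"),
};

var duplicates = FindLaterDuplicates(files);
foreach (var path in duplicates)
{
    Console.WriteLine($"Duplicate: {path}");
}

// Records each hash the first time it is seen; any later file whose hash
// was already seen is reported as a duplicate, all in a single pass.
static List<string> FindLaterDuplicates(IEnumerable<(string Path, string Hash)> items)
{
    var seenHashes = new HashSet<string>();
    var result = new List<string>();
    foreach (var (path, hash) in items)
    {
        // HashSet<T>.Add returns false when the hash is already in the set.
        if (!seenHashes.Add(hash))
        {
            result.Add(path);
        }
    }
    return result;
}
```

This reports only the second and later copies of each file. To also report which file was seen first, a Dictionary<string, string> from hash to first path could take the place of the set.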

Frank Bollack 2009-08-13 15:18:19

Answer 5

A:

Assuming that dict is a Dictionary<string, string> that maps each filename (key) to its MD5 hash (value), you could use the following code to display the duplicate files:

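using System;
using System.Collections.Generic;
using System.Linq;

// dict maps each filename (key) to its MD5 hash (value).
// Group the entries by hash and keep only hashes shared by more than one file.
var groupedByHash = from kvp in dict
                    group kvp by kvp.Value into grp
                    let count = grp.Count()
                    where count > 1
                    select grp;

// For each duplicated hash, print the hash followed by every file that has it.
foreach (IGrouping<string, KeyValuePair<string, string>> grp in groupedByHash)
{
    Console.WriteLine("Hashcode : {0}", grp.Key);
    foreach (KeyValuePair<string, string> kvp in grp)
    {
        Console.WriteLine("\tFilename = {0}", kvp.Key);
    }
    Console.WriteLine();
}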

Thomas Levesque 2009-08-13 15:42:22


Find Duplicate Values in a dictionary
