Suppose I have two lists that hold source file names and destination file names, respectively.

Sourcefilenamelist contains 1.txt, 2.txt, 3.txt and 4.txt,

while Destinationlist contains 1.txt and 2.txt.

I need to write a LINQ query to find which files are in the source list but absent from the destination list.

E.g. here the output will be 3.txt and 4.txt. I have done this with a foreach statement, but now I want to do the same using LINQ (C#).

Edit:

My code is:

List<FileList> sourceFileNames = new List<FileList>();

sourceFileNames.Add(new FileList { FileNames = "1.txt" });
sourceFileNames.Add(new FileList { FileNames = "2.txt" });
sourceFileNames.Add(new FileList { FileNames = "3.txt" });
sourceFileNames.Add(new FileList { FileNames = "4.txt" });

List<FileList> destinationFileNames = new List<FileList>();
destinationFileNames.Add(new FileList { FileNames = "1.txt" });
destinationFileNames.Add(new FileList { FileNames = "2.txt" });

IEnumerable<FileList> except = sourceFileNames.Except(destinationFileNames);

FileList is a simple class with a single property, FileNames, of type string.

class FileList
{
    public string FileNames { get; set; }
}
+8  A: 
Sourcefilenamelist.Except(Destinationlist)
Marcelo Cantos
But the output I get is all 4 file names from the source list instead of 2. Why? I have added my code to the original question.
Thinking
Are the matching filenames _exactly_ the same in both lists? No differences in capitalisation or whitespace?
Marcelo Cantos
Yes, they are exactly the same.
Thinking
+13  A: 

That's what Except is for:

var files = sourceFilenameList.Except(destinationList);

Note that this is a set operation, so if the source list has duplicate entries you'll only see unique results: new[] {a, a, b, b, c}.Except(new[] {b, c}) is just {a}, not {a, a}.

Like many LINQ operators, this returns an IEnumerable<T> - if you want it back as a List just call ToList:

var files = sourceFilenameList.Except(destinationList).ToList();
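
To make the set semantics concrete, here is a minimal sketch using plain strings. The class and method names (ExceptDemo, MissingDistinct, MissingWithDuplicates) are made up for illustration. The second method is an alternative for the case where duplicates should survive; it filters with Where against a HashSet instead of calling Except.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ExceptDemo
{
    // Set difference: each distinct missing value appears once.
    public static List<string> MissingDistinct(
        IEnumerable<string> source, IEnumerable<string> destination)
    {
        return source.Except(destination).ToList();
    }

    // Alternative that keeps duplicates from the source sequence.
    public static List<string> MissingWithDuplicates(
        IEnumerable<string> source, IEnumerable<string> destination)
    {
        var destinationSet = new HashSet<string>(destination);
        return source.Where(name => !destinationSet.Contains(name)).ToList();
    }

    static void Main()
    {
        var source = new[] { "a", "a", "b", "b", "c" };
        var destination = new[] { "b", "c" };

        // prints "a"
        Console.WriteLine(string.Join(", ", MissingDistinct(source, destination)));

        // prints "a, a"
        Console.WriteLine(string.Join(", ", MissingWithDuplicates(source, destination)));
    }
}
```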

EDIT: Okay, now you've shown what FileList is, the problem is simply that you haven't implemented equality comparisons. You can do this either by overriding Equals and GetHashCode (and possibly implementing IEquatable<FileList>) or by implementing IEqualityComparer<T>. However, you've still got a problem: FileList is a mutable type, and those don't typically work well in terms of hashing and equality. Two instances may be equal initially, and then one of them could change. I'd recommend reimplementing this as an immutable type. Something like this:

public sealed class FileList : IEquatable<FileList>
{
    private readonly string fileNames;
    public string FileNames { get { return fileNames; } }

    public FileList(string fileNames)
    {
        // If you want to allow a null FileNames, you'll need to change
        // the code in a few places
        if (fileNames == null)
        {
            throw new ArgumentNullException("fileNames");
        }
        this.fileNames = fileNames;
    }

    public override int GetHashCode()
    {
        return fileNames.GetHashCode();
    }

    public override bool Equals(object other)
    {
        return Equals(other as FileList);
    }

    public bool Equals(FileList other)
    {
        return other != null && other.FileNames == FileNames;
    }
}

Your sample code could then become:

List<FileList> sourceFileNames = new List<FileList>
{
    new FileList("1.txt"),
    new FileList("2.txt"),
    new FileList("3.txt"),
    new FileList("4.txt")
};
List<FileList> destinationFileNames = new List<FileList>
{
    new FileList("1.txt"),
    new FileList("2.txt")
};

IEnumerable<FileList> except = sourceFileNames.Except(destinationFileNames);
Jon Skeet
But the output I get is all 4 file names from the source list instead of 2. Why? I have added my code to the original question.
Thinking
@Thinking: Without knowing what FileList is or what the input is, it's impossible to know what's going on. My guess is that FileList doesn't override Equals/GetHashCode properly, but we can't really tell. Note that you don't need to use "ref" in your CompareMissingFiles method, and it would be cleaner for GetFiles to return a `List<FileList>` instead of using `ref` there.
Jon Skeet
Even for `List<FileList> sourceFileNames = new List<FileList>(); sourceFileNames.Add(new FileList { fileNames = "1.txt" }); sourceFileNames.Add(new FileList { fileNames = "2.txt" }); List<FileList> destinationFileNames = new List<FileList>(); destinationFileNames.Add(new FileList { fileNames = "1.txt" }); IEnumerable<FileList> except = sourceFileNames.Except(destinationFileNames);` I am getting the wrong output.
Thinking
@Thinking: As I said before, we don't know what your FileList class does.
Jon Skeet
It also depends on how the IEqualityComparer is implemented to decide whether two FileList instances are equal; without one, the comparison is by reference (memory address).
Mike
@Thinking: Which variable are you watching after the `Except`? If you think that `Except` *modifies* the source list, that assumption is wrong. Look at the contents of the `except` variable in Jon's code after the call to `Except` is made.
shahkalpesh
+5  A: 

Not too difficult to do. Inside your FileList class, create a nested class that implements IEqualityComparer<FileList>:

public class FileListComparer : IEqualityComparer<FileList>
{
    public bool Equals(FileList x, FileList y)
    {
        if (x == null || y == null)
        {
            return false;
        }

        return x.FileNames.Equals(y.FileNames, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(FileList obj) { return base.GetHashCode(); }
}

Then, when you call Except, pass in the comparer:

IEnumerable<FileList> except = sourceFileNames.Except(destinationFileNames, new FileList.FileListComparer() );
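
One caveat with the comparer as written: Except looks items up by hash code before it calls Equals, so GetHashCode has to return the same value for any two objects that Equals treats as equal. Returning base.GetHashCode() hashes the comparer itself, not the FileList passed in. Below is a minimal sketch of a variant that hashes the file name with the same case-insensitive rule that Equals uses. It is written as a top-level class rather than a nested one, and the ComparerDemo wrapper is hypothetical; both are assumptions made to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class FileList
{
    public string FileNames { get; set; }
}

class FileListComparer : IEqualityComparer<FileList>
{
    public bool Equals(FileList x, FileList y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return string.Equals(x.FileNames, y.FileNames,
                             StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(FileList obj)
    {
        // Hash the same data that Equals compares, with the same case rule.
        if (obj == null || obj.FileNames == null) return 0;
        return StringComparer.OrdinalIgnoreCase.GetHashCode(obj.FileNames);
    }
}

static class ComparerDemo
{
    // Returns the names present in source but absent from destination.
    public static List<string> FindMissing(
        IEnumerable<FileList> source, IEnumerable<FileList> destination)
    {
        return source.Except(destination, new FileListComparer())
                     .Select(f => f.FileNames)
                     .ToList();
    }

    static void Main()
    {
        var source = new List<FileList>
        {
            new FileList { FileNames = "1.txt" },
            new FileList { FileNames = "2.txt" },
            new FileList { FileNames = "3.txt" },
            new FileList { FileNames = "4.txt" }
        };
        var destination = new List<FileList>
        {
            new FileList { FileNames = "1.TXT" },
            new FileList { FileNames = "2.txt" }
        };

        // prints "3.txt, 4.txt"
        Console.WriteLine(string.Join(", ", FindMissing(source, destination)));
    }
}
```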
masenkablast
Rather than implement a separate IEqualityComparer, it's simpler to make FileList implement `IEquatable<FileList>`. (In fact, just overriding Equals/GetHashCode would be okay in this particular case.) Also note that your implementation of GetHashCode ignores the file list it's being asked to use - which means it will fail to match equal file lists.
Jon Skeet
I tried just overriding Equals and GetHashCode on the original class itself and ran into issues. I will try your suggestion.
masenkablast
+2  A: 

I upvoted masenkablast's answer. I think the default equality comparer for class instances compares references (memory addresses), not the values inside the instances, so you have to provide your own value equality comparison.

But if you have a simple data structure, try using a struct. I tried your code with class FileList changed to struct FileList, and it works: it displays only 3 and 4.
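
The struct variant works because value types inherit field-by-field equality from System.ValueType. A minimal sketch, assuming FileList is changed to a struct and nothing else; the StructDemo wrapper is made up for illustration. The default ValueType comparison can be slower than a hand-written Equals, so treat this as a quick fix rather than a recommendation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the original FileList, but declared as a struct.
struct FileList
{
    public string FileNames { get; set; }
}

static class StructDemo
{
    // Relies on the default value equality that structs get from ValueType.
    public static List<string> FindMissing(
        IEnumerable<FileList> source, IEnumerable<FileList> destination)
    {
        return source.Except(destination)
                     .Select(f => f.FileNames)
                     .ToList();
    }

    static void Main()
    {
        var source = new List<FileList>
        {
            new FileList { FileNames = "1.txt" },
            new FileList { FileNames = "2.txt" },
            new FileList { FileNames = "3.txt" },
            new FileList { FileNames = "4.txt" }
        };
        var destination = new List<FileList>
        {
            new FileList { FileNames = "1.txt" },
            new FileList { FileNames = "2.txt" }
        };

        // prints "3.txt, 4.txt"
        Console.WriteLine(string.Join(", ", FindMissing(source, destination)));
    }
}
```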

[EDIT] If you want to keep using a class without implementing IEqualityComparer, just implement IEquatable on your class. The idea comes from http://msdn.microsoft.com/en-us/library/bb300779.aspx

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ExceptList
{
    class Program
    {
        static void Main(string[] args)
        {
            var sourceFileNames = new List<FileList>();

            sourceFileNames.Add(new FileList { FileNames = "1.txt" });
            sourceFileNames.Add(new FileList { FileNames = "2.txt" });
            sourceFileNames.Add(new FileList { FileNames = "3.txt" });
            sourceFileNames.Add(new FileList { FileNames = "4.txt" });

            List<FileList> destinationFileNames = new List<FileList>();
            destinationFileNames.Add(new FileList { FileNames = "1.txt" });
            destinationFileNames.Add(new FileList { FileNames = "2.txt" });

            var except = sourceFileNames.Except(destinationFileNames);


            // list only 3 and 4
            foreach (var f in except)
                Console.WriteLine(f.FileNames);

            Console.ReadLine();
        }
    }

    class FileList : IEquatable<FileList>
    {
        public string FileNames { get; set; }

        #region IEquatable<FileList> Members

        public bool Equals(FileList other)
        {
            // Check whether the compared object is null.
            if (Object.ReferenceEquals(other, null)) return false;

            // Check whether the compared object references the same data.
            if (Object.ReferenceEquals(this, other)) return true;

            return FileNames.Equals(other.FileNames);
        }

        #endregion

        // Keep Object.Equals consistent with IEquatable<FileList>.Equals.
        public override bool Equals(object obj)
        {
            return Equals(obj as FileList);
        }

        public override int GetHashCode()
        {
            return FileNames.GetHashCode();
        }
    }

}
Michael Buen
That's a smart idea, I would have never thought of using struct.
masenkablast
I have reservations about using mutable variables in GetHashCode etc. See my revised answer for an alternative.
Jon Skeet
Thanks Jon, I learned the importance of an immutable hash code today (http://blogs.msdn.com/jaredpar/archive/2008/04/28/properly-implementing-equality-in-vb.aspx and http://blogs.msdn.com/jaredpar/archive/2009/01/15/if-you-implement-iequatable-t-you-still-must-override-object-s-equals-and-gethashcode.aspx).
Michael Buen
A: 

I think Jon Skeet's answer is the best answer, but another option is to compare directly on the property you care about (FileNames):

var destNames = destinationFileNames.Select(destName => destName.FileNames);
IEnumerable<FileList> except =  sourceFileNames
    .Where(sourceName => !destNames.Contains(sourceName.FileNames));

or (same thing in one expression)

IEnumerable<FileList> except =  sourceFileNames
    .Where(sourceName => !destinationFileNames
        .Select(destNames => destNames.FileNames)
        .Contains(sourceName.FileNames));

edit: thanks for the downvote; I tested the code and found a bug. It works now!
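
The queries above re-scan the destination names for every source item. For larger lists, one possible variation is to collect the destination names into a HashSet<string> once and test membership against that. This is a sketch only; the MissingFiles class and its Find method are names invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class FileList
{
    public string FileNames { get; set; }
}

static class MissingFiles
{
    // Returns the source items whose FileNames value is not in destination.
    public static List<FileList> Find(
        IEnumerable<FileList> source, IEnumerable<FileList> destination)
    {
        // Build the lookup once so each Contains call is a hash lookup.
        var destNames = new HashSet<string>(destination.Select(d => d.FileNames));
        return source.Where(s => !destNames.Contains(s.FileNames)).ToList();
    }

    static void Main()
    {
        var source = new List<FileList>
        {
            new FileList { FileNames = "1.txt" },
            new FileList { FileNames = "2.txt" },
            new FileList { FileNames = "3.txt" },
            new FileList { FileNames = "4.txt" }
        };
        var destination = new List<FileList>
        {
            new FileList { FileNames = "1.txt" },
            new FileList { FileNames = "2.txt" }
        };

        // prints "3.txt" and "4.txt" on separate lines
        foreach (var f in Find(source, destination))
            Console.WriteLine(f.FileNames);
    }
}
```

Unlike Except, this keeps duplicate source entries and needs no equality members on FileList.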

Kirk Broadhurst
Is this solution too simple? Why the downvote?
Kirk Broadhurst