Say I have the following LINQ queries:

var source = from workflow in sourceWorkflowList
             select new { SubID = workflow.SubID,
                          ReadTime = workflow.ReadTime,
                          ProcessID = workflow.ProcessID,
                          LineID = workflow.LineID };

var target = from workflow in targetWorkflowList
             select new { SubID = workflow.SubID,
                          ReadTime = workflow.ReadTime,
                          ProcessID = workflow.ProcessID,
                          LineID = workflow.LineID };

var difference = source.Except(target);

sourceWorkflowList and targetWorkflowList have the exact same column definitions. But they both contain more columns of data than what is shown in the queries above. Those are just the columns needed for this particular issue.

difference contains all rows in sourceWorkflowList that are not contained in targetWorkflowList.

Now what I would like to do is to remove all rows from sourceWorkflowList that do not exist in difference. Could someone show me a query that would do this?

Thanks very much - Randy

+2  A: 

What you actually want is what's in the source and not in (what's in the source and not in the target), which is the intersection: S \ (S \ T) = S ∩ T

var result = from sourceWorkflow in sourceWorkflowList
             join targetWorkflow in targetWorkflowList on
                 new {sourceWorkflow.SubID, sourceWorkflow.ReadTime, sourceWorkflow.ProcessID, sourceWorkflow.LineID}
                 equals
                 new {targetWorkflow.SubID, targetWorkflow.ReadTime, targetWorkflow.ProcessID, targetWorkflow.LineID}
             select sourceWorkflow;

And in a different form (but this will only give you the 4 columns):

var result = sourceWorkflowList.Select(workflow => new {workflow.SubID, workflow.ReadTime, workflow.ProcessID, workflow.LineID})
    .Intersect(targetWorkflowList.Select(workflow => new {workflow.SubID, workflow.ReadTime, workflow.ProcessID, workflow.LineID}));
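
If you need the full rows but still want to compare on the four columns only, one option is to put the target's keys in a set and filter the source against it. This is only a sketch: the sample rows and the Notes column (standing in for the extra, uncompared columns) are made up, ToHashSet assumes .NET Framework 4.7.2+ or .NET Core 2.0+, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; Notes represents the columns that must NOT be compared.
var sourceWorkflowList = new[]
{
    new { SubID = 1, ReadTime = new DateTime(2010, 1, 1), ProcessID = 10, LineID = 100, Notes = "a" },
    new { SubID = 2, ReadTime = new DateTime(2010, 1, 2), ProcessID = 20, LineID = 200, Notes = "b" },
}.ToList();

var targetWorkflowList = new[]
{
    new { SubID = 1, ReadTime = new DateTime(2010, 1, 1), ProcessID = 10, LineID = 100, Notes = "x" },
}.ToList();

// Anonymous types compare by value, so the projected keys can live in a HashSet.
var targetKeys = targetWorkflowList
    .Select(w => new { w.SubID, w.ReadTime, w.ProcessID, w.LineID })
    .ToHashSet();

// Full source rows whose four-column key also exists in the target (the intersection).
var inBoth = sourceWorkflowList
    .Where(w => targetKeys.Contains(new { w.SubID, w.ReadTime, w.ProcessID, w.LineID }))
    .ToList();

// Full source rows whose key is missing from the target (the difference).
var onlyInSource = sourceWorkflowList
    .Where(w => !targetKeys.Contains(new { w.SubID, w.ReadTime, w.ProcessID, w.LineID }))
    .ToList();

Console.WriteLine($"{inBoth.Count} row(s) in both, {onlyInSource.Count} row(s) only in source");
```

Unlike the join, this does not repeat a source row when the target happens to contain several rows with the same key.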
brickner
Wouldn't it be easier to just use Intersect? http://msdn.microsoft.com/en-us/library/system.linq.enumerable.intersect.aspx
Rob Fonseca-Ensor
@Rob Fonseca-Ensor: Correct. I've added this to the answer.
brickner
@Brickner - This is close. Yes, I want what's in source that's not in the target, but I only want the comparison to be done on the four columns shown in my original post. There are three additional columns in the table that I don't want used for the comparison.
Randy Minder
@Randy Minder, I've updated my answer.
brickner
@Brickner - Your first query is what I'm looking for. But, remember, I want all rows in source that are not in target. The query you provided joins source to target so I'll only get rows that are in both.
Randy Minder
@Randy Minder, you've asked in the question for all the rows in the source that are not in **difference**. That's intersection.
brickner
+1  A: 

Assuming you're using a List<T>:

sourceWorkflowList.RemoveAll(
    workflow => difference.Contains(
                    new { 
                             SubID = workflow.SubID,
                             ReadTime = workflow.ReadTime,
                             ProcessID = workflow.ProcessID,
                             LineID = workflow.LineID 
                         }));

Apologies for formatting...

Rob Fonseca-Ensor
+1  A: 

If you have a constraint that requires you to make the change in the original container, do a RemoveAll as suggested by @rob-fonseca-ensor.

If the difference list is large, consider converting it to a HashSet<T> first to get fast lookups.
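
A rough sketch of that combination follows. Note that difference is a deferred query over sourceWorkflowList, so it seems safer to materialize it before the list is modified. The sample rows and the Notes column are hypothetical, ToHashSet assumes .NET Framework 4.7.2+ or .NET Core 2.0+, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; Notes stands in for the columns that are not compared.
var sourceWorkflowList = new[]
{
    new { SubID = 1, ReadTime = new DateTime(2010, 1, 1), ProcessID = 10, LineID = 100, Notes = "a" },
    new { SubID = 2, ReadTime = new DateTime(2010, 1, 2), ProcessID = 20, LineID = 200, Notes = "b" },
    new { SubID = 3, ReadTime = new DateTime(2010, 1, 3), ProcessID = 30, LineID = 300, Notes = "c" },
}.ToList();

var targetWorkflowList = new[]
{
    new { SubID = 2, ReadTime = new DateTime(2010, 1, 2), ProcessID = 20, LineID = 200, Notes = "y" },
}.ToList();

// Materialize the difference once, as a set, BEFORE mutating the source list.
var differenceSet = sourceWorkflowList
    .Select(w => new { w.SubID, w.ReadTime, w.ProcessID, w.LineID })
    .Except(targetWorkflowList.Select(w => new { w.SubID, w.ReadTime, w.ProcessID, w.LineID }))
    .ToHashSet();

// Remove, in place, every row whose four-column key is in the difference.
// Negate the predicate to keep only the rows that are in the difference instead.
int removed = sourceWorkflowList.RemoveAll(
    w => differenceSet.Contains(new { w.SubID, w.ReadTime, w.ProcessID, w.LineID }));

Console.WriteLine($"Removed {removed} row(s); {sourceWorkflowList.Count} remaining");
```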

Otherwise...

If you can change the way you are getting difference, use the join/intersect option suggested by @brickner, as this prevents multiple iterations of the list.

If a new collection is acceptable but you already have difference (cannot replace the code that generates it):

var changedSource = source.Except(difference);
Handcraftsman