views:

153

answers:

4

I have a List<T1> of items and a second List<T2> of items. Both lists are sorted alphabetically by the property A. I know that the list of items in List<T2> are a subset of List<T1> and no items in List<T2> exists that don't exist in List<T1>.

I need to iterate through List<T1> and change a variable every time it matches a variable in List<T2>. What is the fastest and best way to do this? I am assuming I need to iterate through both lists but I know doing a nested foreach wouldn't make sense.

+5  A: 

If the lists aren't too large, this simplest way to do this is to call Contains:

foreach(var item in list1) {
    if (list2.Contains(item) {
        //Do something
    }
}

You can make it faster by calling BinarySearch using a custom IComparer<T>, like this:

class MyComparer : IComparer<YourClass> {
    private MyComparer() { }
    public static readonly MyComparer Instance = new MyComparer();

    public int CompareTo(YourClass a, YourClass b) {
        //TODO: Handle nulls
        return a.SomeProperty.CompareTo(b.SomeProperty);
    }
}
foreach(var item in list1) {
    if (list2.BinarySearch(item, MyComparer.Instance) >= 0) {
        //Do something
    }
}

In .Net 3.5, you can make it faster by using a HashSet<T>:

var hashset = new HashSet<YourClass>(list2);
foreach(var item in list1) {
    if (hashset.Contains(item) {
        //Do something
    }
}

If your lists are very large, you should measure the performance of each option and choose accordingly.
Otherwise, go for one of the first option, which is simplest.

SLaks
+2  A: 

If they are both sorted on a unique property, you can use this during your iteration. The idea is to loop through the superset, and then advance the subset iterator based on that sorted unique property until it either matches or is bigger/smaller (depending on sort order) than the superset one.

For a property sorted in ascending order:

if (subsetList.Count > 0)
{
    using(IEnumerator<T2> subset = subsetList.GetEnumerator())
    {
        subset.MoveNext();
        T2 subitem = subsetList.Current;
        foreach(T1 item in supersetList)
        {
            while (item.A > subitem.A &&
                   subset.MoveNext())
            {
                subitem = subset.Current;
            }

            if (item.A == subitem.A)
            {
                // Modify item here.
            }
        }
    }
}

Note that this does not actually rely on supersetList being a superset of subsetList. EndangeredMassa's solution is cleaner with that assumption being true.

jdmichal
This is the same as my answer, except that you don't handle the case where there are multiple entries in the superset equal to a single entry in the subset.
JSBangs
That is handled. It will not iterate the subitem unless the superset has moved beyond that item. So multiple entries of the same value in the superset will not advance the subset iterator. I did screw up the comparison on the while loop though. Fixed.
jdmichal
+9  A: 

For this type of thing, I prefer a double walking for loop. See the example below.

var super = new List<Contact>();
super.Add(new Contact() {Name = "John"});
super.Add(new Contact() {Name = "Larry"});
super.Add(new Contact() {Name = "Smith"});
super.Add(new Contact() {Name = "Corey"});

var sub = new List<Contact>();
sub.Add(new Contact() {Name = "Larry"});
sub.Add(new Contact() {Name = "Smith"});

var subCount = 0;
for(int i=0; i<super.Count && subCount < sub.Count; i++)
{
    if (super[i].Name == sub[subCount].Name)
    {
        Act(super[i], sub[subCount]);
        subCount++;
    }
}

Where Act(...) performs whatever action you are looking to do.

The loop increments the super list each time, but only increments the sub list when you find a match.

Note that this only works because of your two assumptions. 1) The lists are both sorted and 2) The second list is a subset of the first.

EndangeredMassa
At first I thought this was wrong. But with the information that `sub` is a subset of `super`, this is a much cleaner solution that mine, which only assumes the sorting, and so must handle skipping over missed matches. Though this does not handle multiple entries having the same property value.
jdmichal
Right. Those assumptions are important for this method.
EndangeredMassa
That method will loop through each sublist item for every superlist item. That means it loops N*M times where N and M are the sizes of super and sub lists. It can work that way, but my method only loops N times where N is the length of the superlist.
EndangeredMassa
A: 

Your question implies that you want to avoid having to iterate over all of the items in the second list every time, which is what will happen in the worst-case naive solution using Contains(). Since both lists are sorted and list2 is a subset of list1, you know that no entry in list1 will have an index less than the corresponding entry in list2. With that in mind, you can make an efficient O(n) solution with two enumerators:

Debug.Assert(list1.Count > 0);
Debug.Assert(list1.Count >= list2.Count);

var enum1 = list1.GetEnumerator();
var enum2 = list2.GetEnumerator();

enum1.MoveNext();
while (enum2.MoveNext())
{
    // Skip elements from list1 that aren't equal to the current entry in list2
    while (!enum1.Current.Equals(enum2.Current))
        enum1.MoveNext();

    // Fire the OnEqual event for every entry in list1 that's equal to an entry
    // in list2
    do {
        OnEqual(enum1.Current, enum2.Current); 
    } while (enum1.MoveNext() && enum1.Current.Equals(enum2.Current));
}

enum1.Dispose();
enum2.Dispose();
JSBangs