views:

100

answers:

2

Hi everyone,

I'm looking to try to make an efficient delete method to work on a list. This situation is as follows:

Say for e.g. I have a (potentially) massive list of names:

Alex Smith
Anna Hobb
Bertie Blackman
Bill Clinton
David Smith
David Warner
George Jung
George Washington
William Wobbits

Lets say that this is a List<Person> with Person having properties FirstName and LastName. As in the example, there is the possibility for two people sharing same FirstName. What I need to do is go through the list and delete all Davids, for example.

I was looping through finding all davids, adding to a list DeletePerson, then looping through again for each DeletePerson and removing. I'm sure there would be a more efficient method? Efficiency isn't critical in this app but it just seems a long winded way to do it, also I guess as after the letter D in the first name we know we won't be adding any more to the DeletePerson list (assuming the list is sorted alphabetically)

Thanks!

+6  A: 

Updated answer:

If we aren't allowed to use RemoveAll or LINQ functions that do all the work for us, this is a way to do it more "manually":

    List<Person> newPersons = new List<Person>();
    foreach (Person person in persons)
    {
        if (person.FirstName != "David")
            newPersons.Add(person);
    }
    persons = newPersons();

The construction of the new list is very fast in .NET. Removing items one by one is slow because first each element to remove must be found in the list, and then when it is removed the remainder of the list must be moved up to fill the gap. This is much slower than the method I gave above.

To make the remove operation even faster the list could kept in sorted order or grouped by first name. But then the initial creation of the list would be slower.


Old answer:

It seems you are interested in a concise solution rather than one with optimal performance. If you are on .NET 3.5 you can use this:

persons = persons.Where(person => person.FirstName != "David").ToList();

Oh, and List has a RemoveAll method:

names.RemoveAll(person => person.FirstName == "David");

(Note Jimmy posted the RemoveAll idea before I did.)

Mark Byers
+1. Was typing out the identical answer almost but you beat me to it before I pushed post.
Kelsey
I would have agree, but I thought the question is tagged as algorithm. I am keen to know myself the optimal algorithm to perform the above without using linq
Fadrian Sudaman
@Fadrian: Fair enough. I've supplied an update to the answer that hopefully address this point.
Mark Byers
Thanks. I have voted for it now
Fadrian Sudaman
+3  A: 

For a concise approach, use RemoveAll.

list.RemoveAll(person => person.FirstName == "David")

for an (possibly? I haven't run any benchmarks) efficient approach based on the order in a list, use List.RemoveRange -- but you'll have to find the indexes.

class NameComparer : IComparer<Name>
{
    public int Compare(Name x, Name y) { 
        return x.First.CompareTo(y.First); 
    }
}
...
...
var comparer = new NameComparer();
var david = new Name { First = "David" };
int guess = list.BinarySearch(david, comparer);
int start, end;
start = list.FindLastIndex(guess, person => person.First != "David") + 1;
if (start > 0 && list[start].First == "David") {
    end = list.FindIndex(guess, person => person.First != "David");
    list.RemoveRange(start, end - start);
}
Jimmy
+1 You got RemoveAll a couple of minutes before me... I thought of it independently, but then searched to see if others had found it and here it is.
Mark Byers
pointing out nondestructive alternatives isn't ever a bad idea though :)
Jimmy
I will give it to this answer because this is what I have actually ended up using. Will look into performance differences but i'm guessing its all going to be pretty quick. Thanks to everyone who answered!
baron
actually after doing some testing, I'm convinced List.RemoveRange is actually pretty slow.
Jimmy
possible that whole method slower because you're doing two checks, Find first and last index then RemoveRange?
baron
yes, that is probably the reason. Binary search for finding the start index should speed up the performance for larger queries. I would hope that finding the last index should be pretty fast (it starts searching from startIndex)
Jimmy