Hi there,

I have a collection of MyClass objects and I need to filter it by a combination of 17 fields.

I implemented a MyClassFilter object with the 17 possible fields, the condition for each one, and a method:

bool TaskPassFilter(MyClass ObjectToEvaluate)
{
  return PassFilterVal(this.Workstream, ObjectToEvaluate.WorkStream)
    && PassFilterVal(this.AssignedTo, ObjectToEvaluate.AssignedTo)
    && PassFilterVal(this.ProcessingGroup, ObjectToEvaluate.ProcessingGroup)
    && PassFilterVal(this.ScheduledStart, ObjectToEvaluate.ScheduledStart)
    && PassFilterVal(this.EntityName, ObjectToEvaluate.EntityName)
    && PassFilterVal(this.TaskIDs, ObjectToEvaluate.TaskID)
    && PassFilterVal(this.ElementIDs, ObjectToEvaluate.EntityID)
    && PassFilterVal(this.MappingIDs, ObjectToEvaluate.MappingID)
    && PassFilterVal(this.EntityStatus, ObjectToEvaluate.EntityStatus)
    && PassFilterVal(this.EntityType, ObjectToEvaluate.EntityType)
    && PassFilterVal(this.NumberOfSteps, ObjectToEvaluate.ListOfSteps.Count)
    && PassFilterVal(this.NumberOfDependancies, ObjectToEvaluate.ListOfParentDependancies.Count)
    && PassFilterVal(this.NumberOfOpenIssues, ObjectToEvaluate.ListOfAllIssues.CountOpen)
    && PassFilterVal(this.NumberOfRequirementsLinked, ObjectToEvaluate.RequierementsLinked)
    ;
}

and my collection has a method

ListOfMyClass FilterList(MyClassFilter Filter){
    ListOfMyClass FilteredList = new ListOfMyClass();
    foreach (MyClass Task in this)
    {
      if (Filter.TaskPassFilter(Task))
        FilteredList.Add(Task);
    }
    return FilteredList;
}

It works fine as long as the collection is small, but when I have 500 objects it starts to get really slow. I have searched the net, but all the examples go object by object through the collection and check field by field whether it passes or not.

Any suggestions as how to improve performance?

Thanks

A: 

Hmm.

I propose a cute trick for you:

Perhaps you could implement .CompareTo (or equivalent in whatever language; I'm assuming .NET) in such a way that "closer" matches are up the top. Then, you just use a collection that sorts on insert, and it'll all happen for you.

In this way, you could just loop over all the items, and as soon as you find one that doesn't match, stop, because you know everything below that doesn't match.

I'd be interested to see how that worked out. But maybe someone else will have a better suggestion (and I'm prepared to look like a fool if this is silly for some reason).
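Something like this is what I have in mind (a rough, untested sketch; myTasks and filter stand in for your collection and filter object, and MatchCount is a made-up helper that would return how many of the filter's fields an item satisfies):

// Sort so the items matching the most filter fields come first.
// (In practice you'd precompute the match count per item rather than
// recompute it inside every comparison.)
List<MyClass> sorted = new List<MyClass>(myTasks);
sorted.Sort((a, b) => filter.MatchCount(b).CompareTo(filter.MatchCount(a)));

ListOfMyClass filtered = new ListOfMyClass();
foreach (MyClass task in sorted)
{
    if (!filter.TaskPassFilter(task))
        break;              // everything after this point matches even fewer fields
    filtered.Add(task);
}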

Noon Silk
I wait patiently for someone to explain why this is a bad idea. I'm interested to know.
Noon Silk
I didn't downvote you, but perhaps a code example? I mean, I get what you're saying, but I don't "get" it.
ranomore
Agreed with ranomore. Also, why is this better than the chained short-circuiting booleans in the OP (well, flexibility, probably)?
Jimmy
+1  A: 

This shouldn't be slow, unless your comparisons are slow.

A scan of 500 objects should be very fast (of course you don't mention what "slow" is, or your hardware, but even still...).

Your PassFilterVal is going to be "more expensive" because of the method call than having the comparison inline, but since it's the same for all of them, I guess we're stuck with what we have.

You could order the parameters so the most selective ones are first.

The goal here is to leverage the short circuiting of the ANDs to dump as quickly as possible, thus limiting the amount of actual comparisons.

Another thing you can do is optimize it for the "most common queries" first.

Are ALL of the criteria always used? If not, you should limit the comparisons to those that are actually being used. Equality in this case is actually expensive (17 method calls and 17 comparisons of "unknown" complexity). If you have some kind of wildcard, or "don't care" value, you can try and skip those from being compared at all.
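For example, a sketch of what a "don't care" check could look like (this assumes an unset filter field is left null and that plain equality is the condition; EqualityComparer comes from System.Collections.Generic):

static bool PassFilterVal<T>(T filterValue, T actualValue)
{
    // A null filter value means the criterion was never set, so skip the comparison entirely.
    if (filterValue == null)
        return true;
    return EqualityComparer<T>.Default.Equals(filterValue, actualValue);
}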

Another idea is to sort the elements by all 17 criteria. Then you use a binary search for the element that matches all 17 fields, and finally iterate for the remaining elements until they STOP matching your criteria. Of course you need to always keep the list properly sorted, but once sorted, it's a binary insert, which will be pretty fast. If you read a lot more than you add to the list, this should be a net gain.

Will Hartung
A: 

Without more context it is effectively impossible to suggest why performance is so slow. I can however offer some clues. If things get slow at around 500 items, there is probably an O(N^2) algorithm (or worse) lurking in there somewhere. I would guess that one, or more, of your properties is traversing a large set of data during each comparison.

With properties in C#, it is hard to know how they are implemented; e.g. something innocuous like NumberOfDependancies could be traversing a very large graph each time it is called. Or it may be generating a list just to count the dependencies. If possible, I would compute these values once and store them in the class. However, when I look at the context in which it is used, I see another problem:

PassFilterVal(this.NumberOfDependancies, ObjectToEvaluate.ListOfParentDependancies.Count)

If "ListOfParentDependancies" is an IEnumerable<> then you are going to be traversing the list of all dependencies each time you call it, to count the number.

Your filter function is fine in terms of performance. For elegance, and a possible modest boost in performance, I would implement it as follows:

IEnumerable<MyClass> FilterList(MyClassFilter filter) {
    foreach (MyClass task in this)
       if (filter.TaskPassFilter(task))
         yield return task;
}
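One nice side effect of the iterator version, assuming the caller doesn't always need the whole result (myTasks and filter are placeholders for your collection and filter; FirstOrDefault comes from System.Linq):

IEnumerable<MyClass> matches = myTasks.FilterList(filter);  // nothing is scanned yet
MyClass firstMatch = matches.FirstOrDefault();              // scans only until the first hit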
cdiggins