views:

390

answers:

5

Can you suggest an algorithm for filtering data?

I am using JavaScript and trying to write a filter function which filters an array of data. I have an array of data and an array of filters, so in order to apply each filter to every data item, I have written two nested loops:

foreach(data)
{
    foreach(filter)
    {
        check data with filter
    }
}

This is not the exact code, but in short that is what my function does. The problem is that it takes a huge amount of time. Can someone suggest a better method?

I am using the MooTools library, and the array of data is a JSON array.

Details of the data and filters

The data is a JSON array of, let's say, users, so it will be:

data = [{"name" : "first", "email" : "first@first", "age" : "20"},
        {"name" : "second", "email" : "second@second", "age" : "21"},
        {"name" : "third", "email" : "third@third", "age" : "22"}];

The array of filters is basically made of self-defined classes for the different fields of the data:

alFilter[0] = filterName;
alFilter[1] = filterEmail;
alFilter[2] = filterAge;

So when I enter the first loop, I get a single JSON object (the first row in the above case). When I enter the second (filters) loop, I have a filter class which extracts the exact field the current filter works on and checks the filter against the appropriate field of the data.

So in my example

foreach(data)
{
    foreach(filter)
    {
        // loop one - filter name
        // loop two - filter email
        // loop three - filter age
    }
}

When the second loop ends, I set a flag denoting whether the data item has been filtered out, and depending on it the item is displayed.

+2  A: 

You're going to have to give us some more detail about the exact structure of your data and filters to really be able to help you out. Are the filters being used to select a subset of data, or to modify the data? What are the filters doing?

That said, there are a few general suggestions:

  1. Do less work. Is there some way you can limit the amount of data you're working on? Some pre-filter that can run quickly and cut it down before you do your main loop?
  2. Break out of the inner loop as soon as possible. If one of the filters rejects a datum, then break out of the inner loop and move on to the next datum. If this is possible, then you should also try to make the most selective filters come first. (This is assuming that your filters are being used to reject items out of the list, rather than modify them)
  3. Check for redundancy in the computation the filters perform. If each of them performs some complicated calculations that share some subroutines, then perhaps memoization or dynamic programming may be used to avoid redundant computation.

Really, it all boils down to the first point, do less work, at all three levels of your code. Can you do less work by limiting the items in the outer loop? Do less work by stopping after a particular filter and doing the most selective filters first? Do less work by not doing any redundant computation inside of each filter?
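The "break out early, most selective filters first" advice can be sketched like this. This is a minimal illustration, assuming each filter is a plain predicate function (item → true/false) rather than the asker's actual filter classes; the predicates and sample data are made up for the example:

    // Sketch only: filters are assumed to be predicate functions,
    // ordered so the most selective check runs first.
    var data = [
        { name: 'first',  email: 'first@first',   age: '20' },
        { name: 'second', email: 'second@second', age: '21' },
        { name: 'third',  email: 'third@third',   age: '22' }
    ];

    var filters = [
        function byAge(user)   { return Number(user.age) >= 21; },      // most selective first
        function byName(user)  { return user.name.charAt(0) === 's'; },
        function byEmail(user) { return user.email.indexOf('@') !== -1; }
    ];

    function applyFilters(items, checks) {
        var result = [];
        for (var i = 0; i < items.length; i++) {
            var keep = true;
            for (var j = 0; j < checks.length; j++) {
                if (!checks[j](items[i])) {
                    keep = false;
                    break; // reject early: skip the remaining filters for this item
                }
            }
            if (keep) { result.push(items[i]); }
        }
        return result;
    }

    var filtered = applyFilters(data, filters);
    // filtered now holds only the "second" user

The inner break is what saves time: a rejected item never reaches the later, possibly more expensive checks.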

Brian Campbell
+2  A: 

That's pretty much how you should do it. The trick is to optimize the "check data with filter" part. You need to traverse all your data and check against all your filters; you're not going to get any faster than that.

Avoid string comparisons, use data models that are as native as possible, try to reduce the data set on each filter pass, etc.

Without further knowledge, it's hard to optimize this for you.
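The "native data models" tip above can be sketched as a one-time normalization pass. This is an assumption-laden example (the sample data and field choice are illustrative): the string "age" field is converted to a number once, up front, so every comparison inside the filter loop is numeric instead of a repeated string parse:

    // Sketch: normalize once, filter many times.
    var data = [
        { name: 'first',  email: 'first@first',   age: '20' },
        { name: 'second', email: 'second@second', age: '21' }
    ];

    var normalized = data.map(function (user) {
        // same fields, but age becomes a real number
        return { name: user.name, email: user.email, age: Number(user.age) };
    });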

Henrik Paul
+1  A: 

You should sort the application of your filters, so that two things are optimized: expensive checks should come last, and checks that eliminate a lot of data should come first. Then, you should make sure that checking is cut short as soon as an "out" result occurs.
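One hypothetical way to order the filters as described: annotate each with an estimated cost and selectivity (the fraction of items it keeps), then sort by expected cost per eliminated item. The numbers, names, and predicates below are invented for illustration; in practice you would measure or estimate them for your own filters:

    // Hypothetical annotations: cost = relative expense of one check,
    // selectivity = fraction of items the filter keeps.
    var filters = [
        { name: 'email', cost: 5, selectivity: 0.9, check: function (u) { return u.email.indexOf('@') !== -1; } },
        { name: 'age',   cost: 1, selectivity: 0.2, check: function (u) { return Number(u.age) >= 21; } },
        { name: 'name',  cost: 2, selectivity: 0.5, check: function (u) { return u.name.length > 4; } }
    ];

    // Cheap, highly-selective checks first: sort by cost per
    // eliminated item (cost / (1 - selectivity), lower is better).
    filters.sort(function (a, b) {
        return a.cost / (1 - a.selectivity) - b.cost / (1 - b.selectivity);
    });
    // order is now: age, name, email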

Svante
A: 

Supposing that a filter removes the data if it doesn't match, I suggest that you switch the two loops, like so:

foreach(filter) {
    foreach(data) {
        check data with filter
    }
}

By doing so, the second filter doesn't have to process all the data, but only the data that passed the first filter, and so on. Of course the tips above (like doing expensive checks last) still apply and should additionally be considered.
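The inverted loops can be sketched as successive passes, again assuming predicate-function filters and made-up sample data: each pass keeps only the survivors, so later filters process a smaller and smaller array:

    // Sketch of the inverted loops: one pass per filter.
    var data = [
        { name: 'first',  email: 'first@first',   age: '20' },
        { name: 'second', email: 'second@second', age: '21' },
        { name: 'third',  email: 'third@third',   age: '22' }
    ];

    var filters = [
        function (user) { return Number(user.age) >= 21; },
        function (user) { return user.email.indexOf('@') !== -1; }
    ];

    function filterInPasses(items, checks) {
        var remaining = items;
        for (var i = 0; i < checks.length; i++) {
            var next = [];
            for (var j = 0; j < remaining.length; j++) {
                if (checks[i](remaining[j])) { next.push(remaining[j]); }
            }
            remaining = next; // only survivors reach the next filter
        }
        return remaining;
    }

    var filtered = filterInPasses(data, filters);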

Ridcully
Seems worth trying; I will let you know once it's done.
Undefined
Should work. Say you've got 1000 users. When the first filter removes 500 of them, that's 500 fewer you'll need to apply the second filter to, and so forth.
Ridcully
A: 

If your filters are looking for specific values, a range, or the start of a string, then jOrder (http://github.com/danstocker/jorder) will fit your problem.

All you need to do is create a jOrder table like this:

var table = jOrder(data)
    .index('name', ['name'], { grouped: true, ordered: true })
    .index('email', ['email'])
    .index('age', ['age'], { grouped: true, ordered: true, type: jOrder.number });

And then call table.where() to filter the table.

When you're looking for exact matches:

filtered = table.where([{name: 'first'}, {name: 'second'}]);

When you're looking for a certain range of one field:

filtered = table.where([{age: {lower: 20, upper: 21}}], {mode: jOrder.range});

Or, when you're looking for values starting with a given string:

filtered = table.where([{name: 'fir'}], {mode: jOrder.startof});

Filtering will be orders of magnitude faster this way than with nested loops.

Dan Stocker