views:

48

answers:

4

Hi:

Does anyone know any good data structure and algorithm for searching with multiple predicate.

eg. suppose I have a set of tcp header data (assuming no duplicate). If I were searching for an tcp header of the list by src ip, I could sort the set by src IP and do binary search.

What kind of data structure/algorithm should I use if I want to find a tcp header from the set that matches all of src/dst ip/port? (besides iterating through all of the set)

A: 

Perhaps you could apply kd-trees as a means of obtaining efficient multi-key searching? I can't claim to know much about the specific problem you are trying to solve, but asking about searching in terms of multiple keys, it seems like it could be applicable.

Gian
A: 

In C++ Boost provides something called a multi-index container. It's essentially a group of hash tables, one for each key, with some code to keep them consistent.

Joel
+1  A: 

This is exactly the sort of thing database vendors have dealt with for years. If you're going to consistently search by src/dst IP/port, you can use that as a criteria for a sort, and look for it more or less directly.

Otherwise, the typical approach is to sort the data by one field, and build indices for the other fields. You can then do a binary search in each index to find the set of records that fits the criteria for that field. The intersection of those sets will be the records you're looking for.

Of course, if you prefer, you can also reduce the number of indices, so (for example) you might use indices to get a set of records with the right source and destination IPs, and then just scan through that (probably fairly small) set to get those with the right port number.

Jerry Coffin
+1  A: 

I would suggest indexing individually on common fields, then using a merge join strategy to satisfy queries for multiple fields.

You can also use an index for (a, b, c) to query for (a, b) or just (a), so a judicious selection of indexes may allow you to avoid the need to merge join.

Nick Johnson