views:

140

answers:

4

I have a set of A's and a set of B's, each with an associated numerical priority, where each A may match some or all B's and vice versa, and my main loop basically consists of:

Take the best A and B in priority order, and do stuff with A and B.

The most obvious way to do this is with a single priority queue of (A,B) pairs, but if there are 100,000 A's and 100,000 B's then the set of O(N^2) pairs won't fit in memory (and disk is too slow).

Another possibility is for each A, loop through every B. However this means that global priority ordering is by A only, and I really need to take priority of both components into account.

(The application is theorem proving, where the above options are called the pair algorithm and the given clause algorithm respectively; the shortcomings of each are known, but I haven't found any reference to a good solution.)

Some kind of two layer priority queue would seem indicated, but it's not clear how to do this without using either O(N^2) memory or O(N^2) time in the worst case.

Is there a known method of doing this?

Clarification: each A must be processed with all corresponding B's, not just one.

+1  A: 

Maybe there's something I'm not understanding but,

Why not keep the A's and B's in separate heaps, get_Max on each of the heaps, do your work, remove each max from its associated heap and continue?

Jason Punyon
A: 

So you have a Set of A's, and a set of B's, and you need to pick a (A, B) pair from this set such that some f(a, b) is the highest of any other (A, B) pair.

This means you can either store all possible (A, B) pairs and order them, and just pick the highest each time through the loop (O(1) per iteration but O(N*M) memory).

Or you could loop through all possible pairs and keep track of the current maximum and use that (O(N*M) per iteration, but only O(N+M) memory).

If I am understanding you correctly this is what you are asking.

I think it very much depends on f() to determine if there is a better way to do it.

If f(a, b) = a + b, then it is obviously very simple, the highest A, and the highest B are what you want.

Greg Rogers
f(a,b) is simple, max(a,b) is a tolerable first approximation.But how to perform many iterations without using O(N*M) time or memory?
rwallace
+1  A: 

You could handle the best pairs first, and if nothing good comes up mop up the rest with the given clause algorithm for completeness' sake. This may lead to some double work, but I'd bet that this is insignificant.

Have you considered ordered paramodulation or superposition?

starblue
Those are both on the to-do list, but I think I need to nail the basics down first. Right now I'm thinking in terms of using a series of pair queues, one per bucket.
rwallace
A: 

I think your original idea will work, you just need to keep your As and Bs in separate collections and just stick references to them in your priority queue. If each reference takes 16 bytes (just to pick a number), then 10,000,000 A/B references will only take ~300M. Assuming your As and Bs themselves aren't too big, it should be workable.

TMN