views:

154

answers:

5

I am looking for an algorithm to merge two sorted lists, but they lack a comparison operator between elements of one list and elements of the other. The resulting merged list may not be unique, but any result which satisfies the relative sort order of each list will do. More precisely:

Given:

  • Lists A = {a_1, ..., a_m}, and B = {b_1, ..., b_n}. (They may be considered sets, as well).
  • A precedence operator < defined among elements of each list such that a_i < a_{i+1}, and b_j < b_{j+1} for 1 <= i <= m and 1 <= j <= n.
  • The precedence operator is undefined between elements of A and B: a_i < b_j is not defined for any valid i and j.
  • An equality operator = defined among all elements of either A or B (it is defined between an element from A and an element from B).
  • No two elements from list A are equal, and the same holds for list B.

Produce: A list C = {c_1, ..., c_r} such that:

  • C = union(A, B); the elements of C are the union of elements from A and B.
  • If c_p = a_i, c_q = a_j, and a_i < a_j, then c_p < c_q. (The order of elements of the sublists of C corresponding to sets A and B should be preserved.
  • There exist no i and j such that c_i = c_j. (all duplicated elements between A and B are removed).

I hope this question makes sense and that I'm not asking something either terribly obvious, or something for which there is no solution.

Context:

A constructible number can be represented exactly in finitely many quadratic extensions to the field of rational numbers (using a binary tree of height equal to the number of field extensions). A representation of a constructible number must therefore "know" the field it is represented in. Lists A and B represent successive quadratic extensions of the rational numbers. Elements of A and B themselves are constructible numbers, which are defined in the context of previous smaller fields (hence the precedence operator). When adding/multiplying constructible numbers, the quadratically extended fields must first be merged so that the binary arithmetic operations can be performed; the resulting list C is the quadratically extended field which can represent numbers representable by both fields A and B. (If anyone has a better idea of how to programmatically work with constructible numbers, let me know. A question concerning constructible numbers has arisen before, and also here are some interesting responses about their representation.)

Before anyone asks, no, this question does not belong on mathoverflow; they hate algorithm (and generally non-graduate-level math) questions.

In practice, lists A and B are linked lists (stored in reverse order, actually). I will also need to keep track of which elements of C corresponded to which in A and B, but that is a minor detail. The algorithm I seek is not the merge operation in mergesort, because the precedence operator is not defined between elements of the two lists being merged. Everything will eventually be implemented in C++ (I just want the operator overloading). This is not homework, and will eventually be open sourced, FWIW.

+2  A: 

This sounds like it would use a degenerate form of topological sorting.

EDIT 2:

Now with a combined routine:

import itertools

list1 = [1, 2, 'c', 4, 5, 'f', 7]
list2 = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

ibase = 0
result = []

for n1, i1 in enumerate(list1):
  for n2, i2 in enumerate(itertools.islice(list2, ibase, None, 1)):
    if i1 == i2:
      result.extend(itertools.islice(list2, ibase, ibase + n2))
      result.append(i2)
      ibase = n2 + ibase + 1
      break
  else:
    result.append(i1)
result.extend(itertools.islice(list2, ibase, None, 1))

print result
Ignacio Vazquez-Abrams
+1  A: 

Would concatenating the two lists be sufficient? It does preserve the relative sortedness of elements from a and elements from b.

Then it's just a matter of removing duplicates.

EDIT: Alright, after the comment discussion (and given the additional condition that a_i=b_i & a_j=b_j & a_i<a_j => b_i<b-J), here's a reasonable solution:

  1. Identify entries common to both lists. This is O(n2) for the naive algorithm - you might be able to improve it.

  2. (optional) verify that the common entries are in the same order in both lists.

  3. Construct the result list: All elements of a that are before the first shared element, followed by all elements of b before the first shared element, followed by the first shared element, and so on.

Anon.
Suppose my lists are "1, 2, c, 4, 5, f", and "a, b, c, d, e, f". Concatenating would make "1, 2, c, 4, 5, f, a, b, c, d, e, f". Now, both 'c' and 'f' are duplicated, so some reshuffling has to be done; in that "4, 5, d, e" must all lie between wherever 'c and 'f' end up. Could you propose a concrete way of making this work?
Victor Liu
Suppose we have the lists `1, 2, 3, 4` and `4, 5, 6, 1`. What should the result be?
Anon.
@Anon.: Such a case is un-satisfiable, and will not arise in practice. Ideally, this kind if cyclic condition can be detected and an error is raised.
Victor Liu
"some reshuffling has to be done; in that "4, 5, d, e" must all lie between wherever 'c and 'f' end up." - In that case, maybe make two lists of linked lists. The first will be ordered sequences from a that end with an element that is also in b. The second will be ordered sequences from b that start with an element that is also in a. Zip together these two lists of lists by merging the identical end points.
Justin Smith
+1  A: 

Given the problem as you have expressed it, I have a feeling that the problem may have no solution. Suppose that you have two pairs of elements {a_1, b_1} and {a_2, b_2} where a_1 < a_2 in the ordering of A, and b_1 > b_2 in the ordering of B. Now suppose that a_1 = b_1 and a_2 = b_2 according to the equality operator for A and B. In this scenario, I don't think you can create a combined list that satisfies the sublist ordering requirement.

Anyway, there's an algorithm that should do the trick. (Coded in Java-ish ...)

List<A> alist = ...
List<B> blist = ...
List<Object> mergedList = new SomeList<Object>(alist);
int mergePos = 0;
for (B b : blist) {
    boolean found = false;
    for (int i = mergePos; i < mergedList.size(); i++) {
        if (equals(mergedList.get(i), b)) {
            found = true; break;
        }
    }
    if (!found) {
        mergedList.insertBefore(b, mergePos);
        mergePos++;
    }
}

This algorithm is O(N**2) in the worst case, and O(N) in the best case. (I'm skating over some Java implementation details ... like combining list iteration and insertion without a major complexity penalty ... but I think it can be done in this case.)

The algorithm neglects the pathology I mentioned in the first paragraph and other pathologies; e.g. that an element of B might be "equal to" multiple elements of A, or vice versa. To deal with these, the algorithm needs to check each b against all elements of the mergedList that are not instances of B. That makes the algorithm O(N**2) in the best case.

Stephen C
I don't think your bad case can happen - when the questioner says "a precedence operator" in his second bullet point, I *think* this is intended to mean there's a partial order on the union of A and B, which is a complete when restricted to A and also complete when restricted to B. In other words, if there are equal elements between the lists, then the order of those elements is the same in each list. I think.
Steve Jessop
@Steve - maybe. But I'm answering the question **as stated**.
Stephen C
I don't think you are. The question states that there is *one* precedence operator. For your bad case you have assumed that there are two elements a, b, and *two* precedence operators <A and <B, such that a <A b, and b <B a. I think that the questioner means what he said, that there is *one* order, but you may well be correct that there are two, and that it is therefore possible for them to be inconsistent.
Steve Jessop
@Steve - the question does not state there is one operator. It says there is "a precedence operator among the elements of each list" ... which I take to mean that there is an operator for each list. Then the next bullet point that defines a separate operator between the lists. But either way (whether there are 1 2 or 3 operators), there is nothing to state that there is a consistent ordering between elements of A and B ... let alone that this ordering can be derived from the operators provided.
Stephen C
Not that it really matters, since from a comment to Anon we do know that inconsistent orders "will not arise in practice".
Steve Jessop
And btw, I parsed the question as "(1) There is a precedence operator; (2) defined on pairs drawn from each list; (3) undefined on pairs drawn one from each list". Not that there are three precedence operators, one of which is defined on nothing (the trivial relation?) ;-)
Steve Jessop
+1  A: 

I don't think you can do it better than O(N*M), although I'd be happy to be wrong.

That being the case, I'd do this:

  • Take the first (remaining) element of A.
  • Look for it in (what's left of) B.
    • If you don't find it in B, move it to the output
    • If you do find it in B, move everything from B up to and including the match, and drop the copy from A.
  • Repeat the above until A is empty
  • Move anything left in B to the output

If you want to detect incompatible orderings of A and B, then remove "(what's left of)" from step 2. Search the whole of B, and raise an error if you find it "too early".

The problem is that given a general element of A, there is no way to look for it in B in better than linear time (in the size of B), because all we have is an equality test. But clearly we need to find the matches somehow and (this is where I wave my hands a bit, I can't immediately prove it) therefore we have to check each element of A for containment in B. We can avoid a bunch of comparisons because the orders of the two sets are consistent (at least, I assume they are, and if not there's no solution).

So, in the worst case the intersection of the lists is empty, and no elements of A are order-comparable with any elements of B. This requires N*M equality tests to establish, hence the worst-case bound.

For your example problem A = (1, 2, c, 4, 5, f), B = (a, b, c, d, e, f), this gives the result (1,2,a,b,c,4,5,d,e,f), which seems good to me. It performs 24 equality tests in the process (unless I can't count): 6 + 6 + 3 + 3 + 3 + 3. Merging with A and B the other way around would yield (a,b,1,2,c,d,e,4,5,f), in this case with the same number of comparisons, since the matching elements just so happen to be at equal indices in the two lists.

As can be seen from the example, the operation can't be repeated. merge(A,B) results in a list with an order inconsistent with that of merge(B,A). Hence merge((merge(A,B),merge(B,A)) is undefined. In general, the output of a merge is arbitrary, and if you go around using arbitrary orders as the basis of new complete orders, you will generate mutually incompatible orders.

Steve Jessop
A: 

If the elements are hashable, this can be done in O(N) time where N is the total number of elements in A and B.

def merge(A, B):
    # Walk A and build a hash table mapping its values to indices.
    amap = {}
    for i, a in enumerate(A):
        amap[a] = i

    # Now walk B building C.
    C = []
    ai = 0
    bi = 0
    for i, b in enumerate(B):
        if b in amap:
            # b is in both lists.
            new_ai = amap[b]
            assert new_ai >= ai  # check for consistent input
            C += A[ai:new_ai]    # add non-shared elements from A
            C += B[bi:i]         # add non-shared elements from B
            C.append(b)          # add the shared element b
            ai = new_ai + 1
            bi = i + 1
    C += A[ai:]  # add remaining non-shared elements from A
    C += B[bi:]  # from B
    return C

A = [1, 2, 'c', 4, 5, 'f', 7]
B = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print merge(A, B)

(This is just an implementation of Anon's algorithm. Note that you can check for inconsistent input lists without hurting performance and that random access into the lists is not necessary.)

Jason Orendorff