ansaurus

Question

Algorithm to merge two lists lacking comparison between them

Answer 1

+2 A:

This sounds like it would use a degenerate form of topological sorting.

EDIT 2:

Now with a combined routine:

import itertools

list1 = [1, 2, 'c', 4, 5, 'f', 7]
list2 = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

ibase = 0
result = []

for n1, i1 in enumerate(list1):
  for n2, i2 in enumerate(itertools.islice(list2, ibase, None, 1)):
    if i1 == i2:
      result.extend(itertools.islice(list2, ibase, ibase + n2))
      result.append(i2)
      ibase = n2 + ibase + 1
      break
  else:
    result.append(i1)
result.extend(itertools.islice(list2, ibase, None, 1))

print result

Ignacio Vazquez-Abrams 2010-01-25 00:24:35

Answer 2

+1 A:

Would concatenating the two lists be sufficient? It does preserve the relative sortedness of elements from a and elements from b.

Then it's just a matter of removing duplicates.

EDIT: Alright, after the comment discussion (and given the additional condition that a_i=b_i & a_j=b_j & a_i<a_j => b_i<b-J), here's a reasonable solution:

Identify entries common to both lists. This is O(n²) for the naive algorithm - you might be able to improve it.
(optional) verify that the common entries are in the same order in both lists.
Construct the result list: All elements of a that are before the first shared element, followed by all elements of b before the first shared element, followed by the first shared element, and so on.

Anon. 2010-01-25 00:26:46

Suppose my lists are "1, 2, c, 4, 5, f", and "a, b, c, d, e, f". Concatenating would make "1, 2, c, 4, 5, f, a, b, c, d, e, f". Now, both 'c' and 'f' are duplicated, so some reshuffling has to be done; in that "4, 5, d, e" must all lie between wherever 'c and 'f' end up. Could you propose a concrete way of making this work?

Victor Liu 2010-01-25 00:29:50

Suppose we have the lists `1, 2, 3, 4` and `4, 5, 6, 1`. What should the result be?

Anon. 2010-01-25 00:32:15

@Anon.: Such a case is un-satisfiable, and will not arise in practice. Ideally, this kind if cyclic condition can be detected and an error is raised.

Victor Liu 2010-01-25 00:40:32

"some reshuffling has to be done; in that "4, 5, d, e" must all lie between wherever 'c and 'f' end up." - In that case, maybe make two lists of linked lists. The first will be ordered sequences from a that end with an element that is also in b. The second will be ordered sequences from b that start with an element that is also in a. Zip together these two lists of lists by merging the identical end points.

Justin Smith 2010-01-25 00:49:28

Answer 3

+1 A:

Given the problem as you have expressed it, I have a feeling that the problem may have no solution. Suppose that you have two pairs of elements {a_1, b_1} and {a_2, b_2} where a_1 < a_2 in the ordering of A, and b_1 > b_2 in the ordering of B. Now suppose that a_1 = b_1 and a_2 = b_2 according to the equality operator for A and B. In this scenario, I don't think you can create a combined list that satisfies the sublist ordering requirement.

Anyway, there's an algorithm that should do the trick. (Coded in Java-ish ...)

List<A> alist = ...
List<B> blist = ...
List<Object> mergedList = new SomeList<Object>(alist);
int mergePos = 0;
for (B b : blist) {
    boolean found = false;
    for (int i = mergePos; i < mergedList.size(); i++) {
        if (equals(mergedList.get(i), b)) {
            found = true; break;
        }
    }
    if (!found) {
        mergedList.insertBefore(b, mergePos);
        mergePos++;
    }
}

This algorithm is O(N**2) in the worst case, and O(N) in the best case. (I'm skating over some Java implementation details ... like combining list iteration and insertion without a major complexity penalty ... but I think it can be done in this case.)

The algorithm neglects the pathology I mentioned in the first paragraph and other pathologies; e.g. that an element of B might be "equal to" multiple elements of A, or vice versa. To deal with these, the algorithm needs to check each b against all elements of the mergedList that are not instances of B. That makes the algorithm O(N**2) in the best case.

Stephen C 2010-01-25 01:09:45

I don't think your bad case can happen - when the questioner says "a precedence operator" in his second bullet point, I *think* this is intended to mean there's a partial order on the union of A and B, which is a complete when restricted to A and also complete when restricted to B. In other words, if there are equal elements between the lists, then the order of those elements is the same in each list. I think.

Steve Jessop 2010-01-25 01:16:25

@Steve - maybe. But I'm answering the question **as stated**.

Stephen C 2010-01-25 01:21:26

I don't think you are. The question states that there is *one* precedence operator. For your bad case you have assumed that there are two elements a, b, and *two* precedence operators <A and <B, such that a <A b, and b <B a. I think that the questioner means what he said, that there is *one* order, but you may well be correct that there are two, and that it is therefore possible for them to be inconsistent.

Steve Jessop 2010-01-25 01:32:10

@Steve - the question does not state there is one operator. It says there is "a precedence operator among the elements of each list" ... which I take to mean that there is an operator for each list. Then the next bullet point that defines a separate operator between the lists. But either way (whether there are 1 2 or 3 operators), there is nothing to state that there is a consistent ordering between elements of A and B ... let alone that this ordering can be derived from the operators provided.

Stephen C 2010-01-25 01:54:52

Not that it really matters, since from a comment to Anon we do know that inconsistent orders "will not arise in practice".

Steve Jessop 2010-01-25 02:10:05

And btw, I parsed the question as "(1) There is a precedence operator; (2) defined on pairs drawn from each list; (3) undefined on pairs drawn one from each list". Not that there are three precedence operators, one of which is defined on nothing (the trivial relation?) ;-)

Steve Jessop 2010-01-25 02:13:06

Answer 4

+1 A:

I don't think you can do it better than O(N*M), although I'd be happy to be wrong.

That being the case, I'd do this:

Take the first (remaining) element of A.
Look for it in (what's left of) B.
- If you don't find it in B, move it to the output
- If you do find it in B, move everything from B up to and including the match, and drop the copy from A.
Repeat the above until A is empty
Move anything left in B to the output

If you want to detect incompatible orderings of A and B, then remove "(what's left of)" from step 2. Search the whole of B, and raise an error if you find it "too early".

The problem is that given a general element of A, there is no way to look for it in B in better than linear time (in the size of B), because all we have is an equality test. But clearly we need to find the matches somehow and (this is where I wave my hands a bit, I can't immediately prove it) therefore we have to check each element of A for containment in B. We can avoid a bunch of comparisons because the orders of the two sets are consistent (at least, I assume they are, and if not there's no solution).

So, in the worst case the intersection of the lists is empty, and no elements of A are order-comparable with any elements of B. This requires N*M equality tests to establish, hence the worst-case bound.

For your example problem A = (1, 2, c, 4, 5, f), B = (a, b, c, d, e, f), this gives the result (1,2,a,b,c,4,5,d,e,f), which seems good to me. It performs 24 equality tests in the process (unless I can't count): 6 + 6 + 3 + 3 + 3 + 3. Merging with A and B the other way around would yield (a,b,1,2,c,d,e,4,5,f), in this case with the same number of comparisons, since the matching elements just so happen to be at equal indices in the two lists.

As can be seen from the example, the operation can't be repeated. merge(A,B) results in a list with an order inconsistent with that of merge(B,A). Hence merge((merge(A,B),merge(B,A)) is undefined. In general, the output of a merge is arbitrary, and if you go around using arbitrary orders as the basis of new complete orders, you will generate mutually incompatible orders.

Steve Jessop 2010-01-25 01:23:52

Answer 5

A:

If the elements are hashable, this can be done in O(N) time where N is the total number of elements in A and B.

def merge(A, B):
    # Walk A and build a hash table mapping its values to indices.
    amap = {}
    for i, a in enumerate(A):
        amap[a] = i

    # Now walk B building C.
    C = []
    ai = 0
    bi = 0
    for i, b in enumerate(B):
        if b in amap:
            # b is in both lists.
            new_ai = amap[b]
            assert new_ai >= ai  # check for consistent input
            C += A[ai:new_ai]    # add non-shared elements from A
            C += B[bi:i]         # add non-shared elements from B
            C.append(b)          # add the shared element b
            ai = new_ai + 1
            bi = i + 1
    C += A[ai:]  # add remaining non-shared elements from A
    C += B[bi:]  # from B
    return C

A = [1, 2, 'c', 4, 5, 'f', 7]
B = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print merge(A, B)

(This is just an implementation of Anon's algorithm. Note that you can check for inconsistent input lists without hurting performance and that random access into the lists is not necessary.)

Jason Orendorff 2010-01-25 15:41:37

ansaurus

tags:

views:

answers:

Algorithm to merge two lists lacking comparison between them

related questions