ansaurus

Question

Comparing two vectors of maps

Answer 1

+4 A:

Something like the std::mismatch algorithm

You could also use std::set_difference

Glen 2009-08-12 11:58:29

don't forget about set_symmetric_difference().

D.Shawley 2009-08-12 12:07:09

Both of those look interesting to me. This might seem like a stupid question, but do `set_difference` and `set_symmetrics_difference` deal in sets? (i.e. Are repeated elements effectively ignored, so `a, a, b, c` is equivalent to `a, b, c`?)

Dominic Rodger 2009-08-12 12:11:11

It's not clear to me, but my reading of it is that std::set_difference requires a sorted set of data. i.e. no duplicates.

Glen 2009-08-12 12:20:25

Answer 2

A:

What exactly are you trying to achieve? Could you please define precisely what output you expect in terms of the input? Your pseudo code compares maps at the vector index. If that is not the correct semantics, then what is?

Ari 2009-08-12 12:16:21

I've edited the question to include a more fully-fleshed out example. Let me know if you want further clarification.

Dominic Rodger 2009-08-12 12:22:57

It seems like what you want is the so called "edit distance" between the vectors. Generally speaking, given two strings (in this case "strings" of map objects) there are many ways to transform one into the other using the basic operations "delete one character", "insert one character", and "change one character". The edit distance is the (lengtgh of) the shortest sequence of such operations that transforms one string to the other. Is this what you want? There are known algorithms for that.

Ari 2009-08-12 12:34:16

@Ari - great point about edit distance. I'd definitely not been thinking in the right terms - I'm looking for something like Levenshtein distances, but with details as to precisely *what* elements need to be changed.

Dominic Rodger 2009-08-12 12:46:35

The dynamic programming algo that computes the distance can also be made to give the sequence of operations realizing it, i.e., what elements need to changed/added/deleted.

Ari 2009-08-12 12:51:12

Answer 3

A:

Can you associate with each map some kind of checksum (or Blumen filter) - that at single check you could be able to decide if comparison has a sense.

Dewfy 2009-08-12 12:23:41

I can fairly easily tell if the two vectors are different, what I care about is precisely *how* they are different.

Dominic Rodger 2009-08-12 12:24:30

Answer 4

A:

In your example, note that is not possible to differentiate between

Extra b at index 1 of vector1, and vector2's c != vector1's d.

and

Extra b at index 1 of vector 1, extra d at index 2 of v1, and extra c at 1 in v2

because it is not clear that "c" shoud be compared to "d", it could be compared to "b" either. I assume the vectors are not sorted, because std::map doesn't provide a relational operator. Rather are the maps, which is as far as I see completly irrelevant ;-) So your example is slightly misreading. It could even be

Compare b f e a d

with a c f e

You can check each element of the first vector against each element of the second vector. This has quadratic runtime.

for i in 0... length(vector1):
    foundmatch = false;

    for j in 0... length(vector2):
        mismatch = false;
        for (k, v) in vector1[i]:
            if v != vector2[j][k]:
                mismatch = true;
                break; // no need to compare against the remaining keys.

        if (!mismatch) // found matching element j in vector2 for element i in vector1
            foundmatch = true;
            break;  // no need to compare against the remaining elements in vector2

    if (foundmatch)
        continue;
    else
        // report that vector1[i] has no matching element in vector2[]
        // "extra b at i"

If you want the find the missing elements, just swap vector1 and vector2.

If you want to check in a element in vector2 mismatches to a element in vector1 in only a single key, you have to add additional code around "no need to compare against the remainig keys".

drhirsch 2009-08-12 14:55:47

Answer 5

+1 A:

It sounds like you're looking for the diff algorithm. The idea is to identify the longest common subsequence of the two vectors (using map equality), then recurse down the non-common portions. Eventually you'll have an alternating list of vector sub-sequences that are identical, and sub-sequences that have no common elements. You can then easily produce whatever output you like from this.

Apply it to the two vectors, and there you go.

Note that since map comparison is expensive, if you can hash the maps (use a strong hash - collisions will result in incorrect output) and use the hashes for comparisons you'll save a lot of time.

Once you're down to the mismatched subsequences at the end, you'll have something like:

Input vectors: a b c d e f, a b c' d e f
Output:
   COMMON a b
   LEFT c
   RIGHT c'
   COMMON d e f

You can then individually compare the maps c and c' to figure out how they differ.

If you have a mutation and insertion next to each other, it gets more complex:

Input vectors: a b V W d e f, a b X Y d e f
Output:
   COMMON a b
   LEFT V W
   RIGHT X Y
   COMMON d e f

Determining whether to match V and W against X or Y (or not at all) is something you'll have to come up with a heuristic for.

Of course, if you don't care about how the content of the maps differ, then you can stop here, and you have the output you want.

bdonlan 2009-08-12 15:02:48

+1 - thanks, that's exactly what I'm trying to do. I'll take a look at the longest common subsequence problem.

Dominic Rodger 2009-08-12 21:45:02

ansaurus

tags:

views:

answers:

Comparing two vectors of maps

Example

Edit

related questions