views:

1321

answers:

5

Say I have two lists or arrays of strings. For example:

list 1: "a", "c", "b", "d", "f", "e"

list 2: "a", "d", "e", "f", "h"

List 1 and list 2 are of arbitrary lengths. List 1 may contain elements not in list 2, and visa versa.

I'd like to know when items in list 1 are found in list 2, and more specifically I want to know when an item in list 1 is found in list 2 but is found in a different order than which it is found in list 2, relative to the items in list 2. (Hopefully the example below clarifies this statement).

For example, "a" is found in both lists and is the first item in both lists. So, everything is okay so far. "c" and "b" are found in only the 1st list and so can be ignored. "h" is found in only the 2nd list and can also therefore be ignored.

"d" is found in both the 1st and 2nd list. It is found after "a" (the first item) in the original list. Even though the position within the 1st list is different form the second list, it is is okay because its relative order is the same within the two lists (the second match in common between lists).

In the example above "f" and "e" are in the 'wrong' order in list 1, because "e" comes becomes before "f" in the 2nd list. So, I would like to report that "e" and "f" are in the wrong order in the first list. How would I do this?

The solution should be in C#. Thank you!

+1  A: 

I don't have code, but this should be the two main steps:

  • Remove every item that is in one list only (either shorten one of the lists or create a third "clean" list)
  • Some kind of diff - search for the first mismatching item and then report every following item until both lists are "synchronized" again.

Perhaps it will even help to clean up both of the lists. Then you can use a pointer for each list, set it to the first item and increase them until there's a mismatch.

schnaader
+2  A: 

string[] list1 = {"a", "c", "b", "d", "f", "e"};
string[] list2 = {"a", "d", "e", "f", "h"};
int i = 0;
var list1a = list1.Intersect(list2).Select(l=> new { item = l, order= i++});
int j = 0;
var list2a = list2.Intersect(list1).Select(l=> new { item = l, order= j++});

var r = from l1 in list1a join l2 in list2a on l1.order equals l2.order
     where l1.item != l2.item
     select new {result=string.Format("position {1} is item [{0}] on list1  but its [{2}] in list2", l1.item, l1.order, l2.item )};

r.Dump();

result

position 2 is item [f] on list1 but its [e] in list2

position 3 is item [e] on list1 but its [f] in list2

ehosca
The way I understand this, is it only compares source sequences of equal length and corresponding elements (at equal indexes). In my case, my two sequences have different lengths and may be considered a match (or OK at any rate) even if elements with different indexes are different, providing the sequences are the same.
i noticed you weren't interested in comparing the sequences... the query above will produce the desired results .. try running in in LinqPad...
ehosca
Very nice! I have a lot to learn about Linq.
+1  A: 

How about

list1.Intersect(list2).SequenceEquals(list2.Intersect(list1))
Vasu Balakrishnan
I see my problem description wasn't very clear at all. LBushkin is right in that describing my expected results would've been helpful. (And now I understand why both you and ehosca suggested SequenceEquals -- the assumption being I just want a true/false answer). But the idea of using Intersect will certainly cut down on some of the code I've written so thank you very much!
A: 

How about this:

string[] list1 = { "a", "c", "b", "d", "f", "e" };
string[] list2 = { "a", "d", "e", "f", "h" };

var indexedList1 = list1.Select((x, i) => new
{
    Index = i,
    Item = x
});

var indexedList2 = list2.Select((x, i) => new
    {
     Index = i,
     Item = x
    });

var intersectedWithIndexes = indexedList2
    .Join(indexedList1,
          x => x.Item,
          y => y.Item,
          (x, y) => new
     {
      ExpectedIndex = x.Index,
      ActualIndex = y.Index,
      x.Item
     })
    .Where(x => x.ActualIndex != x.ExpectedIndex)
    .ToArray();

var outOfOrder = intersectedWithIndexes
    .Select((x, i) => new
     {
      Item = x,
      index = i
     })
    .Skip(1)
    .Where(x => x.Item.ActualIndex < intersectedWithIndexes[x.index - 1].ActualIndex ||
         x.Item.ExpectedIndex < intersectedWithIndexes[x.index - 1].ExpectedIndex)
    .Select(x => new
     {
      ExpectedBefore = x.Item,
      ExpectedAfter = intersectedWithIndexes[x.index - 1]
     });

foreach (var item in outOfOrder)
{
    Console.WriteLine("'{0}' and '{1}' are out of order at index {2}",
        item.ExpectedBefore.Item,
        item.ExpectedAfter.Item,
        item.ExpectedBefore.ActualIndex);
}

output:

'f' and 'e' are out of order at index 4
Handcraftsman
Yes, that works too. Thank you as well!
+1  A: 

Levenshtein distance? I think the first solution that you have already accepted has a flaw. It will tell you everything is out of order even if one little thing is inserted:

      string[] list1 = { "a", "c", "b", "d", "j", "e", "f" };
      string[] list2 = { "a", "d", "e", "f", "h", "j" };

that says that j, e, f are out of order, because the j was inserted.

This points out the problem you face. There are multiple solutions, even more than one optimal solution, to the problem of what is out of order. Is the J in order or the e and the f? Are they all out of order? There is something called the Levenshtein distance algorithm that finds a minimal number of inserts and delete operations necessary to start with set A and end up with set B. There are multiple best solutions, this just finds one of them.

This following algorithm correctly outputs that in list1, j was inserted and e, f are shifted but still in the correct order.

using System;
using System.Collections.Generic;
using System.Text;
using System.Collections;
using Math = System.Math;

namespace LevCompareLists {

  class Program {

    static void Main(string[] args) {

      string[] list1 = { "a", "c", "b", "d", "j", "e", "f" };
      string[] list2 = { "a", "d", "e", "f", "h", "j" };

      int?[] aMap21 = Levenshtein(list2, list1);
      int?[] aMap12 = Levenshtein(list1, list2);

    }

    public static int?[] Levenshtein(string[] Src, String[] Dst) {
      // this finds a minimum difference solution of inserts and deletes that maps Src to Dst
      // it returns the map from the perspective of Dst, i.e.:
      //   each element of the return array contains the Src index of the corresponging element in B 
      //   a null value means the element in B was inserted, and never existed in A
      //
      // Given A = {"a", "c", "b", "d", "j", "e", "f"}
      //       B = {"a", "d", "e", "f", "h", "j"};
      //
      // Levenshtein(B, A):
      //  a c b d j e f     <-- A
      //  0     1   2 3     <-- aMap
      //  a     d   e f h j <-- B
      //
      // Levenshtein(A, B):
      //  a     d   e f h j  <-- B   
      //  0     3   5 6      <-- aMap
      //  a c b d j e f      <-- A
      //
      // see: http://en.wikipedia.org/wiki/Levenshtein_distance

      int cSrc = Src.Length; //length of s
      int cDst = Dst.Length; //length of t
      if (cSrc == 0 || cDst == 0) return null;

      //**** create the Levenshtein matrix 
      // it has at 1 extra element in each dimension to contain the edges
      int[,] aLev = new int[cSrc + 1, cDst + 1]; // the matrix
      int iSrc, iDst;
      // Load the horizontal and vertical edges
      for (iSrc = 0; iSrc <= cSrc; aLev[iSrc, 0] = iSrc++) ;
      for (iDst = 0; iDst <= cDst; aLev[0, iDst] = iDst++) ;
      // load the interior
      for (iSrc = 1; iSrc <= cSrc; iSrc++)
        for (iDst = 1; iDst <= cDst; iDst++)
          aLev[iSrc, iDst] = Math.Min(Math.Min(aLev[iSrc - 1, iDst] + 1, aLev[iSrc, iDst - 1] + 1),
           aLev[iSrc - 1, iDst - 1] + ((Dst[iDst - 1] == Src[iSrc - 1]) ? 0 : 2));

      DumpLevMatrix(aLev, Src, Dst);  // Debug

      //**** create the return map, using the Levenshtein matrix
      int?[] aMap = new int?[cDst];  // this is the return map
      iSrc = cSrc;  // start in lower right corner of the Levenshtein matrix
      iDst = cDst;  // start in lower right corner of the Levenshtein matrix
      // work backwards to pick best solution
      while ((iSrc >= 0) || (iDst >= 0)) {
        if ((iSrc > 0) && (iDst > 0)) {
          // enter here if iSrc and iDst are in the lev matrix and not on its edge
          int nCur = aLev[iSrc, iDst];
          int nIns = nCur - aLev[iSrc, iDst - 1];  // if move along B to find match, it was an insert
          int nDel = nCur - aLev[iSrc - 1, iDst];  // if move along A to find match, it was a deletion
          if (nIns == 1)                 // this char was NOT in A, but was inserted into B
            iDst--;                         //   Leave map of B[j] to nowher, scan to previous B (--j)
          else if (nDel == 1)            // this char was in A, but is missing in B
            iSrc--;                         //   Don't map any B, scan to previous A (--i)
          else                           // Match
            aMap[iDst-- - 1] = iSrc-- - 1;       //   After map B[j] to A[i], scan to prev A,B (--i, --j)
        } else {
          if (iDst > 0)       // remaining chars are inserts, Leave map of B[j] to nowher, scan to previous B (--j)
            iDst--;
          else if (iSrc > 0)  // Delete to the end, deletes do nothing
            iSrc--;
          else
            break;
        }
      }

      DumpMap(aMap, Dst); // Debug

      return aMap;

    }

    // just for debugging
    static void DumpLevMatrix(int[,] aLev, string[] Src, string[] Dst) {

      StringBuilder sb = new StringBuilder();
      int cSrc = Src.Length;
      int cDst = Dst.Length;
      int iSrc, iDst;
      sb.Length = 6;
      for (iDst = 0; iDst < cDst; ++iDst)
        sb.AppendFormat("{0,-3}", Dst[iDst]);
      Console.WriteLine(sb.ToString());
      for (iSrc = 0; iSrc <= cSrc; ++iSrc) {
        if (iSrc == 0)
          sb.Length = 3;
        else {
          sb.Length = 0;
          sb.AppendFormat("{0,-3}", Src[iSrc - 1]);
        }
        for (iDst = 0; iDst <= cDst; ++iDst)
          sb.AppendFormat("{0:00}", aLev[iSrc, iDst]).Append(" ");
        Console.WriteLine(sb.ToString());
      }

    }

    // just for debugging
    static void DumpMap(int?[] aMap, string[] Dst) {

      StringBuilder sb = new StringBuilder();
      for (int iMap = 0; iMap < aMap.Length; ++iMap)
        sb.AppendFormat("{0,-3}", Dst[iMap]);  // dst and map are same size
      Console.WriteLine(sb.ToString());
      sb.Length = 0;
      for (int iMap = 0; iMap < aMap.Length; ++iMap)
        if (aMap[iMap] == null)
          sb.Append("   ");
        else
          sb.AppendFormat("{0:00}", aMap[iMap]).Append(" ");
      Console.WriteLine(sb.ToString());

    }

  }
}
johnnycrash