views:

268

answers:

5

I have 3 lists, I will make them simple here.

list of letters
A
B
C

list of numbers
1
2
3

Mixed
A,1
A,2
B,2
B,3
C,1
C,3

I need to know what is missing:
A,3
B,1
C,2

The list of letters has about 85 entries
and the list of numbers has about 500 entries.

The mixed list has about 75,000 entries.

I can use either a database query (mysql 5.0) or Turbo Delphi 2006 to process text files. What would be the best way to find what is missing?

Thanks,
Dave

+3  A: 

A cross join would produce every combination there is, given that you have both of your lists in SQL tables:

SELECT
  Letter + ',' + Number AS Combination
FROM
  NumberList,
  LetterList

Using the combined result (maybe save it into a temporary table), you can use a NOT EXISTS query to find what is missing:

SELECT
  Combination
FROM
  AllCombinations AS a
WHERE
  NOT EXISTS 
  (SELECT 1 FROM MyCombitations AS m WHERE m.Combination = a.Combination)

This would require a table MyCombitations, which lists all of the combinations you actually have and want to check against the full list.

If you want to speed things up, you should use a permanent table of combinations and an index on the MyCombitations.Combination field. For repeated querying this is definitely advisable.

Tomalak
+1  A: 

75.000 is not much. Load list of letters and list of numbers into two separate TStringLists. Create dynamic array (indices would be indexes into those two string lists) with the appropriate dimensions. Fill it up. Load the data and mark each line in the array. Output all elements that were not marked.

In pseudocode (untested):

var
  i1, i2: integer;
  sl1, sl2: TStringList;
  cross: array of array of boolean;
begin
  // load data into sl1, sl2
  SetLength(cross, sl1.Count, sl2.Count);
  for i1 := 0 to sl1.Count - 1 do
    for i2 := 0 to sl2.Count - 1 do
      cross[i1, i2] := false;
  // for each element in 'combined' list
    // split it into elements s1, s2
    i1 := sl1.IndexOf(s1);
    i2 := sl2.IndexOf(s2);
    if (i1 < 0) or (i2 < 0) then
      // report error
    else
      cross[i1, i2] := true;
  for i1 := 0 to sl1.Count - 1 do
    for i2 := 0 to sl2.Count - 1 do
      if not cross[i1, i2] then
        // output sl1[i1], sl2[i2]
end;
gabr
A: 
SELECT letter,number FROM lettersTable l , numbersTable n WHERE
(
 SELECT count(*) 
 FROM 
  (
   SELECT * 
   FROM combinationsTable 
   WHERE l.letter=combinationsTable.letter AND n.number = combinationsTable .number
  ) AS temp
) = 0;

This relies on SELECT * FROM A,B testing all combinations (implicit cross-join).

Yann Semet
+1  A: 

If you can sort the data (Turbo powers SysTools has a good sort routine which works well) then you can do this in code fairly quickly with two input lists and an output list. The concept behind this is simple:

  1. Take two lists, sorted in the same manner
  2. If the left side is less than the right side, then the right side is missing that value so add it to your "missing list" and increment the cursor for the left side.
  3. If they are equal, then increment both,
  4. If the right side is less than the left side, then only increment the right side (optionally add to the "must delete" list).

This process is sometimes refered to as the "Old Master/New Master" process and is extremely fast as your only walking both lists once.

Simple example:

var
  ListL : tStringList; // the left list
  ListR : tSTringList; // the right list
  ListA : tSTringList; // the Add List (should start empty)
  ListD : tStringList; // the Delete list (should start empty)
  iCurL : integer;     // Left Cursor
  iCurR : integer;     // Right Cursor
  iRes  : integer;     // result of compare
begin
  iCurL := 0;
  iCurR := 0;
  ListL := tStringList.create;
  ListR := tSTringList.create;
  ListA := tSTringList.create;
  ListD := tStringList.create;
  InitAndLoadLists(ListL,ListR,ListA,ListD);
  while (iCurL <= ListL.Count-1) and (iCurR <= ListR.Count-1) do
    begin
      iRes := CompareStr(ListL.Strings[iCurL],ListR.Strings[iCurR]);
      if iRes = 0 then
        begin
          inc(iCurL);
          inc(iCurR);
        end;
      if iRes < 0 then
        begin
          ListA.Add(ListL.Strings[iCurL]);
          inc(iCurL);
        end;
      if iRes > 0 then
        begin
          listD.Add(ListR.Strings[iCurR]);
          inc(iCurR);
        end;
    end;
  while (iCurL <= ListL.Count-1) do
    begin
      listA.Add(ListL.Strings[iCurL]);
      inc(iCurL);
    end;
  while (iCurR <= ListR.Count-1) do
    begin
      listD.Add(ListR.Strings[iCurR]);
      inc(iCurR);
    end;
  ShowMessage( 'ADDS' + ^M+^J + ListA.Text);
  ShowMessage( 'DELS' + ^M+^J + ListD.Text);
end;

The following code is what I used for testing. This is just an example, but in a real world situation I would build my keys so that they would sort properly, right padding numbers, and forcing case where appropriate. If your using the turbo power sort routines, you can use TWO sorts instead of two lists, but the end effect is the same.

procedure InitAndLoadLists(ListL, ListR, ListA, ListD: TStringList);
begin
  ListL.Add('A,1');
  ListL.Add('B,3');
  ListL.Add('C,2');
  ListR.Add('A,2');
  ListR.Add('B,3');
  ListR.Add('C,4');
end;

Edit: Code tested in Delphi 2009 and behaves properly.

skamradt
+2  A: 

There's no need to create extra tables. The following query would work just as well:

SELECT c.chr, n.num
FROM chars c, nums n
 WHERE NOT EXISTS (SELECT 1
                     FROM mix m
                    WHERE m.chr = c.chr AND m.num = n.num)
This along with LIMIT 100 is exactly what I was looking for.Thank you all!
Dave Albert