ansaurus

Question

Counting combinations of pairs of items from multiple lists without repetition

Answer 1

+1 A:

While the problem looks quite simple it could be related to the NP-complete Set Cover Problem. So it could be possible that there is no efficent way to detect valid combinations, hence no efficent way to count them.

UPDATE

I thought about the list items beeing pairs because it seems to make the problem harder to attack - you have to check two properties for one item. So I looked for a way to reduce the pair to a scalar item and found a way.

Map the set of the n symbols to the set of the first n primes - I will call this function M. In the case of the symbols 0 to 9 we obtain the following mapping and M(4) = 11 for example.

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9} => {2, 3, 5, 7, 11, 13, 17, 19, 23, 29}

Now we can map a pair (n, m) using the mapping X to the product of the mappings of n and m. This will turn the pair (2, 5) into X((2, 5)) = M(2) * M(5) = 5 * 13 = 65.

X((n, m)) = M(n) * M(m)

Why all this? If we have two pairs (a, b) and (c, d) from two lists, map them using the mapping X to x and y and multiply them, we obtain the product x * y = M(a) * M(b) * M(c) * M(d) - a product of four primes. We can extend the product by more factors by selecting a pair from each list and obtain a product of 2w primes if we have w lists. The final question is what does this product tell us about the pairs we selected and multiplied? If the selected pairs form a valid selection, we never choose one symbol twice, hence the product contains no prime twice and is square free. If the selection is invalid the product contains at least one prime twice and is not square free. And here a final example.

X((2, 5)) = 5 * 13 = 65
X((3, 6)) = 7 * 17 = 119
X((3, 4)) = 7 * 11 = 77

Selecting 25 and 36 yields 65 * 119 = 7735 = 5 * 7 * 13 * 17 and is square free, hence valid. Selecting 36 and 34 yields 119 * 77 = 9163 = 7² * 11 * 17 and is not square free, hence not valid.

Also note how nicely this preserves the symmetrie - X((m, n)) = X((n, m)) - and prohibites symmetric pairs because X((m, m)) = M(m) * M(m) is not square free.

I don't know if this will be any help, but now you know it and can think about it...^^

This is the first part of an reduction of a 3-SAT problem to this problem. The 3-SET problem is the following.

(!A | B | C) & (B | !C | !D) & (A | !B)

And here is the reduction as far as I got.

m-n represents a pair
a line reprresents a list
an asterisk represents an abitrary unique symbol

A1-A1'    !A1-!A1'     => Select A true or false
B1-B1'    !B1-!B1'     => Select B true or false
C1-C1'    !C1-!C1'     => Select C true or false
D1-D1'    !D1-!D1'     => Select D true or false

A1-*   !B1-*   !C1-*   => !A | B | C

A2-!A1'   !A2-A1'      => Create a copy of A
B2-!B1'   !B2-B1'      => Create a copy of B
C2-!C1'   !C2-C1'      => Create a copy of C
D2-!D1'   !D2-D1'      => Create a copy of D

!B2-*  C2-*    D2-*    => B | !C | !D

(How to perform a second copy of the four variables???)

!A3-*  B3-*

If I (or somebody else) can complete this reduction and show how to do it in the general case, this will proof the problem NP-complete. I am just stuck with copying the variables a second time.

Daniel Brückner 2009-06-11 20:30:03

Wow, thanks for the update. I'll spend some time experimenting with this idea :)

snazzer 2009-06-11 22:31:14

Answer 2

A:

First try.. Here is an algorithm with an improved reduced average complexity than brute force. Essentially you create strings with increasing lengths in each iteration. This may not be the best solution but we will wait for the best one to come by... :)

Start with list 1. All entries in that list are valid solutions of length 2 (#=5) Next, when you introduce list 2. keep a record of all valid solutions of length 4, which end up being {1425, 2314, 2315, 2415} (#=4).

When you add the third list to the mix, repeat the process. You will end up with {142536, 241536} (#=2).

The complexity reduction comes in place because you are throwing away bad strings in each iteration. The worst case scenario happens to be still the same -- in the case that all pairs are distinct.

Ryan Oberoi 2009-06-11 20:31:13

This solution could use an enormous amount of memory - worst case with an alphabet of n symbols: n/2 lists with each list containing all n² possible pairs from 00 to nn. This will create all n! permutations of the n symbols consuming O(n * n!) memory. In the case of [0-9]: 5 lists with 100 symbols each will generate 3,628,800 string of length 10.

Daniel Brückner 2009-06-11 20:42:02

Daniel, We can have a mapping from a permutation to a single number that represents that mapping which would slightly improve the complexity. I agree the worst case scenario is pretty bad/ugly, though in practical usage, this solution may work out for snazzer. Lets wait for a better solution. I am sure its out there :)

Ryan Oberoi 2009-06-11 20:54:40

Another thought. If they are going to do anything *besides* counting, then the algorithm uses the minimum space required for them to process the strings. and saves them from 10000000000 calculations that they would have to do in your example if they followed brute force.

Ryan Oberoi 2009-06-11 21:01:26

Answer 3

A:

This feels like a good problem to which to apply a constraint programming approach. To the list of packages provided by Wikipedia I'll add that I've had good experience using Gecode in the past; the Gecode examples also provide a basic tutorial to constraint programming. Constraint Processing is a good book on the subject if you want to dig deeper into how the algorithms work.

Richard Dunlap 2009-06-11 20:47:35

Answer 4

+2 A:

Lets says that every element in the lists is a node in a graph. There is an edge between two nodes if they can be selected together (they have no common symbol). There is no edge between two nodes of the same list. If we have n lists the problem is to find the number of cliques of size n in this graph. There is no clique which is bigger than n elements. Given that finding out whether at least one such clique exists is np-complete I think this problem is np-complete. See: http://en.wikipedia.org/wiki/Clique_problem

As pointed out we have to prove that solving this problem can solve the Clique problem to show that this is NP-complete. If we can count the number of required sets ie the number of n size cliques then we know whether there is at least one clique with size n. Unfortunatelly if there is no clique of size n then we don't know whether there are cliques with size k < n.

Another question is whether we can represent any graph in this problem. I guess yes but I am not sure about it.

I still feel this is NP-Complete

Gábor Hargitai 2009-06-11 20:53:17

The two lists L1 = {12,56} and L2 = {34,78} contain at least one clique of size 4, namely 12 -> 34 -> 56 -> 78 going from L1 to L2, back to L1 and finally to L2 again. If there would be four lists this would be a 4-clique that is no valid solution, hence counting the 4-cliques does not solve the problem. (While I think the problem is hard, too) even if counting n-cliques would solve the problem, this would not prove that the problem is NP-complete. You have to prove the other direction - solving this problem can solve a NP-complete problem.

Daniel Brückner 2009-06-11 21:11:46

As there is no edge between 12 and 56 or 34 and 78 those four don't form a clique. What you described is only a path. And you are right with the direction of proving.

Gábor Hargitai 2009-06-11 21:22:37

Ohhh yes, you are right. It's indeed counting the number of n-cliques. The remaining question is if you can reduce n-clique to this problem in polynomial time - this would proof it to be NP-complete.

Daniel Brückner 2009-06-11 21:36:03

Answer 5

+3 A:

The problem is #P-complete. This is even HARDER than NP-complete. It is as hard as finding the number of satisfying assignments to an instance of SAT.

The reduction is from Perfect matching. Suppose you have the graph G = {V, E} where E, the set of edges, is a list of pairs of vertices (those pairs that are connected by an edge). Then encode an instance of "pairs of items" by having |V|/2 copies of E. In other words, have a number of copies of E equal to half of the number of vertices. Now, a "hit" in your case would correspond to |V|/2 edges with no repeated vertices, implying that all |V| vertices were covered. This is the definition of a perfect matching. And every perfect matching would be a hit -- it's a 1-1 correspondence.

Martin Hock 2009-06-12 15:59:31

That's the answer ... +1 ... and - now that I know there is one - I can try to complete my 3-SAT reduction ...

Daniel Brückner 2009-06-14 14:22:19

Answer 6

+1 A:

I am going to say there is no calculation that you can do other than brute force becuse there is a function that has to be evaluated to decide whether an item from set B can be used given the item chosen in set A. Simple combinatorial math wont work.

You can speed up the calculation by 1 to 2 magnitudes using memoization and hashing.

Memoization is remembering previous results of similar brute force paths. If you are at list n and you have already consumed symbols x,y,z and previously you have encountered this situation, then you will be adding in the same number of possible combinations from the remaining lists. It does not matter how you got to list n using x,y,z. So, use a cached result if there is one, or continue the calc to the next list and check there. If you make a brute force recursive algorithm to calculate the result, but cache results, this works great.

The key to the saved result is: the current list, and the symbols that have been used. Sort the symbols to make your key. I think a dictionary or an array of dictionaries makes sense here.

Use hashing to reduce the number of pairs that need to be searched in each list. For each list, make a hash of the pairs that would be available given that a certain number of symbols are already consumed. Choose the number of consumed symbols you want to use in your hash based on how much memory you want to use and the time you want to spend pre-calculating. I think using 1-2 symbols would be good. Sort these hashes by the number of items in them...ascending, and then keep the top n. I say throw out the rest, becasue if the hash only reduces your work a small amount, its probably not worth keeping (it will take longer to find the hash if there are more of them). So as you are going through the lists, you can do a quick scan the list's hash to see if you have used a symbol in the hash. If you have, then use the first hash that comes up to scan the list. The first hash would contain the fewest pairs to scan. If you are really handy, you might be able to build these hashes as you go and not waste time up front to do it.

You might be able to toss the hash and use a tree, but my guess is that filling the tree will take a long time.

johnnycrash 2009-06-12 20:12:47

Answer 7

A:

Constraint programming is a nice approach if you want to generate all the combinations. Just to try it out, I wrote a model using Gecode (version 3.2.2) to solve your problem. The two examples given are very easy to solve, but other instances might be harder. It should be better than generate and test in any case.

/*
 *  Main authors:
 *     Mikael Zayenz Lagerkvist <[email protected]>
 *
 *  Copyright:
 *     Mikael Zayenz Lagerkvist, 2009
 *
 *  Permission is hereby granted, free of charge, to any person obtaining
 *  a copy of this software and associated documentation files (the
 *  "Software"), to deal in the Software without restriction, including
 *  without limitation the rights to use, copy, modify, merge, publish,
 *  distribute, sublicense, and/or sell copies of the Software, and to
 *  permit persons to whom the Software is furnished to do so, subject to
 *  the following conditions:
 *
 *  The above copyright notice and this permission notice shall be
 *  included in all copies or substantial portions of the Software.
 *
 *  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 *  EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 *  MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 *  NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
 *  LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
 *  OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
 *  WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 */
#include <gecode/driver.hh>
#include <gecode/int.hh>
#include <gecode/minimodel.hh>

using namespace Gecode;

namespace {
  /// List of specifications
  extern const int* specs[];
  /// Number of specifications
  extern const unsigned int n_specs;
}


/**
 * \brief Selecting pairs
 *
 * Given a set of lists of pairs of values, select a pair from each
 * list so that no value is selected more than once.
 *
 */
class SelectPairs : public Script {
protected:
  /// Specification
  const int* spec;
  /// The values from all selected pairs
  IntVarArray s;
public:
  /// The actual problem
  SelectPairs(const SizeOptions& opt)
    : spec(specs[opt.size()]),
      s(*this,spec[0] * 2,Int::Limits::min, Int::Limits::max) {

    int pos = 1; // Position read from spec
    // For all lists
    for (int i = 0; i < spec[0]; ++i) {
      int npairs = spec[pos++];
      // Construct TupleSet for pairs from list i
      TupleSet ts;
      for (int p = 0; p < npairs; ++p) {
        IntArgs tuple(2);
        tuple[0] = spec[pos++];
        tuple[1] = spec[pos++];
        ts.add(tuple);
      }
      ts.finalize();

      // <s[2i],s[2i+1]> must be from list i
      IntVarArgs pair(2);
      pair[0] = s[2*i]; pair[1] = s[2*i + 1];
      extensional(*this, pair, ts);
    }

    // All values must be pairwise distinct
    distinct(*this, s, opt.icl());

    // Select values for the variables
    branch(*this, s, INT_VAR_SIZE_MIN, INT_VAL_MIN);
  }

  /// Constructor for cloning \a s
  SelectPairs(bool share, SelectPairs& sp) 
    : Script(share,sp), spec(sp.spec) {
    s.update(*this, share, sp.s);
  }

  /// Perform copying during cloning
  virtual Space*
  copy(bool share) {
    return new SelectPairs(share,*this);
  }

  /// Print solution
  virtual void
  print(std::ostream& os) const {
    os << "\t";
    for (int i = 0; i < spec[0]; ++i) {
      os << "(" << s[2*i] << "," << s[2*i+1] << ") ";
      if ((i+1) % 10 == 0)
        os << std::endl << "\t";
    }
    if (spec[0] % 10)
      os << std::endl;
  }
};

/** \brief Main-function
 *  \relates SelectPairs
 */
int
main(int argc, char* argv[]) {
  SizeOptions opt("SelectPairs");
  opt.iterations(500);
  opt.size(0);
  opt.parse(argc,argv);
  if (opt.size() >= n_specs) {
    std::cerr << "Error: size must be between 0 and "
              << n_specs-1 << std::endl;
    return 1;
  }
  Script::run<SelectPairs,DFS,SizeOptions>(opt);
  return 0;
}

namespace {
  const int s0[] = {
    // Number of lists
    3,
    // Lists (number of pairs, pair0, pair1, ...)
    5,  1,2,  1,3,  1,4,  2,3,  2,4,
    3,  1,4,  1,5,  2,5,
    5,  1,6,  1,7,  2,5,  2,6,  3,6
  };

  const int s1[] = {
    // Number of lists
    3,
    // Lists (number of pairs, pair0, pair1, ...)
    5,  1,2,  2,3,  3,4,  4,5,  6,7,
    5,  1,2,  2,3,  3,4,  4,5,  6,7,
    5,  1,2,  2,3,  3,4,  4,5,  6,7
  };

  const int *specs[] = {s0, s1};
  const unsigned n_specs = sizeof(specs)/sizeof(int*);
}

Zayenz 2009-12-16 13:23:39

Thank you; I'll check it out after re-reading some of the Gecode documentation (it's been awhile :D).

snazzer 2009-12-17 21:42:46

One thing that you forgot to mention in the question is the typical size of instances. For any solution where all selections are enumerated the number of solutions is a trivial lower bound on the run-time. As for the above solution, it is of course possible to do a lot more clever tricks in a custom program that just counts the number of solutions without generating them.

Zayenz 2009-12-18 13:19:17

ansaurus

tags:

views:

answers:

Counting combinations of pairs of items from multiple lists without repetition

related questions