views:

797

answers:

4

What's the best way to describe the algorithmic complexity of collusion detection for a ten-million-player online poker site?

Assume (I don't think these assumptions make much difference so feel free to ignore them, but just to clarify):

  • That the site has 10,000,000 registered users.
  • That these players have played a total of 5 billion hands.
  • That the only information you're given is the "master hand history database" for the site, containing all player hole cards and betting actions for each hand.
  • In other words, you may NOT take shortcuts such as examining IP addresses, looking for unusual rake/profit patterns, and so forth.
  • Assume you are given a function which, when passed a group of exactly N (where N is between 2 and 10) players, returns TRUE if ALL of the players in the group have colluded TOGETHER. If some but not all of the players are colluders, the function returns FALSE. A return value of TRUE is made with (for example) 75% confidence.

Your job is to produce an exhaustive list of every player who's colluded, along with a complete list of the players he's colluded with. I have recently heard this problem described as NP-hard but is this accurate? Sometimes we call things "NP" or "NP-hard" that are merely "hard".

Thanks!

+5  A: 

The brute-force approach I see immediately is:

Set colluders = new Set();
for(Player p1 : allPlayers)
{
  for(Player p2 : allPlayers)
  {
    if(!p1.equals(p2) && haveColluded(p1, p2))
    {
      colluders.add(p1);
      colluders.add(p2);
    }
  }
}

I don't see a point to calling haveColluded with larger argument counts than 2 because that could give false negatives. I suppose though it depends how costly the function is. But the above results in O(n^2) calls to haveColluded (n being number of players). The function itself would seemingly be O(m), where m is the number of games they played together. Thus, the algorithm seems well under O(n^3). To be NP-hard, you have to prove "A problem H is NP-hard if and only if there is an NP-complete problem L that is polynomial time Turing-reducible to H [...] In other words, L can be solved in polynomial time by an oracle machine with an oracle for H." (http://en.wikipedia.org/wiki/NP-hard). I have studied NP-complete problems (e.g. 3-SAT, Travelling salesman problem, etc.) and I don't see how you'd prove that. But then again, it does seem suspiciously similar to the clique problem.

Matthew Flaschen
Thanks for the informative answer. I also don't see how you'd "prove" that it's NP-hard, but it bears a suspicious resemblance to problems which are NP-hard. Of course, having the "haveColluded" function simplifies things. IRL the problem is (if you ask me) intractable except in cases of obvious collusion (ie, where 6 players log in from the same IP or something like that).
James D
This depends on the properties of the `haveColluded()` function. Perhaps 10 players colluding together can only be detected by calling the function on all 10 of them. If this is the case, the problem is much harder.
Rafał Dowgird
+4  A: 

Looks like clique detection, which is NP-hard. On the other hand, the clique size is limited here (10), so brute-force is n^10 at worst.

Edit:The key question here is what the properties of the collusion function are. Can 10 players colluding together always be detected by calling the function on two smaller sets (say 5) players?

Rafał Dowgird
I don't believe this is the 'clique detection problem'. He isn't even being asked to detect cliques of a given size. He is being asked whether or not a graph of up to 10 nodes is fully connected. This is a fairly trivial problem.
ceretullis
It's definitely the clique problem, as I see it, and instead of "knowing x" his decision is "colluded with x".
Noon Silk
@ceretullis: No. He is being asked for a complete list of nodes (in a huge graph) that are members of a subgraph that has a property determined by the `haveColluded()` function. This is completely different and much harder than checking a single graph of size 10 for cliqueness.
Rafał Dowgird
ceretullis
ceretullis
+2  A: 

Under your model, what you describe should be fairly easy. You are given an implicit graph (vertices are players, edges correspond to having played a game together). You want a subgraph of that graph.

If the collusion function were perfectly reliable you just call it on every pair of vertices in the graph, and you get the subgraph.

That subgraph is probably fairly disconnected. I would expect the resulting graph to be disconnected or very weakly connected; large well connected subgraphs will fall out quickly by doing some min-cuts.

Note that we can restrict ourselves to looking at only pairs, because the collusion function should obey (in terms of confidence level) Collude(A,B,C)<Collude(A,B).

Constructing this global collusion function is the part that seems hard.

Captain Segfault
+1  A: 

I would split this into two steps:

  1. Iterate over 5 billion hands of poker examining the play in each hand. Employ some algorithm, let's call it algorithm A, on each hand. As you go you build a collusion graph where vertices represent players and undirected weighted edges represent some confidence of collusion between two players. When algorithm A triggers on suspicion of player X colluding with player Y, some value is added to the weighted edge XY in the collusion graph. As you progress through the hands that have been played the edge weights accumulate over time. When some threshold has been reached, then the edge represents collusion between X and Y.

  2. Then the function determining whether a list of N player vertices have all colluded together is a matter of verifying the subgraph containing the N vertices is fully-connected (meaning every node has an edge weight greater than the collusion threshold to every other node in the subgraph). IIRC, determining this is O(n*lg(n) ).

ceretullis