views:

81

answers:

2

In addition to original balls and baskets problem I mentioned here : http://stackoverflow.com/questions/1948311/balls-and-baskets-problem-algorithm

There is a slightly different problem.

Still there are N people and they have unlimited balls but they dont have baskets this time.

Problem is :

There are N people with unlimited balls and M different baskets. People throw balls to baskets.

I want to find the groups of people who are throwing balls to the same baskets.

Person A throws to Baskets 1 , 2, 4, ,6,7, 14, 51, 32 Person B throws to Baskets 3, 4, 6, 7, 14,15, 16, 64,43 Person C throws to Baskets 3, 4, 6,7,5, 87, 42, 32, 52, 55 . . . etc.

In this example person A and B may be well connected (lets say friends) (4,6,7,14 common) and C may be connected to them too but not so well connected. (4, 6,7 common)

I want to find groups of 4-5 people like that in a very large database of people.

A: 

Ok, my initial idea; model this as a weighted graph. Make the people the vertices and create a link when they share a basket. If they share multiple baskets then increase the weight of the link for each basket they share. So in your example the weight of the edge between A and B would be 4.

You can then decide how friendly everyone in a group has to be, prune the edges that have a weight less than that value and then look for cliques in the resultant graph.

Jackson
thanks a lot jackson but here is one problem I see in your solution :after eleminating the edges, fiinding cliques, in the cliques we'll have pairs that have same number of common baskets but these common baskets may be different than other pairs in the clique.For example :1 output clique is A B C DThreshold was 3A B have 1 2 3 4 in commonB C have 5 6 7 8 in commonC D have 9 10 11 12 in commonA D have 13 14 15 16 in commonA C have 17 18 19 20 in commonB D have 21 22 23 24 in commonso that clique may be produced but actually it does not look like a clique. Am I clear about it?
huhuhuuu
I see the distinction, will have to think about this some more.
Jackson
A: 

Jackson's idea is the beginning.

After you found the cliques, define the common basket set for each clique, by the intersection of all the egdes' baskets. (where an edge's baskets are the baskets common to the people represented by this edge's nodes). only if the common basket set is larger than X, then this clique represents a really connected group.

for example, in the original question, the edges would be:

A-B:  weight=4, baskets=[4,6,7,14]
A-C:  weight=3, baskets=[4,6,7]
B-C:  weight=4, baskets=[3,4,6,7]

if you prune at less than 3, you get that this is a clique, with common basket set = [4,6,7].

in the example you gave in the comment to Jackson's answer, the common basket set would be empty, so you know these guys are not really connected.

Ofri Raviv