I have N sets Si of numbers, each of a different size. Let m1, m2, ..., mn be the sizes of the respective sets (mi = |Si|), and let M be the size of the largest set. I have to find the common subsets that contain at least two numbers. Example:

Set  Items
1    10, 80, 22
2    72, 10, 80, 26, 50
3    80
4    10, 22
5    22, 72, 10, 80, 26, 50

So the result will look like this:

Items                Found in sets
10, 22               1, 4, 5
10, 80               1, 2, 5
10, 80, 22           1, 5
10, 72, 80, 26, 50   2, 5

So how can I automate this problem, and what is the expected complexity of the respective solution? I need it to be as fast as possible.

A: 

I have a tip: sort the items in each set first. Then, when you loop over a set looking for a number, add a check for whether the current item is greater than your search value and use break; to end the loop early, since in a sorted set the number cannot appear later.
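
A minimal Python sketch of that early-exit lookup, assuming the sets are stored as lists (the helper name contains_sorted is mine, not from the answer):

    def contains_sorted(sorted_items, value):
        # Linear scan over a sorted list that stops as soon as items exceed the value.
        for item in sorted_items:
            if item == value:
                return True
            if item > value:
                break  # everything after this point is larger, so the value cannot occur
        return False

    # Example with the sets from the question, sorted once up front.
    sets = [sorted(s) for s in ([10, 80, 22], [72, 10, 80, 26, 50], [80], [10, 22], [22, 72, 10, 80, 26, 50])]
    print(contains_sorted(sets[0], 80))   # True
    print(contains_sorted(sets[0], 26))   # False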

cloverink
+2  A: 

You could do this -

  1. Create a hash table and insert each item as a key, with the value being the list of sets it belongs to (0-based indices here). Eg: 10:[0,1,3,4] 80:[0,1,2,4] 22:[0,3,4] 72:[1,4] 26:[1,4] 50:[1,4]

  2. After this, for each pair of keys in the hash table, find the intersection of their values. Here the intersection of (10, 80) gives [0,1,4]. Do this for (10,22), (10,72), ... and you have your result.

  3. Complexity: inserting the M items into the hash table is O(M); searching for a key in the hash table is O(1); the intersection operations over the keys' values are O(m^2) [this could be optimized].

On the whole I would say this is an O(m^2) algorithm. But if the number of items in each set is not large, you won't notice much of a performance hit.
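
A rough Python sketch of steps 1 and 2, assuming the input sets are plain lists; the name item_to_sets and the "found in at least two sets" filter are my additions, not part of the answer:

    from collections import defaultdict
    from itertools import combinations

    sets = [[10, 80, 22], [72, 10, 80, 26, 50], [80], [10, 22], [22, 72, 10, 80, 26, 50]]

    # Step 1: hash table mapping each item to the 0-based indices of the sets containing it.
    item_to_sets = defaultdict(set)
    for idx, s in enumerate(sets):
        for item in s:
            item_to_sets[item].add(idx)

    # Step 2: intersect the value lists of every pair of keys.
    for a, b in combinations(item_to_sets, 2):
        common = item_to_sets[a] & item_to_sets[b]
        if len(common) >= 2:
            print((a, b), sorted(common))

Note that this enumerates only pairs of items; the larger common subsets in the question (such as 10, 80, 22) would still have to be assembled from these pairwise results.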

MovieYoda
+1  A: 

One solution I can think of:

Use a hash table. For each "row" (set), generate all combinations of pairs of its numbers, which takes O(M * M) time per row.

Then use those pairs as the hash keys, mapping each to the index of the row it came from.

For each of the N rows,
  do the step above... and if the hash already maps a pair to another row index, then it is a match;
    return that pair, or else just put the pair into the hash

This is O(N * M * M).

Update: if, as you say in the comment, M can be at most 15, then O(N * M * M) is really just O(N).


If your initial question is to find all pairs, then

For each of the N rows,
  do the step first mentioned... and if the hash already maps the pair to some row,
    then it is a match, and print the pair if it has not been printed yet;
  in any event, put the mapping into the hash.

To tell whether a pair has been printed, create another hash, such as
  printed_or_not[i, j] = true or false,
    and make sure to check both printed_or_not[i, j] and printed_or_not[j, i],
    because there is no need to print the pair again in a different order.
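
A short Python sketch of this all-pairs variant, under the assumption that each pair key is stored in sorted order (which makes the printed_or_not bookkeeping unnecessary); pair_to_rows is my name for the hash:

    from collections import defaultdict
    from itertools import combinations

    sets = [[10, 80, 22], [72, 10, 80, 26, 50], [80], [10, 22], [22, 72, 10, 80, 26, 50]]

    # Hash keyed by an unordered pair of numbers, mapping to the rows that contain both.
    pair_to_rows = defaultdict(list)
    for row, s in enumerate(sets, start=1):
        for pair in combinations(sorted(set(s)), 2):  # sorting normalizes the key order
            pair_to_rows[pair].append(row)

    # Each pair key exists exactly once, so every match is reported exactly once.
    for pair, rows in pair_to_rows.items():
        if len(rows) >= 2:
            print(pair, rows)
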
動靜能量
+1  A: 

You're being asked to do the pairwise intersection of all your sets and then collect all results whose size is >= 2.

There are O(N^2) pairs. Doing each intersection is O(M). Assembling all the results and sorting them by set contents to weed out the duplicates is N^2 log N^2 (the worst case is that the intersection is different for every pair, so there can be O(N^2) result sets).

So I think the complexity is O((N^2 + N log N) * M)

But there's quite possibly an error somewhere.
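
A Python sketch of this pairwise-intersection approach; instead of sorting to remove duplicates it groups results in a dictionary keyed by the intersection contents, which is just one way to do the de-duplication:

    from collections import defaultdict
    from itertools import combinations

    sets = [{10, 80, 22}, {72, 10, 80, 26, 50}, {80}, {10, 22}, {22, 72, 10, 80, 26, 50}]

    # Intersect every pair of sets and remember which sets produced each item group.
    results = defaultdict(set)
    for (i, a), (j, b) in combinations(enumerate(sets, start=1), 2):
        common = a & b
        if len(common) >= 2:
            results[frozenset(common)].update((i, j))

    for items, found_in in results.items():
        print(sorted(items), sorted(found_in))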

Paul

+2  A: 

Since the original question asks to make the discussion as fast as possible, here's how it can be made short.

First, discuss whether the output is exponential in your input. Assume you have 2 elements and N sets, and that both elements belong to every set; that takes N lines of input. Then, how many lines do you have to print?

I think that you're going to print 2^N - N - 1 lines, like these:

1,2     1,2
1,2     1,3
.....
1,2     1,N
1,2     2,3
.....
1,2     1,2,3
.....
1,2     1,2,3,...N

Since you're going to spend O(2^N) time just printing, you could pick any of the suggestions on this page to calculate the information; you've been caught by an exponent anyway.

This argument would make your discussion very fast.
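
A tiny Python check of that count, enumerating every subset of the N sets that has at least two members (N = 4 is an arbitrary choice for illustration):

    from itertools import combinations

    N = 4
    subsets = [c for size in range(2, N + 1) for c in combinations(range(1, N + 1), size)]
    print(len(subsets), 2**N - N - 1)  # both are 11; each such subset is one output line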

Pavel Shved
nice, not to mention that all that I/O will likely have a higher constant factor than the calculation steps anyway
jk
That's quite humorous, but I'm pretty certain Ali wanted the _algorithm_ to be fast, not the _discussion._ :-)
paxdiablo
@pax, well, I'm not sure how Ali wanted to make an algorithm with an exponential lower bound fast. So I assumed that it's about the discussion.
Pavel Shved
Thx Pavel Shved. I know this solution, but the system will go into a coma (it will sleep for years). Anyhow, thx for your algo.
Ali