views:

105

answers:

1

hi,

I came across a specific problem and looking for some algorithm for it. The problem to solve is as described below.

Let's say we have combinations like below

1 - 3 - 5

1 - 4 - 5

1 - 8 - 5

2 - 4 - 5

3 - 4 - 5

2 - 4 - 7

These combinations were generated from given sets, in this particular case let's say from

{1},{3,4,8},{5}

{2,3},{4},{5}

{2},{4},{7}

What I want to do is recreate sets from these combinations. I know for these combinations you have more than one solution, e.g.

1st solution

{1}, {3, 4, 8}, {5}

{2, 3}, {4}, {5}

{2}, {4}, {7}

2nd solution

{1}, {3, 8}, {5}

{1, 2, 3}, {4}, {5}

{2}, {4}, {7}

3rd solution

{1}, {3, 4, 8}, {5}

{3}, {4}, {5}

{2}, {4}, {5, 7}

But the final (optimal) solution would be the one with as little sets as possible or the random one in case they are all equivalent in terms of sets count.

Do algorithms for such a problem exist? I appreciate if anybody who has been dealing with this kind of problem can give me some hints.

EDIT: looks like what I'm looking for is a decomposition of n-ary product (Cartesian product for N)

EDIT: after more research on the topic I found out that the problem is known in a 'graph theory' as the 'minimum clique cover' problem

regards, baz

+1  A: 

This is not really an answer, more an extended comment. Your 'compressed representations' do not, in fact save any space at all.

In the uncompressed representation you store:

  • the rule that says that each group is made up of 3 symbols; and
  • 18 (in your example) symbols.

This can be stored in 1R+18S (where R is the space required to store a rule, S the space required to store a symbol)

In your supposed-to-be compressed representations you have to store:

  • the rule that says that each group is made up of 3 sets of symbols;
  • the symbols in each set; and
  • another symbol which delimits each set from the next.

In your first 'compressed' representation I count 1R+12S+8D (where D is the space required for storing one delimiter). If S==D then this is 1R+20S -- more than your uncompressed representation.

In your other two 'compressed' representations I count the same: 1R+12S+8D, and 1R+12S+8D.

I haven't figured out whether this non-compression is an essential feature of your proposal, or an accidental feature of the example you have chosen.

Do you mean, when you write that

the count of elements in combination is actually N

that some combinations will have 3 elements, others 4, others 2 or 5 etc ?

I suggest that you clarify your question.

EDIT: @bazeusz: now it seems that you are looking for the cartesian product of the sets.

High Performance Mark
hey Mark, although saving a space is not a purpose here I don't agree with you.The compressed representation will reduce space needed to store it.Think about it as number arrays not strings,delimiters doesn't matter.It's more a representation in terms of a structure kept in memory.I'm getting data in an uncompressed form (internally as arrays of numbers) and I need to transform it to a compressed form The process would be reverse generating combinations from multiple sets.By N I'm thinking a solution should be generic and work for any N. But group only arrays with the same N.
bazeusz
@bazeusz: if saving space is not a consideration I suggest you modify the title of your question and remove the word 'compress'. And if saving space is not your objective, what is your question ?
High Performance Mark
@Mark: hey, it's looking for sets from cartesian product of these sets; do you know any existing algorithm?
bazeusz
@bazeusz: yes: foreach element in set 1, foreach element in set 2, foreach element in set3 (etc), write the sequence of elements. That's all there is to it.
High Performance Mark
@Mark: the process you described is you have some sets and you want to create cartesian product from it; and this one is obvious; but I'm interested in a situation when I have the results from such an operation and I want to recreate sets from which the results were created. So this is reversed operation to what you described.
bazeusz