views:

126

answers:

1

I have an array a[i][j]. The elements are char, interpreted as subsets of the set {1,...,8} (the element k is in the subset if the k-th bit is 1). I do not think it is relevant, but every element has exactly 4 bits set.

Every row a[1][j]..a[n][j] is a collection of subsets of {1,...,8}. I need to remove duplicate rows, where two rows are considered duplicates if one can be obtained from the other by a permutation of {1,...,8}.

Example (0bxxxxxxxx means binary number):

0b11000000, 0b01100000, 0b00110000

is a duplicate of

0b00110000, 0b00011000, 0b00100100

because the former can be obtained from the latter by applying the permutation

8->8, 7->7, 6->1, 5->4, 4->3, 3->2, 2->5, 1->6

and reordering the result.

For performance considerations, the array contains about 2000 rows, each comprising at most 20 elements. Each row is already ordered, and also the rows are in lexicographic increasing order, if this might help. The rest of the algorithm is written in C, so a C solution would be preferred.

Thanks for your help.

A: 

If all the subsets had 2 elements, this would represent graph isomorphism, with the subsets representing graph edges. This is even more general (thus probably harder), so I'd look at heuristics used to solve graph isomorphism and see whether they apply here.

There are a lot of graph-isomorphism heuristics that can cheaply exclude isomorphism. For a particular collection, you can compute how many subsets does each element belong to, then sort that. In your example, both collections would get [2,2,1,1,0,0,0,0]. If the sorted sequences for two collections are different, then there is no isomorphism. Of course equality doesn't guarantee that there is.

There are many more similar heuristics that are even better at sieving out non-isomorphic graphs (and could possibly apply here).

Also, 8! is only 40320, so brute-forcing all permutations is not totally unfeasible.

Rafał Dowgird
In graph teory, this is the isomorphism problem for 4-uniform hypergraphs.I managed to do some ad-hoc simplifications, that made the problem suitable for bruteforcing, but I am still wondering about good general algorithms.
Zadig