views:

107

answers:

2

Given two sorted vectors a and b, find all vectors which are sums of a and some permutation of b, and which are unique once sorted.

You can create one of the sought vectors in the following way:

  • Take vector a and a permutation of vector b.
  • Sum them together so c[i]=a[i]+b[i].
  • Sort c.

I'm interested in finding the set of b-permutations that yield the entire set of unique c vectors.

Example 0: a='ccdd' and b='xxyy'
Gives the summed vectors: 'cycydxdx', 'cxcxdydy', 'cxcydxdy'.
Notice that the permutations of b: 'xyxy' and 'yxyx' are equal, because in both cases the "box c" and the "box d" both get exactly one 'x' and one 'y'.

I guess this is similar to putting M balls in M boxes (one in each) with some groups of balls and boxes being identical.
Update: Given a string a='aabbbcdddd' and b='xxyyzzttqq' your problem will be 10 balls in 4 boxes. There are 4 distinct boxes of size 2, 3, 1 and 4. The balls are pair wise indistinguishable.

Example 1: Given strings are a='xyy' and b='kkd'.
Possible solution: 'kkd', 'dkk'.
Reason: We see that all unique permutations of b are 'kkd', 'kdk' and 'dkk'. However with our restraints, the two first permutations are considered equal as the indices on which the differ maps to the same char 'y' in string a.

Example 2: Given strings are a='xyy' and b='khd'.
Possible solution: 'khd', 'dkh', 'hkd'.

Example 3: Given strings are a='xxxx' and b='khhd'.
Possible solution: 'khhd'.

I can solve the problem of generating unique candidate b permutations using Narayana Pandita's algorithm as decribed on Wikipedia/Permutation.
The second part seams harder. My best shot is to join the two strings pairwise to a list, sort it and use it as a key in a lookup set. ('xx'+'hd' join→'xh','xd' sort→'xd','xh').

As my M is often very big, and as similarities in the strings are common, I currently generate way more b permutations than actually goes through the set filter. I would love to have an algorithm generating the correct ones directly. Any improvement is welcome.

+1  A: 

To generate k-combinations of possibly repeated elements (multiset), the following could be useful: A Gray Code for Combinations of a Multiset (1995).

For a recursive solution you try the following:

Count the number of times each character appears. Say they are x1 x2 ... xm, corresponding to m distinct characters.

Then you need to find all possible ordered pairs (y1 y2 ... ym) such that

0 <= yi <= xi

and Sum yi = k.

Here yi is the number of times character i appears.

The idea is, fix the number of times char 1 appears (y1). Then recursively generate all combinations of k-y1 from the remaining.

psuedocode:

List Generate (int [] x /* array index starting at 1*/, 
               int k /* size of set */) {

    list = List.Empty;

    if (Sum(x) < k) return list;

    for (int i = 0; i <= x[1], i++) {

        // Remove first element and generate subsets of size k-i.

        remaining = x.Remove(1);

        list_i = Generate(remaining, k-i);

        if (list_i.NotEmpty()) {

            list = list + list_i;    

        } else {

            return list;
        }

    }

    return list;
}

PRIOR TO EDITS:

If I understood it correctly, you need to look at string a, see the symbols that appear exactly once. Say there are k such symbols. Then you need to generate all possible permutations of b, which contain k elements and map to those symbols at the corresponding positions. The rest you can ignore/fill in as you see fit.

I remember posting C# code for that here: http://stackoverflow.com/questions/2350211/how-to-find-permutation-of-k-in-a-given-length/2350399#2350399

I am assuming xxyy will give only 1 unique string and the ones that appear exactly once are the 'distinguishing' points.

Eg in case of a=xyy, b=add

distinguishing point is x

So you select permuations of 'add' of length 1. Those gives you a and d.

Thus add and dad (or dda) are the ones you need.

For a=xyyz b=good

distinguishing points are x and z

So you generate permutations of b of length 2 giving

go
og
oo
od
do
gd
dg

giving you 7 unique permutations.

Does that help? Is my understanding correct?

Moron
No sorry,Is my new explaination better?
Thomas Ahle
+1 because I understood the same thing ...
belisarius
No, the solution doesn't work. That seams clear for a case where there are no unique symbols in a?
Thomas Ahle
@THomas: For no unique symbols, there is exactly one generated by this method. Anyway, this was before the edits you made. Perhaps you should give longer/more examples (with what output you expect) to make it clearer.
Moron
If you see example 0, you'll notice that `a='xxyy'`, `b='ccdd'` gives three unique solutions.
Thomas Ahle
I think I've found a way to solve the problem, but it requires an algorithm that yields all length-k combinations of a string, ignoring duplicates: `combs('aabc',2)=['aa','ab','bc','ca']`. Do you know one such?
Thomas Ahle
@Thomas: I have added a link to my answer, which can probably(not sure, as I haven't read it completely) be used to give an implementation which uses very little memory (something like next_combination). Recursive solutions should be possible, but I expect they will use a lot of memory.
Moron
A: 

Ok, I'm sorry I never was able to clearly explain the problem, but here is a solution.

We need two functions combinations and runvector(v). combinations(s,k) generates the unique combinations of a multiset of a length k. For s='xxyy' these would be ['xx','xy','yy']. runvector(v) transforms a multiset represented as a sorted vector into a more simple structure, the runvector. runvector('cddeee')=[1,2,3].

To solve the problem, we will use recursive generators. We run through all the combinations that fits in box1 and the recourse on the rest of the boxes, banning the values we already chose. To accomplish the banning, combinations will maintain a bitarray across of calls.

In python the approach looks like this:

def fillrest(banned,out,rv,b,i):
    if i == len(rv):
        yield None
        return
    for comb in combinations(b,rv[i],banned):
        out[i] = comb
        for rest in fillrest(banned,out,rv,b,i+1):
            yield None

def balls(a,b):
    rv = runvector(a)
    banned = [False for _ in b]
    out = [None for _ in rv]
    for _ in fill(out,rv,0,b,banned):
        yield out[:]

>>> print list(balls('abbccc','xyyzzz'))
[['x', 'yy', 'zzz'],
 ['x', 'yz', 'yzz'],
 ['x', 'zz', 'yyz'],
 ['y', 'xy', 'zzz'],
 ['y', 'xz', 'yzz'],
 ['y', 'yz', 'xzz'],
 ['y', 'zz', 'xyz'],
 ['z', 'xy', 'yzz'],
 ['z', 'xz', 'yyz'],
 ['z', 'yy', 'xzz'],
 ['z', 'yz', 'xyz'],
 ['z', 'zz', 'xyy']]

The output are in 'box' format, but can easily be merged back to simple strings: 'xyyzzzz', 'xyzyzz'...

Thomas Ahle
If you're done with the question you should accept your own answer.
Greg Kuperberg