views:

393

answers:

3

There are 2 input lists L and M, for example:

L =  ['a',     'ab',  'bba']
M = ['baa', 'aa',  'bb']

How to obtain 2 non-empty output lists U and V such that: ''.join(U) == ''.join(V)) is True, and every element of U is in L, and every element of V is in M?

For example, one possible solution for the two input lists above is:

U=['bba', 'ab', 'bba', 'a']
V=['bb', 'aa', 'bb', 'baa']

because

 'bbaabbbaa' == 'bbaabbbaa' is True

and every element of ['bba', 'ab', 'bba', 'a'] is in ['a', 'ab', 'bba']

and every element of ['bb', 'aa', 'bb', 'baa'] is in ['baa', 'aa', 'bb']

1) Create an algorithm which will find at least one solution (U and V).

2) Can it be solved in O(n) where n=len(L+M) ?

:wq

+9  A: 

What are you looking for -- all (the countable infinity of) possible solutions? The "shortest" (by some measure) non-empty solution, or the set of equal-shortest ones, or...?

Because, if any solution will do, setting U and V both to [] meets all the stated conditions, and is O(1) to boot;-).

Edit: ok, so, joking apart, here's a nicely symmetrical solution to print the first ten of the countably-infinite non-empty solutions:

import itertools as it
import collections

L = ['a', 'ab', 'bba']
M = ['baa', 'aa', 'bb']

def cmbs(L=L, M=M):
  Ucans = collections.defaultdict(list)
  Vcans = collections.defaultdict(list)
  sides = (L, Vcans, Ucans), (M, Ucans, Vcans)
  for i in it.count(1):
    for k, (G, Ocans, Tcans) in enumerate(sides):
      for u in it.product(G, repeat=i):
        j = ''.join(u)
        if j in Ocans:
          for samp in Ocans[j]:
            result = samp, u
            yield result[1-k], result[k]
        Tcans[j].append(u)

if __name__ == '__main__':
  for x, y in it.islice(cmbs(), 10):
    print x, y, ''.join(x), ''.join(y)

which emits

('a', 'a') ('aa',) aa aa
('bba', 'a') ('bb', 'aa') bbaa bbaa
('a', 'a', 'a', 'a') ('aa', 'aa') aaaa aaaa
('a', 'a', 'bba', 'a') ('aa', 'bb', 'aa') aabbaa aabbaa
('a', 'ab', 'a', 'a') ('aa', 'baa') aabaa aabaa
('a', 'ab', 'bba', 'a') ('aa', 'bb', 'baa') aabbbaa aabbbaa
('bba', 'a', 'a', 'a') ('bb', 'aa', 'aa') bbaaaa bbaaaa
('bba', 'ab', 'a', 'a') ('bb', 'aa', 'baa') bbaabaa bbaabaa
('bba', 'ab', 'bba', 'a') ('bb', 'aa', 'bb', 'baa') bbaabbbaa bbaabbbaa
('bba', 'a', 'bba', 'a') ('bb', 'aa', 'bb', 'aa') bbaabbaa bbaabbaa

I'm not sure what's meant by O(N) in the context of a problem with countably infinite solutions -- what's N supposed to be here?!-)

Edit 2: changed to use (default)dicts of lists to ensure it finds all solutions even when the same joined-string can be made in > 1 ways from one of the input collections (a condition which did not occur in the sample input, so the sample output is unaffected); for example, if L is ['a', 'aa'] clearly any joined string with > 1 a can be made in multiple ways -- the current solution will emit all of those multiple ways when such a joined string matches one made for M, while the previous one just emitted one of them.

Alex Martelli
I have forgot to mention that U and V must be non-empty
psihodelia
@Alex Martelli, Great!, just one little question, what's the mean of the line:"if u not in Tcans: Tcans[j] = u"? I think the if statement will always return True, since u is a tuple such as ('a',) and j is a string such as 'a'.
sunqiang
I would think N is the number of elements L + the number of elements in M. Since you only need to find one solution (not all of them) per the assignment, the number of possible solutions does not have to affect the runtime.
sepp2k
N in O(N) is supposed to be N=len(L+M)
psihodelia
it is appropriate to ask about O(N) because the question is to find at least one solution
psihodelia
@sunqiang, you're right, it's supposed to be `if j not in Tcans:` -- and it's kind of redundant, in that one way or another it only records one way to make up the joinedstring j (better to record a list of ways to make up each j -- I'll edit the answer accordingly). Thanks!
Alex Martelli
@psihodelia, given that definition of N (i.e., one that's independent of the lengths of the items of L and M), an O(N) solution is clearly not possible, because the lengths of those items obviously must also matter (as the items' lengths grow from a few bytes, to gigabytes, to petabytes, etc, etc, clearly the solution must keep getting slower and slower even for a constant N); so the answer to your question 2 must be **no**. (or are there more specs yet that you didn't express? just getting all the specs is starting to sound like it takes more time than a normal job interview!-).
Alex Martelli
@Alex_Martelli: your solution is very fast in comparison with a solution presented by Juanjo Conti below. In terms of time-complexity your algorithm in order to produce some-number of strings requires polinomial time; another solution (by Juano) requires exponential time (and more space as well) to produce the same number of output strings. So, your algorithm is more effective. You are the winner :)
psihodelia
+4  A: 

This is my try! I think it finds all the solutions.

from itertools import product 
m = 5   # top limit of elements in output lists
sumsets = lambda s1, s2: s1 | s2

for u in reduce(sumsets, [set(product(L, repeat=i)) for i in  range(1, m+1)]):
    for v in reduce(sumsets, [set(product(M, repeat=i)) for i in  range(1, m+1)]):
            if ''.join(u) == ''.join(v):
                print u, v

Output: U, V

('a', 'a', 'a', 'a') ('aa', 'aa')
('a', 'a') ('aa',)
('a', 'a', 'bba', 'a') ('aa', 'bb', 'aa')
('bba', 'a', 'a', 'a') ('bb', 'aa', 'aa')
('bba', 'a') ('bb', 'aa')
('bba', 'ab', 'a', 'a') ('bb', 'aa', 'baa')
('a', 'ab', 'a', 'a') ('aa', 'baa')
('a', 'ab', 'bba', 'a') ('aa', 'bb', 'baa')
('bba', 'ab', 'bba', 'a') ('bb', 'aa', 'bb', 'baa')
('bba', 'a', 'bba', 'a') ('bb', 'aa', 'bb', 'aa')
Juanjo Conti
If it stops (without specific provisions for the purpose such as my use of `islice`), you know it can't have found all the (countably infinite) solutions;-).
Alex Martelli
Your solution is extremely slow!
psihodelia
+1  A: 

This is known as the Post Correspondence Problem and it's undecidable as others have said.

job
I believe the poster *intended* to state the Post Correspondence Problem, but @psihodelia didn't restrict the indices to match for the LHS and RHS, so it's actually a different problem.
Jeffrey Harris