views:

336

answers:

1

Hey,

I got an array (let's call it a1) of words (like "dog", "fish", "run", "programming" anything) that's really huge.

I can combine any of the words out of a1 with any other of the words (e.g. you could combine "dog" and "programming" into "dog-programming"), and then again and again, until the strings gets really big.

I also got an array (let's call it a2) of strings (such as "de", "s", "x?", "umh", again they could be pretty much anything). It is guaranteed that there are no strings in a2 that cannot be found in any string of a1.

What I'm looking for is the shortest string (shortest in terms of how many combinations it takes to create that string, not how many characters that string contains) that contains all of the strings inside a2. If there are more than one string that all feature that minimum length, I can just pick any, and bail out of the program.

Now, I don't think I can just brute-force this because even if the arrays are relative small, there are virtually endless options of combining the words, but I'd love to be proven wrong!

Is there any good way to get the shortest string that will surely yield the shortest result or do I have to use heuristical algorithms to just search for pretty short strings?

EDIT: I tried just picking a string from a1 that covered most strings from a2, then removing those items from a2, and starting again, but it doesn't work! It yields pretty good results, but not the best.

+3  A: 

Hi,

if you combine the words with a dash like in your example, e.g.

dog + programming + sky = dog-programming-sky

AND the words in A2 do not contain dashes, then it's a just SET-COVER in disguise, an NP-complete optimization problem. You can then solve the problem using any solution strategy available for SET-COVER. There is a fast approximation algorithm for SET-COVER, but if you want to have the absolute smallest solution, you'd need to resort to a worst-case exponential algorithm.

If you combine the words without a dash, e.g.

dog + programming + sky = dogprogrammingsky

then the problem is more difficult, because now e.g. "ogpro" is found in the combined word even if it isn't a substring of any of the constituent strings.

antti.huima
Thanks! In my case, it's like your first example. What approximation algorithms would you suggest for my problem?
Fred
+1 good answer blah blah now over 15 char requirement :)
Larry Watanabe
Hi, there is a canonical approximation algorithm for set cover that is also very simple to implement. It's covered on the "set cover" Wikipedia page! It's very simple but is actually the best! From Wikipedia: "Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for set cover (see Inapproximability results below), under plausible complexity assumptions." Check it out at http://en.wikipedia.org/wiki/Set_cover_problem
antti.huima