views:

127

answers:

6

I have an interesting programming puzzle for you:

You will be given 2 things:

1. A word containing a list of english words put together : 

    word = "iamtiredareyou"

2. Possible subsets:

    subsets = ['i', 'a', 'am', 'amt', 'm', 't', 'ti', 'tire', 'tired', 'i', 'ire', 'r', 're', 'red', 'redare', 'e', 'd', 'da', 'dar', 'dare', 'a', 'ar', 'are', 'r', 're', 'e', 'ey', 'y', 'yo', 'you', 'o', 'u']

Level 1 Challenge:

I need to programatically find the members in subsets which together in an order will make "iamtiredareyou" i.e. ['i', 'am', 'tired', 'are', 'you']

Level 2 Challenge

The original string may consist of some extra characters in sequence which are not present in the subset.

e.g. "iamtired12aareyou"

subset given is same as above, the solution should automatically include this subset in the right place in the result array.

i.e. ['i', 'am', 'tired', '12a', 'are', 'you']

How can I do this? Please provide code snippets in programming language of your choice.

+3  A: 

Generally, a recursive algorithm would do. Start with checking all subsets against start of a given word, if found — add (append) to found values and recurse with remaining part of the word and current found values. Or if it's an end of the string — print found values.

something like that:

all=[]
def frec(word, values=[]):
    gobal all
    if word == "":  # got result.
        all+=[values]
    for s in subsets:
        if word.startswith(s):
            frec(word[len(s):], values+[s])

frec(word)

note that there are lots of possible solutions since subsets include many one-character strings. You might want to find some shortest of results. (13146 solutions... use “all.sort(cmp=lambda x, y: cmp(len(x), len(y)))” to get shortest)

For a level2 — you need another loop if no subset matches that adds more and more symbols to next value (and recurses into that) until match is found.

all=[]
def frec(word, values=[]):
    global all
    if word == "":  # got result.
        all+=[values]
        return true
    match = False
    for s in subsets:
        if word.startswith(s):
            match = True
            frec(word[len(s):], values+[s])       
    if not match:                        
        return frec(word[1:], values+[word[0]])
frec(word)

This does not try to combine non-subset values into one string, though.

please provide code snippets. It will help others in analyzing your solution better and discuss things like optimization, execution time etc.
demos
Yes, I just wasn't sure if I am actually going to write some code or stop at giving general algorithm description :)
+2  A: 

i think you should do your own programming excercises....

WG3
+1  A: 

For the Level 1 challenge you could do it recursively. Probably not the most efficient solution, but the easiest:

word = "iamtiredareyou"
subsets = ['i', 'a', 'am', 'amt', 'm', 't', 'ti', 'tire', 'tired', 'i', 'ire', 'r', 're', 'red', 'redare', 'e', 'd', 'da', 'dar', 'dare', 'a', 'ar', 'are', 'r', 're', 'e', 'ey', 'y', 'yo', 'you', 'o', 'u']

def findsubset():
    global word

    for subset in subsets:
        if word.startswith(subset):
            setlist.append(subset)
            word = word[len(subset):]

            if word == "":
                print setlist
            else:
                findsubset()

            word = subset + word
            setlist.pop()

# Remove duplicate entries by making a set
subsets = set(subsets)
setlist = []
findsubset()

Your list of subsets has duplicates in it - e.g. 'a' appears twice - so my code makes it a set to remove the duplicates before searching for results.

Dave Webb
A: 

Isn't it just the same as finding the permutations, but with some conditions? Like you start the permutation algorithm (a recursive one) you check if the string you already have matches the first X characters of your to find word, if yes you continue the recursion until you find the whole word, otherwise you go back.

Level 2 is a bit silly if you ask me, because then you could actually write anything as the "word to be found", but basically it would be just like level1 with the exception that if you can't find a substring in your list you simply add it (letter by letter i.e. you have "love" and a list of ['l','e'] you match 'l' but you lack 'o' so you add it and check if any of your words in the list start with a 'v' and match your word to be found, they don't so you add 'v' to 'o' etc.).

And if you're bored you can implement a genetical algorithm, it's really fun but not really efficient.

Zenzen
A: 

Here is a recursive, inefficient Java solution:

private static void findSolutions(Set<String> fragments, String target, HashSet<String> solution, Collection<Set<String>> solutions) {
    if (target.isEmpty()) {
        solutions.add(solution);
        return;
    }

    for (String frag : fragments) {
        if (target.startsWith(frag)) {
            HashSet<String> solution2 = new HashSet<String>(solution);
            solution2.add(frag);
            findSolutions(fragments, target.substring(frag.length()), solution2, solutions);
        }
    }       
}

public static Collection<Set<String>> findSolutions(Set<String> fragments, String target) {
    HashSet<String> solution = new HashSet<String>();
    Collection<Set<String>> solutions = new ArrayList<Set<String>>();
    findSolutions(fragments, target, solution, solutions);
    return solutions;
}
Eyal Schneider
+1  A: 

Sorry about the lack of programming snippet, but I'd like to suggest dynamic programming. Attack level 1 and level 2 at the same time by giving each word a cost, and adding all the single characters not present as single character high cost words. The problem is then to find the way of splitting the sequence up into words that gives the least total cost.

Work from left to right along the sequence, at each point working out and saving the least cost solution up to and including the current point, and the length of the word that ends that solution. To work out the answer for the next point in the sequence, consider all of the known words that are suffixes of the sequence. For each such word, work out the best total cost by adding the cost of that word to the (already worked out) cost of the best solution ending just before that word starts. Note the smallest total cost and the length of the word that produces it.

Once you have the best cost for the entire sequence, use the length of the last word in that sequence to work out what the last word is, and then step back that number of characters to inspect the answer worked out at that point and get the word just preceding the last word, and so on.

mcdowella