views:

286

answers:

4

I'm trying to solve a problem that looks like a standard Linear Programming problem with one twist.

We have as input a set of "phrases" each of which has a weight. We need to choose how many times to repeat each phrase in a text to maximize the total weight, subject to a max character length limitation.

This seems like a straightforward linear programming problem, except for the fact that one phrase could be subphrase of another. So for example if your input phrases include "foo bar" and "foo", then if you repeat the phrase "foo bar", you also repeat the phrase "foo". I don't know how to deal with this in a linear programming model.

A: 

Just recalculate the weights. For example:

key = 3
board = 2
keyboard = 5

This changes to:

key = 3
board = 2
keyboard = 5 + 3 + 2 = 10

If you do this you have to watch out for phrases which include a phrase that includes a phrase. You have to make sure you don't do this:

ey = 7
key = 3 + 7
board = 2
keyboard = 5 + 7 + (3 + 7) + 2

You can avoid it by ordering the list after the lengths of the phrases. (Longest first)

BTW: Isn't this a dynamic programming problem?

Georg
+3  A: 

You have another problem, a concatenation of two phrases might form a third phrase. For example, if you have

list...4
sting...6
ingrave...7
abcd...5

then a solution "listingrave" has "gain" 17, whereas anything you could do with "abcd" ("abcdingrave") has only 12, although for length 4, "abcd" is the optimum.

It is an interesting problem, though, I think you need to build an automaton that searches text for your words and you need to find a path in that automaton of given length which passes through the "candiest" states (eg. through the states which give the most candy in return). I will consider that further.

UPDATE: It should work like that:

  1. Build a DFA from the Aho-Corasick text searching algorithm
  2. Search for a walk (not path, ie. you allow repeated edges and vertices) of given length with the maximum gain, starting from the starting state of the DFA. You can do it with dynamic programming (build longer walks from 1 character shorter ones).
jpalecek
Having only few phrases one could create all permutations… This problem seems to be harder than I thought.
Georg
A: 

Maybe it would work by using only dynamic programming and recalculating the whole thing each time. One could use a whole lot of optimizations to speed it up. (Like trying to recalculate only the last part) I think the speed would be reasonable. O(n^2 * m) or so.

Georg
A: 

This looks like it can be solved as a knapsack problem, see http://en.wikipedia.org/wiki/Knapsack_problem, with weights defined as suggested by gs.

Fanfan