Here's a working solution in Haskell. I have called the unique substrings symbols, and an association of one occurrence of the substrings a placement. I have also interpreted criterion 3 ("Use as few different substrings as possible") as "use as few symbols as possible", as opposed to "use as few placements as possible".
This is a dynamic programming approach; the actual pruning occurs due to the memoization. Theoretically, a smart haskell implementation could do it for you, (but there are other ways where you wrap makeFindBest
), I'd suggest using a bitfield to represent the used symbols and just an integer to represent the remaining string. The optimisation is possible from the fact that: given optimal solutions for the strings S1 and S2 that both use the same set of symbols, if S1 and S2 are concatenated then the two solutions can be concatenated in a similar manner and the new solution will be optimal. Hence for each partition of the input string, makeFindBest
need only be evaluated once on the postfix for each possible set of symbols used in the prefix.
I've also integrated branch-and-bound pruning as suggested in Daniel's answer; this makes use of an evaluation function which becomes worse the more characters skipped. The cost is monotonic in the number of characters processed, so that if we have found a set of placements that wasted only alpha characters, then we never again try to skip more than alpha characters.
Where n is the string length and m is the number of symbols, the worst case is O(m^n) naively, and m is O(2^n). Note that removing constraint 3 would make things much quicker: the memoization would only need to be parameterized by the remaining string which is an O(n) cache, as opposed to O(n * 2^m)!
Using a string search/matching algorithm such as Aho-Corasick's string matching algorithm, improves the consume
/drop 1
pattern I use here from exponential to quadratic. However, this by itself doesn't avoid the factorial growth in the combinations of the matches, which is where the dynamic programming helps.
Also note that your 4th "etc." criteria could possibly change the problem a lot if it constrains the problem in a way that makes it possible to do more aggressive pruning, or requires backtracking!
module Main where
import List
import Maybe
import System.Environment
type Symbol = String
type Placement = String
-- (remaining, placement or Nothing to skip one character)
type Move = (String, Maybe Placement)
-- (score, usedsymbols, placements)
type Solution = (Int, [Symbol], [Placement])
-- invoke like ./a.out STRING SPACE-SEPARATED-SYMBOLS ...
-- e.g. ./a.out "abcdeafghia" "a bc fg"
-- output is a list of placements
main = do
argv <- System.Environment.getArgs
let str = head argv
symbols = concat (map words (tail argv))
(putStr . show) $ findBest str symbols
putStr "\n"
getscore :: Solution -> Int
getscore (sc,_,_) = sc
-- | consume STR SYM consumes SYM from the start of STR. returns (s, SYM)
-- where s is the rest of STR, after the consumed occurrence, or Nothing if
-- SYM isnt a prefix of STR.
consume :: String -> Symbol -> Maybe Move
consume str sym = if sym `isPrefixOf` str
then (Just (drop (length sym) str, (Just sym)))
else Nothing
-- | addToSoln SYMBOLS P SOL incrementally updates SOL with the new SCORE and
-- placement P
addToSoln :: [Symbol] -> Maybe Placement -> Solution -> Solution
addToSoln symbols Nothing (sc, used, ps) = (sc - (length symbols) - 1, used, ps)
addToSoln symbols (Just p) (sc, used, ps) =
if p `elem` symbols
then (sc - 1, used `union` [p], p : ps)
else (sc, used, p : ps)
reduce :: [Symbol] -> Solution -> Solution -> [Move] -> Solution
reduce _ _ cutoff [] = cutoff
reduce symbols parent cutoff ((s,p):moves) =
let sol = makeFindBest symbols (addToSoln symbols p parent) cutoff s
best = if (getscore sol) > (getscore cutoff)
then sol
else cutoff
in reduce symbols parent best moves
-- | makeFindBest SYMBOLS PARENT CUTOFF STR searches for the best placements
-- that can be made on STR from SYMBOLS, that are strictly better than CUTOFF,
-- and prepends those placements to PARENTs third element.
makeFindBest :: [Symbol] -> Solution -> Solution -> String -> Solution
makeFindBest _ cutoff _ "" = cutoff
makeFindBest symbols parent cutoff str =
-- should be memoized by (snd parent) (i.e. the used symbols) and str
let moves = if (getscore parent) > (getscore cutoff)
then (mapMaybe (consume str) symbols) ++ [(drop 1 str, Nothing)]
else (mapMaybe (consume str) symbols)
in reduce symbols parent cutoff moves
-- a solution that makes no placements
worstScore str symbols = -(length str) * (1 + (length symbols))
findBest str symbols =
(\(_,_,ps) -> reverse ps)
(makeFindBest symbols (0, [], []) (worstScore str symbols, [], []) str)