+1  A: 

What you are asking about is in essence the k-knapsack problem. The wikipedia page I liked has a wealth of resources on solutions for it. The problem is however NP-Complete to solve in entirety, so what you may wish to do is look for a close to best solution through simulated annealing or other forms for search through the problem space.

The first thing to remember is that in constraint problems you will probably end up taking a good chunk of time to generate a solution. In your previous example, though five items and ten stores seems small it, in reality, generates a large problem space (on the order of 1e5 not including the added complication of the conditional pricing which further engrosses the problem).

The constraints on the problem are that you buy one of each item. The goal is minimal price. I think what you have is fairly good, though I'm not too sure about bullet points one and two.

  • each price "p xy" is defined in the domain (0, c) where c is the price of the item in this shop
  • only one price in a line should be non zero
  • I would consider amortizing the shipping over the cost of the items bought rather than adding it on as a total value when doing the calculations.

    Mimisbrunnr
    OK, now I recognize the knapsack problem in my optimization. But I still have no idea to solve it. And for the shipping cost condition I see no equivalent in the knapsack problem.
    tangens
    I'll edit my answer and try to give some suggestions. Thought the best thing to do would be some research on the problem it self. It has a fairly rich body with some very interesting papers.
    Mimisbrunnr
    Could you explain your last paragraph again? What would you do with the shipping cost?
    tangens
    +3  A: 

    I have implemented this problem in MiniZinc (a high level constraint programming language): shopping_basket.mzn. It is quite forward and maybe could be used as a model for your Java model.

    For the Choco model, have you tried different search strategies? A different strategy may be faster.

    By the way, one other Java constraint programming solver you may want to check out is JaCoP.

    hakank
    Thanks a lot! I'll definitely have a closer look at MiniZinc and especially at your program!
    tangens
    How would you model the fact, that not all shops sell all items? Simply by giving them a very high price like 999999999?
    tangens
    That is one way, probably the simplest. Another way is to code it as 0 (zero) and adjust the calculations accordingly, but this is probably more messy.I have now fixed an error in the model, and added a third shop which don't have item 1 (coded as 9999999).
    hakank
    Here is a slightly different version where the decision variable x (from which shop an item is bought) is represented as a single array with the domain 1..num_shops (instead of a 0..1 matrix). This simplifies parts of the model.http://www.hakank.org/minizinc/shopping_basket2.mzn
    hakank
    I tried your program with my real data (http://pastie.org/958454). But it seems it will never (not so soon) terminate.
    tangens
    I have now tested different search strategies and different solvers for your bigger example (thanks for that!), and also modified the first matrix based model so N/A is represented by 0 instead of 999999. Here is the new model: http://www.hakank.org/minizinc/shopping_basket3.mznThe Gecode/fz solver ( http://www.gecode.org/ ) solves this model with a total of 3525 in about 2 seconds. The result, the shops where to buy each item is shown in the matrix (shown in the model). The approach with a single array as decision variables seems to be not as good as the matrix approach for this problem.
    hakank
    There is something wrong with the new encoding of 0 for a shop that doesn't sell an item. Your solution buys most items (e.g. the first one) at shops that don't sell this item. Only a few items are sold by all shops.
    tangens
    Darn, that's correct. My small test for this encoding was (of course) wrong. Thanks for the report. OK, I continue to experiment.
    hakank
    In the meantime I try to install Visual Studio Express and have a look at Gecode...
    tangens
    Next try: http://www.hakank.org/minizinc/shopping_basket5.mznHere I continue with the single array approach, but with considerable improvements of the domains (which seems to be correct). Also, the N/A is coded as 99999 (not really necessary for Gecode/fz) and the search heuristics is "largest/indomain_max". Gecode/fz solves it in about 25 seconds with the following result:total = 42013x = [13, 20, 17, 18, 18, 13, 17, 17, 20, 13, 17, 17, 13, 13, 18, 17, 13, 13, 17, 8, 13, 36, 10, 17, 13, 13, 17, 20, 13]
    hakank
    Thank you very much, hakank! I think this is the solution I was looking for! You are really great! I will definitly dive into constraint programming and try to understand what you did and how you optimized it. Thanks.
    tangens
    Sorry about this spam. :) By changing indomain_max to indomain_split in the version 5 model, Gecode/fz now take 12 seconds (Linux, dual, 2Gb), and the slightly different answer:total = 42013; x = [13, 20, 17, 18, 18, 13, 17, 17, 20, 13, 17, 13, 13, 13, 18, 17, 13, 13, 17, 8, 13, 36, 10, 17, 13, 13, 17, 20, 13].Running the model with "solve satisfy" and adding the constraint "/\ total = 42013", these are the only two solutions. This check is done in about 1 second (for Gecode/fz). Well, I hope that this is correct...
    hakank
    tangens: Thanks for your kind words. Some comments about the changes: Since the first two models was more of proof-of-concept, I was sloppy with the domains of "total" and the "this_cost". Constraint solvers can be quite sensitive to very large domains (such as "var int total"), or rather, the do much better with small domains. Also the search heuristics can also make a huge difference. The problem is to combine all this, which is - unfortunately - not a science but an art (or in some cases: luck:-).
    hakank
    I have explained some more about this in my blog post "Optimizing shopping baskets: The development of a MiniZinc model": http://www.hakank.org/constraint_programming_blog/2010/05/optimizing_shopping_baskets_th_1.html
    hakank
    A: 

    I'm not too sure this is a k-knapsack problem. The question did mention the words "shopping baskets" but did not specify a capacity to any given shipment. If you specified a maximum shipment size, then the problem starts looking more like a knapsack problem.

    The problem is really just a basic network flow problem with transporation costs on the arcs and costs at the origin. Because you have a clear objective function - minimize transportation + product cost and because there is likely to be only one solution, CP may not be the best approach.

    Consider solving as linear programming problem:

    Min: Transportation + Product Cost

    ST: Total product shipped >= Demand (for each product)

    You might need to develop some piecewise linear equations for the transportation cost, but that shouldn't be a problem.

    Grembo