views:

162

answers:

2

The actual question goes like this:

McDonald's is planning to open a number of joints (say n) along a straight highway. These joints require warehouses to store their food. A warehouse can store food for any number of joints, but has to be located at one of the joints only. McD has a limited number of warehouses (say k) available, and wants to place them in such a way that the average distance of joints from their nearest warehouse is minimized.

Given an array (n elements) of coordinates of the joints and an integer 'k', return an array of 'k' elements giving the coordinates of the optimal positioning of warehouses.

Sorry, I don't have any examples available since I'm writing this down from memory. Anyway, one sample could be:

array={1,3,4,5,7,7,8,10,11} (n=9)
k=1

Ans: {7}

This is what I've been thinking: For k=1, we can simply find out the median of the set, which would give the optimal location of the warehouse. However, for k>1, the given set should be divided into 'k' subsets (disjoint, and of contiguous elements of the superset), and median for each subset would give the warehouse locations. However, I don't understand on what basis the 'k' subsets should be formed. Thanks in advance.

EDIT: There's a variation to this problem also: Instead of sum/avg, minimize the maximum distance between a joint and its closest warehouse. I don't get this either..

+1  A: 

This is NOT a clustering problem, it's a special case of a facility location problem. You can solve it using a general integer / linear programming package, but because the problem is on a line, there may be more efficient (and less expensive software-wise) algorithms that would work. You might consider dynamic programming since there are probably combination of facilities that could be eliminated rather quickly. Look into the P-Median problem for more info.

Grembo
I didn't get much from the p-median problem articles.. most of them have an extra parameter of 'transportation cost' which makes the problem more complex. Please help.
Arpit Tarang
I found out it was the 'facility location problem'. But still, they had complex algos for 2d problems... mine's only 1d. Help?
Arpit Tarang
Doesn't matter. You still have distances, they are just in one dimension.
Grembo
A: 

The straight highway makes this an exercise in dynamic programming, working from left to right along the highway. A partial solution can be described by the location of the rightmost warehouse and the number of warehouses placed. The cost of the partial solution will be the total distance to the nearest warehouse (for fixed k minimising this is the same as minimising the averge) or the maximum distance so far to the closest warehouse.

At each stage you have worked out the answers for the leftmost N joints and have them indexed by number of warehouses used and position of the rightmost warehouse - you need to save only the best cost. Now consider the next joint and work out the best solution for N+1 joints and all possible values of k and rightmost warehouse, using the answers you have stored for N joints to speed this up. Once you have worked out the best cost solution covering all the joints you know where its rightmost warehouse is, which gives you the location of one warehouse. Go back to the solution that has that warehouse as the rightmost joint and find out what solution that was based on. That gives you one more rightmost warehouse - and so you can work your way back to the location of all the warehouses for the best solution.

I tend to get the cost of working this out wrong, but with N joints and k warehouses to place you have N steps to take, each of the based on considering no more than Nk previous solutions, so I reckon cost is kN^2.

mcdowella
"Now consider the next joint and work out the best solution for N+1 joints and all possible values of k and rightmost warehouse, using the answers you have stored for N joints to speed this up." Could you please give (at least) a hint on how to do this?
Arpit Tarang
The basic idea is that the best answer for a given point can be described as something like "put a warehouse at this point and then use the answer for the point 4 to the left to say where to put the other warehouses" - but there are usually a few details to worry about that vary from problem to problem.If you are not familiar with dynamic programming look at e.g. the first two examples at http://mat.gsia.cmu.edu/classes/dynamic/dynamic.html - or search for other tutorials.
mcdowella