views:

219

answers:

2

Question
Do you think genetic algorithms worth trying out for the problem below, or will I hit local-minima issues?

I think maybe aspects of the problem is great for a generator / fitness-function style setup. (If you've botched a similar project I would love hear from you, and not do something similar)

Thank you for any tips on how to structure things and nail this right.

The problem
I'm searching a good scheduling algorithm to use for the following real-world problem.

I have a sequence with 15 slots like this (The digits may vary from 0 to 20) :

2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

(And there are in total 10 different sequences of this type)

Each sequence needs to expand into an array, where each slot can take 1 position.

1 1 0 0 1 1 1 0 0 0 1 1 1 0 0 
1 1 0 0 1 1 1 0 0 0 1 1 1 0 0 
0 0 1 1 0 0 0 1 1 1 0 0 0 1 1
0 0 1 1 0 0 0 1 1 1 0 0 0 1 1 

The constraints on the matrix is that:

  • [row-wise, i.e. horizontally] The number of ones placed, must either be 11 or 111
  • [row-wise] The distance between two sequences of 1 needs to be a minimum of 00
  • The sum of each column should match the original array.
  • The number of rows in the matrix should be optimized.

The array then needs to allocate one of 4 different matrixes, which may have different number of rows:

A, B, C, D

A, B, C and D are real-world departments. The load needs to be placed reasonably fair during the course of a 10-day period, not to interfere with other department goals.

Each of the matrix is compared with expansion of 10 different original sequences so you have:

A1, A2, A3, A4, A5, A6, A7, A8, A9, A10
B1, B2, B3, B4, B5, B6, B7, B8, B9, B10
C1, C2, C3, C4, C5, C6, C7, C8, C9, C10
D1, D2, D3, D4, D5, D6, D7, D8, D9, D10

Certain spots on these may be reserved (Not sure if I should make it just reserved/not reserved or function-based). The reserved spots might be meetings and other events

The sum of each row (for instance all the A's) should be approximately the same within 2%. i.e. sum(A1 through A10) should be approximately the same as (B1 through B10) etc.

The number of rows can vary, so you have for instance:

A1: 5 rows A2: 5 rows A3: 1 row, where that single row could for instance be:

0 0 1 1 1 0 0 0 0 0 0 0 0 0 0

etc..

Sub problem*

I'de be very happy to solve only part of the problem. For instance being able to input:

1 1 2 3 4 2 2 3 4 2 2 3 3 2 3

And get an appropriate array of sequences with 1's and 0's minimized on the number of rows following th constraints above.

+1  A: 

Hey.
GA can be applied to this problem, but it won't be 5 minute task. You need to put several things together, without knowing which implementation of each of them is best.
So:

  1. Solution representation - how you will represent possible solution? Using matrix seems to be most straight forward. Using collection of one dimensional arrays is possible also. But you have some constrains, so maybe SuperGene concept is worth considering?
  2. You must use proper mutation/crossover operators for given gene representation.
  3. How will you enforce constrains on solutions? Destroying those that are not proper? What if they contain valuable information? Maybe let them stay in population but add some penalty to fitness, so they will contribute to offspring, but won't go into next generations?

Anyway I think that GA can be applied to this problem. Is it worth? Usually GA are not best algorithm, but they are decent algorithm if others fail. I would go with GA, just because it would be most fun but I would look for alternative solution (just in case).

P.S. Personal insight: I was solving N Queens Problem, for 70 < N < 100 (board NxN, N queens). Algorithm was working fine for lower N (maybe it was trying all combination?), but with N in this range, I couldn't find proper solution. Fitness quickly jumped to about 90% of max, but in the end there were always two queens conflicting. But it was very naive implementation.

yoosiba
(I agree fully than anythin AI-like is way cooler than just implementing something that just works).I'm unsure how to interpret the SuperGene concept. Is that a special way to look at constraints, rather than enforcing certain constraints in generation, mutations and crossovers?
tovare
It is like that. In SuperGene you put constrains on individual with complex representation. Like in your case constrain on sum for rows, etc. All other constrains (if any, something like constrains on domain) you can put in your fitness function. Also in some cases using SuperGene improves overall performance of algorithm as per http://jgap.sourceforge.net/doc/supergenes/supergenes.html By the way JGAP is nice package with lots of futures, and quite easy to use.
yoosiba
Thanks, i'll investigate that in the next few days.
tovare
+2  A: 

Sub-problem solution attempt

Well, here's an idea. This solution is not based on using a genetic algorithm, but some ideas could be used in going in that direction.

Basis vectors

First of all, you should generate what I think of as the basis vectors. For instance, if your sequence were 3 numbers long rather than 15, the basis vectors would be:

v1 = [1 1 0]

v2 = [0 1 1]

v3 = [1 1 1]

Any solution for sequence length 3 would be a linear combination of these three vectors using only positive integers. In other words, the general solution would be

a*v1 + b*v2 + c*v3

where a, b and c are positive integers. For the sequence [1 2 1], the solution is v1 = 1, v2 = 1, v3 = 0. What you first want to do is find all of the possible basis vectors of length 15. From my rough calculations I think that there are somewhere between 300-400 basis vectors of length 15. I can give you some tips towards generating them if you want.

Finding solutions

Now, what you want to do is sort these basis vectors by their sums/magnitudes. Then in searching for your solution, you start with the basis vectors which have the largest sums. We start with the vectors that have the largest sums because they lead to having less total rows. We also have an array, veccoefs, which contains an entry for the linear coefficient for each basis vector. At the beginning of searching for the solution, all the veccoefs are 0.

So we take the first basis vector (the one with the largest sum/magnitude) and subtract this vector from the sequence until we either create an unsolvable result ( having a 0 1 0 in it for instance) or any of the numbers in the result is negative. We store the number of times we subtract the vector in veccoefs. We use the result after subtracting the basis vector from the sequence as the sequence for the next basis vector. If there are only zeros left in the result, then we stop the loop.

I'm not sure of the efficiency/accuracy of this method, but it might at least give you some ideas.

Other possible solutions

Another idea for solving this is to use the basis vectors and form the problem as an optimization/least squares problem. You form a matrix of the basis vectors such that the basic problem will be minimizing Sum[(Ax - b)^2] where A is the matrix of basis vectors, b is the input sequence, and x are the basis vector coefficients. However, you also want to minimize the number of rows, so you can add a term like x^T*x to the minimization function where x^T is the transpose of x. The hard part in my opinion is finding differentiable terms to add that will encourage integer vector coefficients. If you can think of a way to do that, then optimization could very well be a good way to do this.

Also, you might consider a Metropolis-type Monte Carlo solution. You would choose randomly whether to add a vector, remove a vector, or substitute a vector at each step. The vector to be added/removed/substituted would be chosen randomly. The probability of this change to be accepted would be a ratio of the suitabilities of the solutions before the change and after the change. The suitability could be equal to the difference between the current solution and the sequence, squared and summed, minus the number of rows/basis vectors involved in the solution. You would need to put in appropriate constants to for various terms to try to get the acceptance rate around 50%. I kind of doubt that this will work very well, but I thought that you should still consider it when looking for possible solutions.

Justin Peel