views:

357

answers:

8

It's really all in the title, but here's a breakdown for anyone who is interested in Evolutionary Algorithms:

In an EA, the basic premise is that you randomly generate a certain number of organisms (which are really just sets of parameters), run them against a problem, and then let the top performers survive.

You then repopulate with a combination of crossbreeds of the survivors, mutations of the survivors, and also a certain number of new random organisms.

Do that several thousand times, and efficient organisms arise.

Some people also do things like introduce multiple "islands" of organisms, which are seperate populations that are allowed to crossbreed once in awhile.

So, my question is: what are the optimal repopulation percentages?

I have been keeping the top 10% performers, and repopulating with 30% crossbreeds and 30% mutations. The remaining 30% is for new organisms.

I have also tried out the multiple island theory, and I'm interested in your results on that as well.

It is not lost on me that this is exactly the type of problem an EA could solve. Are you aware of anyone trying that?

Thanks in advance!

A: 

You know what you could do... You could write a genetic algorithm to determine that optimal distribution.

But, usually I keep the top 12%, and 28% crossbreeds; with 30% each for the others.

TraumaPony
+5  A: 

The best resources I've come across for GA and EA were John Koza's books on Genetic Programming. He covers the topic in depth - techniques for encoding the genome, random mutation, breeding, tuning the fitness function.

Personally I've only written a small handful of simulators for pedagogical purposes. What I found was that how I tuned those percentages was related to the particulars of the fitness function I was using, how much random mutation I had introduced and how 'smart' I had tried to make the mutation and breeding - I found that the less 'smart' I tried to make the mutator and the cross-over logic, the faster the population improved its fitness score - I also found that I had been too conservative in the probability of mutation -- my initial runs hit local maxima and had a hard time getting out of them.

None of this gives you concrete answers, but I don't think there are concrete answers, GA is unpredictable by its nature and tuning those kinds of parameters may still be a bit of an art. Of course you could always try a meta-GA, using those parameters as a chromosome, searching for settings that produce a more rapid fitness in the base GA you're running.

Depends on how 'meta' you want to get.

Kyle Burton
+4  A: 

This is a hotly-debated (in the literature and Melanie, et al books) topic that seems to be very domain-specific. What works for one problem of one type with n parameters will almost never work for another problem, another domain, or another parametric set.

So, as TraumaPony suggested, tweak it yourself for each problem you are solving or write something to optimize it for you. The best thing you can do is keep track of all of your "knob-twiddling" and fine-tuning experiments so you can map out the solution terrain and get a feel for how to optimize within that space quickly. Also try alternative techniques like hill-climbing so you can have a baseline to beat.

@Kyle Burton: crossover vs. mutation rates are also constantly debated in each class of problems handed over to GAs and GPs.

Sean
+2  A: 

Assuming you have a method for quantifying the top X% percent performers, I'd suggest that instead of using a hard coded threshold you analyze the performance distribution and make your cutoff somewhere in the range of the first major drop in performance, and then tuning your crossbreads, mutations, and new organisms to fill in the gaps. This way if you have a very "productive" run in which lots of variations were successful you don't throw a significant number of high performers. Also, if you have an "unproductive" run you can scrap more of the existing organisms in favor of more newer organisms that should be taking their place.

Mark Roddy
Interesting - so you could wiggle your thresholds a bit. It is true that sometimes a bad run causes a sort of ice age effect, murdering lots of really smart organisms. I suppose you could argue that this is part of the process though, eh? Sort of like how the roaches would survive nuclear winter.
Brian MacKay
I actually work in modeling and simulation and not in genetic programming, but that's the approach we take when there's an end state we want a model to get to but we don't know the starting state that will get there.
Mark Roddy
"Mutate" the model (generate statistical variations in the parameters), run it and then examine what end states are closest to the desired state, "Mutate" the starting state of these and run again along with more variations of the original model. Let me know if you want more details.
Mark Roddy
+3  A: 

I initially tried to model what I thought organic systems were like. Ultimately decided that was no good, and went more aggressive, with 10% kept, 20% mutated, 60% crossbred, and 10% random.

Then I noticed my top 10% were all roughly identical. So I increased the random to 30%. That helped some, but not much.

I did try multiple island, and generation-skipping, and reseeding, which gave better results, but still highly unsatisfactory, very little variation in the top 10%, crazy-long numbers of generations to get any results. Mostly the code learned how to hack my fitness evaluation.

It's really easy to get top performers, so don't worry about keeping too many of them around. Crossbreeds help to pare down positive and negative traits, so they're useful, but really what you want to get is a lot of good random bred in. Focus on mutations and new randoms to bring in features, and let the crossbreeds and top performers just keep track of the best and refine them more slowly. IE: stuff based on the last generation is just finding a better local maxima, randoms find better global maxima.

I still believe optimal answers to your question can be found by observing natural phenomena, such as in a recent article regarding randomness of fruit-fly flight paths, so that may pan out.

Probably the best answer is to just run it and tweak it, don't be afraid to tweak it pretty heavily, the populations are robust. Make sure you implement a way to save and continue.

davenpcj
+1  A: 

There would appear to be a few answers suggesting using a 2nd GA to determine optimum parameters for the 1st GA, with no mention of how to determine the optimum parameters for the 2nd. I can't help but wonder about the religious beliefs of those suggesting this approach...

JoeBloggs
The deafening silence could be a clue? :)
jTresidder
Well, it's obvious isn't it... That's what the first GA is trying to do ;)
Andrew Rollings
Not much to choose between "chicken and egg" and "turtles all the way down" though... chicken and egg requires less memory I guess :)
JoeBloggs
+2  A: 

I have had some success increasing the diversity of population by setting mutation and crossover from a couple of the genes from the parent chromosomes.

This works until the mutation rate drops to zero; since it is likely that there will be a periodic evolutionary pressure to do this, you should try and make sure these genes have a minimum rate.

In practice, I opted for a multi-chromosome genotype. One chromosome coded for the other's reproductive function. The smaller 'reproduction chromosome' had a sensible fixed rates for mutation and crossover.

I found that this would stop the classic plateau and convergence of the population.

As an aside, I tend to do both crossover and mutation for each child.

For generational GAs, I try to shun elitism altogether, but where populating from multiple islands, I keep the top elite from each island. When the islands come together, then the elites can all breed together.

jamesh
That's a VERY thoughtful answer. Thanks a lot!
Brian MacKay