A friend of mine is beginning to build a NetHack bot (a bot that plays the roguelike game NetHack). There is a very good working bot for the similar game Angband, but it works partially because of the ease of going back to the town and always being able to scum low levels to gain items.

In NetHack, the problem is much more difficult, because the game rewards ballsy experimentation and is built basically as 1,000 edge cases.

Recently I suggested using some kind of naive Bayesian analysis, in very much the same way spam is detected.

Basically the bot would at first build a corpus, by trying every possible action with every item or creature it finds and storing that information with, for instance, how close to death, injury, or a negative effect it was. Over time it seems like you could generate a reasonably playable model.
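
A minimal sketch of what such a corpus might look like, in Python (the OutcomeCorpus class and its method names are invented for illustration, not part of any existing patch):

    from collections import defaultdict

    class OutcomeCorpus:
        """Tally good/bad outcomes per (action, object) pair."""

        def __init__(self):
            # (action, object) -> [bad_outcomes, total_trials]
            self.stats = defaultdict(lambda: [0, 0])

        def record(self, action, obj, harmed):
            entry = self.stats[(action, obj)]
            entry[0] += 1 if harmed else 0
            entry[1] += 1

        def harm_probability(self, action, obj):
            bad, total = self.stats[(action, obj)]
            # Laplace smoothing: untried actions are neither safe nor fatal
            return (bad + 1.0) / (total + 2.0)

    corpus = OutcomeCorpus()
    corpus.record("eat", "cockatrice corpse", harmed=True)
    print(corpus.harm_probability("eat", "cockatrice corpse"))  # 0.667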

Can anyone point us in the right direction of what a good start would be? Am I barking up the wrong tree or misunderstanding the idea of bayesian analysis?

Edit: My friend put up a github repo of his NetHack patch that allows python bindings. It's still in a pretty primitive state but if anyone's interested...

+2  A: 

In NetHack, unknown actions usually have a boolean effect -- either you gain or you lose. Bayesian networks are based around "fuzzy logic" values -- an action may give a gain with a given probability. Hence, you don't need a Bayesian network, just a list of "discovered effects" and whether they are good or bad.
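
A minimal sketch of that effects list, assuming a hypothetical perform callback that reports whether the action went well:

    # Discovered effects: (action, target) -> True if it went well, False if not.
    discovered = {}

    def try_unknown(action, target, perform):
        """Attempt an action only if its outcome is unknown; remember the result."""
        key = (action, target)
        if key in discovered:
            return discovered[key]           # outcome already known
        went_well = perform(action, target)  # perform() reports gain (True) or loss (False)
        discovered[key] = went_well
        return went_well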

No need to eat the Cockatrice again, is there?

All in all it depends how much "knowledge" you want to give the bot as starters. Do you want him to learn everything "the hard way", or will you feed him spoilers 'till he's stuffed?

Kornel Kisielewicz
No problem feeding spoilers, really, but the manual labor required in creating a corpus of the information in spoilers is a lot. I mean, the Angband bot uses spoilers significantly. Really the result is the only important thing, but even with all the spoilers in the world that sure is a lot of rules to create. Also, I disagree that actions in NetHack are binary. Sometimes you don't even KNOW the effect of an action. I think you could collect a number of statistics about each action, like: turns before death, number of HP in damage caused, etc.
danieltalsky
Also, in the case of eating the cockatrice... it causes immediate death, and most of these actions would quickly bubble to the bottom of "actions that would help survival".
danieltalsky
Unfortunately Angband is a lot more "ordered" than NetHack -- the rules in Angband are relatively simple, and there's not much hardcoded "special cases" (for which NetHack is famous ;>). I'll think about it more.
Kornel Kisielewicz
It seems to me that the big problem with this sort of thing is the huge variance in starting states. It would certainly be a lot easier if the bot could "cheat" by reloading during the learning stages.
Anon.
Of course, Anon, but I think that's why Bayesian is such a cool model. I'm thinking that you could just statistically analyze certain actions and results. Even things you do that kill you the first time you try them might only kill you because it happens to be a full moon (yes, moon phase is kept track of in NetHack), and under other conditions they might be beneficial. That way you could compare statistical benefits to statistical harms when deciding the next best likely thing to do.
danieltalsky
Why do I feel this is going to be one unhappy bot...
Kornel Kisielewicz
NetHack not probabilistic? What are you smoking? There is plenty of non-determinism in NetHack's rules, the most important kind of which is unidentified items. It might be an amulet of life saving, but just as well an amulet of strangulation. Do you really argue that because the bot got an amulet of life saving the first time, he should keep putting on every amulet he finds?
meriton
+3  A: 

There is precedent: the monstrous rog-o-matic program succeeded in playing rogue and even returned with the amulet of Yendor a few times. Unfortunately, rogue was only released as a binary, not source, so it has died (unless you can set up a 4.3BSD system on a MicroVAX), leaving rog-o-matic unable to play any of the clones. It just hangs because they're not close enough emulations.

However, rog-o-matic is, I think, my favourite program of all time, not only because of what it achieved but because of the readability of the code and the comprehensible intelligence of its algorithms. It used "genetic inheritance": a new player would inherit a combination of preferences from a previous pair of successful players, with some random offset, then be pitted against the machine. More successful preferences would migrate up in the gene pool and less successful ones down.
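
A rough sketch of that inheritance scheme (the preference names and mutation scale are invented; rog-o-matic's actual "genes" were parameters in its C source):

    import random

    def breed(parent_a, parent_b, mutation=0.1):
        """Child takes each preference from a random parent, plus a small random offset."""
        child = {}
        for key in parent_a:
            base = random.choice((parent_a[key], parent_b[key]))
            child[key] = base + random.uniform(-mutation, mutation)
        return child

    # Preference weights of two previously successful players (names invented)
    a = {"aggression": 0.7, "greed": 0.4, "caution": 0.9}
    b = {"aggression": 0.3, "greed": 0.8, "caution": 0.5}
    new_player = breed(a, b)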

The source can be hard to find these days, but searching "rogomatic" will set you on the path.

martinwguy
My friend appreciated being pointed in the direction of Rog-o-matic. I'm still holding out for someone to give their input on the suitability of a Bayesian algorithm for this purpose.
danieltalsky
@martinwguy - rog-o-matic works nowadays, thanks to the Roguelike Restoration project - http://rogue.rogueforge.net/
Kornel Kisielewicz
+3  A: 

Although Bayesian analysis encompasses much more, the Naive Bayes algorithm well known from spam filters is based on one very fundamental assumption: all variables are essentially independent of each other. For instance, in spam filtering each word is usually treated as a variable, so this means assuming that if the email contains the word 'viagra', that knowledge does not affect the probability that it will also contain the word 'medicine' (or 'foo' or 'spam' or anything else). The interesting thing is that this assumption is quite obviously false when it comes to natural language, but it still manages to produce reasonable results.
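
To make the independence assumption concrete, here is a toy Naive Bayes scorer; the per-word probabilities are invented:

    import math

    # Invented per-word probabilities: P(word | spam) and P(word | ham)
    p_word_spam = {"viagra": 0.30, "medicine": 0.10, "meeting": 0.01}
    p_word_ham  = {"viagra": 0.001, "medicine": 0.05, "meeting": 0.20}

    def spam_log_odds(words, prior_spam=0.5):
        """Naive Bayes: multiply per-word likelihoods as if words were independent."""
        log_odds = math.log(prior_spam / (1 - prior_spam))
        for w in words:
            if w in p_word_spam:
                log_odds += math.log(p_word_spam[w] / p_word_ham[w])
        return log_odds  # > 0 means "more likely spam"

    print(spam_log_odds(["viagra", "medicine"]))  # positive: classified as spam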

Now, one way people sometimes get around the independence assumption is to define variables that are technically combinations of things (like searching for the token 'buy viagra'). That can work if you know the specific cases to look for, but in a game environment it means that you can't remember anything. So each time you have to move, perform an action, etc., it's completely independent of anything else you've done so far. I would say that for even the simplest games, this is a very inefficient way to go about learning the game.

I would suggest looking into using Q-learning instead. Most of the examples you'll find are usually just simple games anyway (like learning to navigate a map while avoiding walls, traps, monsters, etc.). Reinforcement learning is a type of online learning that does really well in situations that can be modeled as an agent interacting with an environment, like a game (or robots). It does this by trying to figure out what the optimal action is at each state in the environment (where each state can include as many variables as needed, much more than just 'where am I'). The trick then is to maintain just enough state to help the bot make good decisions, without having a distinct point in your state 'space' for every possible combination of previous actions.
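
For reference, the heart of Q-learning is a one-line update rule. A minimal tabular sketch, with placeholder states and NetHack-flavored action names:

    from collections import defaultdict
    import random

    Q = defaultdict(float)               # (state, action) -> estimated value
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
    ACTIONS = ["north", "south", "east", "west", "eat", "search"]

    def choose_action(state):
        """Epsilon-greedy: usually exploit the best known action, sometimes explore."""
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def update(state, action, reward, next_state):
        """The standard Q-learning update."""
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

    # e.g. stepping off the stairs took a hit from a jackal:
    update(("dlvl1", "on_stairs"), "north", -1.0, ("dlvl1", "corridor"))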

To put that in more concrete terms: if you were to build a chess bot, you would probably have trouble if you tried to create a decision policy that made decisions based on all previous moves, since the set of all possible combinations of chess moves grows really quickly. Even a simpler model of where every piece is on the board is still a very large state space, so you have to find a way to simplify what you keep track of. But notice that you do get to keep track of some state, so that your bot doesn't just keep trying to make a left turn into a wall over and over again.

The Wikipedia article is pretty jargon-heavy, but this tutorial does a much better job of translating the concepts into real-world examples.

The one catch is that you do need to be able to define rewards to provide as the positive 'reinforcement'. That is, you need to be able to define the states that the bot is trying to get to; otherwise it will just continue forever.

Rob Van Dam
+1  A: 

I doubt Bayesian analysis will get you far, because most of NetHack is highly contextual. There are very few actions which are always a bad idea; most are also life-savers in the "right" situation (an extreme example is eating a cockatrice: that's bad, unless you are starving and currently polymorphed into a stone-resistant monster, in which case eating the cockatrice is the right thing to do). Some of those "almost bad" actions are required to win the game (e.g. coming up the stairs on level 1, or deliberately falling into traps to reach Gehennom).

What you could try is working at the "meta" level. Design the bot to choose randomly among a variety of "elementary behaviors". Then try to measure how these bots fare. Then extract the combinations of behaviors which seem to promote survival; Bayesian analysis could do that across a wide corpus of games along with their "success level". For instance, if there are behaviors "pick up daggers" and "avoid engaging monsters in melee", I would assume that analysis would show that those two behaviors fit well together: bots which pick daggers up without using them, and bots which try to throw missiles at monsters without gathering such missiles, will probably fare worse.
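
A sketch of that meta-level analysis, correlating pairs of behaviors with game outcomes (the behavior names and the game-log format are invented):

    from collections import defaultdict
    from itertools import combinations

    # Each game: (set of enabled behaviors, success level) -- format is invented
    games = [
        ({"pick_up_daggers", "avoid_melee"}, 0.9),
        ({"pick_up_daggers"}, 0.3),
        ({"avoid_melee"}, 0.2),
        ({"pick_up_daggers", "avoid_melee"}, 0.8),
    ]

    def pair_scores(games):
        """Average success level for every pair of behaviors seen together."""
        totals = defaultdict(lambda: [0.0, 0])
        for behaviors, success in games:
            for pair in combinations(sorted(behaviors), 2):
                totals[pair][0] += success
                totals[pair][1] += 1
        return {pair: s / n for pair, (s, n) in totals.items()}

    print(pair_scores(games))  # {('avoid_melee', 'pick_up_daggers'): 0.85}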

This somewhat mimics what learning gamers often ask for in rec.games.roguelike.nethack. Most questions are similar to: "should I drink unknown potions to identify them?" or "what level should my character be before going that deep in the dungeon?". Answers to those questions heavily depend on what else the player is doing, and there is no good absolute answer.

A difficult point here is how to measure success at survival. If you simply try to maximize the time spent before dying, then you will favor bots which never leave the first levels; those may live long but will never win the game. If you measure success by how deep the character goes before dying, then the best bots will be archeologists (who start with a pick-axe) in a digging frenzy.
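
One way to hedge against both failure modes would be a composite fitness function; the weights here are arbitrary placeholders:

    def fitness(turns_survived, max_depth, ascended):
        """Blend survival time and depth so neither metric dominates; weights are guesses."""
        score = 0.2 * (turns_survived / 1000.0) + 0.8 * max_depth
        return score + (1000.0 if ascended else 0.0)  # winning outweighs everything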

Thomas Pornin
+1  A: 

Apparently there are a good number of NetHack bots out there. Check out this listing:

davr