views:

243

answers:

6

I have a Game framework where there is a list of Bots who implement IBotInterface. These bots are custom made by user with the only limitation that they must implement the interface.

The game then calls methods in the bots (hopefully in parallel) for various events like yourTurn and roundStart. I want the bot to only spend a limited amount of time handling these events before being forced to quit computing.

An example of the kind of thing i'm trying is : (where NewGame is a delegate)

Parallel.ForEach(Bots, delegate(IBot bot)
                {
                    NewGame del = bot.NewGame;
                    IAsyncResult r = del.BeginInvoke(Info, null, null);
                    WaitHandle h = r.AsyncWaitHandle;
                    h.WaitOne(RoundLimit);
                    if (!r.IsCompleted)
                    {
                        del.EndInvoke(r);
                    }
                }
            );

In this case I am forced to run EndInvoke() which might not terminate. I cant think of a way to cleanly abort the thread.

It would be great if there was some kind of

try { 
 bot.NewGame(Info);
} catch (TimeOutException) {
 // Tell bot off.
} finally {
 // Compute things.
}

But I dont think its possible to make such a construct.

The purpose of this is to gracefully handle AI's who have accidental infinite loops or are taking a long time to compute.

Another possible way of tackling this would be to have something like this (with more c# and less pseudocode)

Class ActionThread {
    pulbic Thread thread { get; set; }
    public Queue<Action> queue { get; set; }

    public void Run() {
        while (true) {
            queue.WaitOne();
            Act a = queue.dequeue();
            a();
        }
    }

Class foo {
    main() {
        ....
        foreach(Bot b in Bots) {
            ActionThread a = getActionThread(b.UniqueID);
            NewGame del = b.NewGame;
            a.queue.queue(del);
        }
        Thread.Sleep(1000);
        foreach (ActionThread a in Threads) {
            a.Suspend();
        }
    }
}

Not the cleanest way but it would work. (I'll worry about how to pass paramaters in and take return values out later).

[Further Edit]

I'm not too sure what an appdomain is, from the looks of it I could do so, but it cant see how it would help

I hope not to expect malicious code. Trying to kill the other bots threads is not a valid way to win the game. I just wanted to give every bot a second to compute then move on with the gameflow so mainly expecting slow or bugged code here.

I'm trying to see what I can do with Task, getting somewhere slowly.

I'll read into what CAS can do, thank you guys

[More Edit]

My head hurts, I can't seem to think or code anymore. I'm setting up a message pass system to a dedicated thread per bot and will suspend/sleep those threads

I have decided that I will use a fully socketed server client system. So that the client can do whatever it wants and I'll just ignore if it refuses to reply to server messages. Pity it had to come to this.

+3  A: 

A .NET 4.0 Task gives you considerably flexibility with regard to cancelling.

Run the delegate in a Task that you've passed a CancelationToken into, like this.

There's a fuller example here.

edit

In light of the feedback, I believe that the right answer is to follow this plan:

  1. Limit the AI using CAS, so that it can't access threading primitives or mess with files.

  2. Give it a heartbeat callback that it is morally obligated to call from inside long loops. This callback will throw an exception if the thread has taken too long.

  3. If the AI times out, log a weak move for it, and give it a short amount of time to call that callback. If it doesn't, just put the thread to sleep until its next turn. If it never does snap out of it, it will keep logging weak moves until the process kills it for being a background thread.

  4. Profit!

Steven Sudit
That only works if the code that runs in the task is actually looking at that token once in a while. If it doesn't, it will still go into an infinite loop, and no matter how nicely you ask the task to exit, the code running inside it will still continue running.
Lasse V. Karlsen
Doesn't help, cancellation is still voluntary by the task.
Hans Passant
Ok, I'll modify my answer to: Run the delegate on a separate machine, whose power supply is attached to an IP-addressable power strip. If it does not respond before the timeout, perform a hard reset. Better?
Steven Sudit
I dont think I have to go that far. I'm trying to avoid making the game a server and forcing all bots to connect as remote hosts and getting them to compute on their own machines. (Debugging parallel code is bad enough)
Raynos
You think *that's* extreme? My next plan involved remote-controlling a killer robot that uses a chainsaw to destroy the server...
Steven Sudit
Prior to invoking a killer robot, I would simply send a nastygram to the CrackBerry of the IT admin on call. An irate admin is *at least* as dangerous as any killer robot...
Steven Sudit
+1  A: 

One option is certainly to spin up a new Thread yourself, rather than relying on BeginInvoke. You can implement the timeout via Thread.Join and (unceremoniously) kill the thread, if necessary, via Thread.Abort.

Daniel Pratt
Ok, but using `Thread.Abort` is not a good idea if you can possibly avoid it.
Steven Sudit
out of curiosity - what wrong with aborting your thread (and possibly catching `ThreadAbortException` to clean up) ?
liho1eye
Just search SO for thread.abort, you should get plenty of reading material. To sum it up, the thread can occur at any time, in relation to what the thread is doing, and it might not be prepared to deal with an exception at the current point of its execution path, simply because there is no normal way for an exception to occur at that point, so there's no failsafe. For instance, what if the exception occurs right inside a `finally{}` block, before it has a chance to close a stream object.
Lasse V. Karlsen
I concur that aborting the thread randomly is dangerous. If I could do that I would have done it.
Raynos
+7  A: 

Unfortunately there is no 100% safe way to terminate a thread cleanly, as you seem to want.

There are many ways you can try to do it though, but they all have some sideeffects and downsides that you might want to consider.

The only clean, safe, and approved, way to do this is to get the cooperation of the thread in question and ask it nicely. However, this is only a 100% guaranteed way if you're in control of the code. Since you're not, it won't be.

Here's the problem.

  • If you do a Thread.Abort, you risk leaving the appdomain in an unsafe state. There could be files left open, network or database connections left open, kernel objects left in an invalid state, etc.
  • Even if you place the thread into its own appdomain, and tear down the appdomain after aborting the thread, you can't be 100% safe that your process won't have problems in the future because of it.

Let's look at why cooperation isn't going to be 100% either.

Let's say the thread in question often needs to call into your library code, in order to draw on screen, or whatnot. You could easily place a check into those methods, and throw an exception.

However, that exception could be caught, and swallowed.

Or the code in question could go into an infinite loop that doesn't call your library code at all, which leaves you back to square one, how to cleanly kill the thread without its cooperation.

Which I already said you can't.

There is, however, one method that might work. You could spawn the bot into its own process, then kill the process when it times out. This would give you a higher chance of success because at least the operating system will take care of all resources it manages when the process dies. You can of course have the process leave corrupted files on the system, so again, it's not 100% clean.

Here's a blog entry by Joe Duffy that explains a lot about these problems: Managed code and asynchronous exception hardening.

Lasse V. Karlsen
Ok, so why not use `Task`?
Steven Sudit
Because that's just an object wrapper, you haven't solved the underlying problem that you can't cleanly terminate a thread in .NET. The only safe way is to ask the thread nicely, and leave it to exit itself. It it doesn't, or won't, do that, you have no options left.
Lasse V. Karlsen
I think we need to define the level of uncooperativeness we're expecting. Are we worried about malicious code or just slow code?
Steven Sudit
@Steven: OP says "gracefully handle AI's who have accidental infinite loops", so it might be necessary to re-write these looping parts of code so that they check every so often for a request to terminate themselves. Assuming that re-writing that section of code is posible.
FrustratedWithFormsDesigner
You can use CAS limit what the bots are allowed to do. This attribute for example will only allow the code to execute and does not allow it to access other system resources. [SecurityPermissionAttribute(SecurityAction.PermitOnly, Flags=SecurityPermissionFlag.Execution)] At least this way, you aren't as worried about leaving your system in a corrupted state.
Dave White
@Dave, I agree, I didn't think of that.
Lasse V. Karlsen
@FrustratedWithFormsDesigner rewriting the section of code is possible, But I can imagine AI can get pretty large and I don't want to debug other people's code just so it works elegantly on framework. This assumes their code is readable.
Raynos
@Raynos: Hmmm I see the problem is in the code of future users... It would be best to inject the listen-for-terminate code into the code in the user's loops...but how? Do these potential loops exist in methods that are specified in the interface and implemented by the user's code? Feels like this could be solved in an AOP-fashion, I'm just not sure how...
FrustratedWithFormsDesigner
@Raynos: I think that all but the most barbaric methods (IP-addressable power strips and the like) require *some* amount of cooperation. If you can have the AI code call a method inside their loops, this would give you an opportunity to force an exception to be thrown. With this proviso, it would be pretty easy to make their code haltable.
Steven Sudit
The bots are not meant to call any methods I can throw exceptions and resorting forcing something like that defeats the point of their implementation freedom.
Raynos
@Raynos: Requiring periodic calls to a heartbeat method is not unreasonable.
Steven Sudit
+1  A: 

As I mentioned on @Lasse's answer, you really should consider using Code Access Security to limit what the bots are allowed to do on the system. You can really restrict what the bots are allowed to do execution wise (including removing their ability to access threading APIs). It might be worth considering treating each bot as if it was a virus since you aren't in control of what it could do to your system and I'm assuming that your game will be running as a full-trust application.

That said, if you restrict what each bot does with CAS, you may be able to terminate the thread without fear of corrupting your system. If a bot is stuck in an infinite loop, I suspect that your only recourse will be to terminate the thread since it will, programatically, never be able to reach any code that would check for a Thread.Abort signal.

This MSDN article may give you some guidance on how to more effectively terminate a run-away thread using a TerminateThread function.

You really have to try all of these things out and prove to yourself which may be the best solution.

Alternatively, if this a competitive game and the bot authors have to play fair, have you considered having the game give each bot a "turn token" that the bot has to present with each action that it wants to invoke and when the game flips turns, it gives all the bots a new token. This may allow you to now only have to worry about run-away threads and not robots who take to long on their turn.

Edit

Just to add a place for people to go look Security Permission Flag Enumerations will give you a sense of where to start. If you remove the SecurityPermissionFlag.ControlThread permission (as an example, only giving the code permission to execute and excluding ControlThread) you will remove their

Ability to use certain advanced operations on threads.

I don't know the extent of the operations that are inaccessible, but it will be an interesting exercise for the weekend to figure that out.

Dave White
`Thread.Abort` will stop an infinite loop, though at the cost of destabilizing your `AppDomain`.
Steven Sudit
Using CAS to prevent them from accessing files or blocking on anything would certainly help, but it still requires their cooperation to *gently* end any loop that goes on too long.
Steven Sudit
Gently, yes. If you can remove the ability to acquire unmanaged resources or block on anything, though, `Thread.Abort` becomes less of an app killer.
cHao
I'm not sure whether I'd consider `Thread.Abort` "safe", even with full CAS restrictions.
Steven Sudit
Out of curiosity, how would you remove the ability for code to lock? `Monitor.Enter` (the method that actually acquires a lock) doesn't seem to have any security requirements that would apply in pure managed code.
cHao
@Steven: I didn't say "safe". I said "less of an app killer". :) It's still a potential app killer, just a bit less so if you remove the potential to acquire resources it can't clean up.
cHao
SecurityPermissionFlag.ControlThread is the permission that can be used to deny access to some of the Threading APIs. SecurityPermissionFlag.UnmanagedCode is the permission that can be used to deny an application the ability to use/access unmanaged code.
Dave White
Neither one seems able to prohibit a call to `Monitor.Enter`. The first just prevents access to *advanced* thread APIs. Locking is as basic and fundamental as it gets.
cHao
It can't take a lock it doesn't have a reference to. Use proxies so that no objects are shared between bots. Done.
Ben Voigt
@cHao: I think Ben is right. Without `ControlThread`, it can only hurt itself.
Steven Sudit
A: 

another design option may be to allow each bot to be its own process, then use an IPC mechanism to exchange data. You can usually terminate a process without any terrible repercussions.

boo
That just seems far too heavy weight for my liking
Raynos
This was already suggested.
Steven Sudit
I added a -1 because this is unoriginal.
Steven Sudit
+1  A: 

I think the only problem is an inverted conditional. You should call EndInvoke only if the task completed successfully.

Rough suggestion:

var goodBots = new List<IBot>();
var results = new IAsyncResult[Bots.Count];
var events = new WaitHandle[Bots.Count];
int i = 0;
foreach (IBot bot in Bots) {
    NewGame del = bot.NewGame;
    results[i] = del.BeginInvoke(Info, null, null);
    events[i++] = r.AsyncWaitHandle;
}
WaitAll(events, RoundLimit);
var goodBots = new List<IBot>();
for( i = 0; i < events.Count; i++ ) {
    if (results[i].IsCompleted) {
        goodBots.Add(Bots[i]);
        EndInvoke(results[i]);
        results[i].Dispose();
    }
    else {
        WriteLine("bot " + i.ToString() + " eliminated by timeout.");
    }
}
Bots = goodBots;

Even better would be to explicitly spin up a thread for each bot, that way you could suspend the misbehaving bots (or dial down their priority) so that they don't continue to steal CPU time from the good bots.

Ben Voigt
I love how you just overwrite the global bot list with the goodbots. Anyone that didnt return within the timelimit just left the game. You are forgetting that by calling BeginInvoke on a bad bot I am forced to call endinvoke or face leaks of resources and other nasty stuff. The concept was to actually have bad bots default to some (tactically weak) action.I think the easiest thing for me to do is suspend threads on time out and continue them on the next round. If the bot times out on every round thats not my fault
Raynos
@Raynos: That's very funny. Actually, I think it's a fair fallback if they refuse to cooperatively check for cancellation.
Steven Sudit
Suspending the threads, IIRC, leaves whatever they had locked locked. You'll need to prevent them from acquiring locks unless you want to freeze your app. And resuming the thread won't restart it; you'll just start it where it left off (presumably with out-of-date data and even objects that the runner isn't using anymore).
cHao
@cHao: Dave White already suggested blocking their access to synchronization primitives using CAS, so this is a solved problem.
Steven Sudit
@cHao: Holding a lock indefinitely won't freeze the app if nothing else waits on that lock. Don't give more than one bot a reference to the same object, and there won't be any problem.
Ben Voigt
@Ben Voight: Every bot in the app, as well as the app itself, has to have access to `typeof(IBot)`. Or `typeof(int)`, for that matter. If a bot is determined to cause chaos, it doesn't have to settle for the references you give it -- there's thousands of others just waiting to be abused.
cHao
@cHao: I think you missed Ben's point here. Even if Bot A locks on typeof(int), this is only a problem for any other bots that repeat this bit of idiocy. It will in no way affect the system as a whole.
Steven Sudit
Furthermore, I had considered both locks and leaks when constructing my solution. You can't "unleak" resources seized by a runaway bot, but you can prevent them from leaking more. CAS is a good addition which prevents the bot from leaking anything except its single thread. And by dialing down the priority without complete suspension, you give a bot which takes and releases a lock inside an infinite loop a chance to release it, which suspending wouldn't. But you also give it a chance to grab more locks, so suspending is probably best.
Ben Voigt
The bot can also leak memory (assign a static member field to point to some huge data structure) but that will get swapped out of the working set quickly enough, you just have to worry about address space exhaustion. But if the bot actually triggers an OOM, that might give you a way to clean up safely (you could prescan the bot code for exception handling regions and reject any bot that contains them, then you can catch the OOM).
Ben Voigt