I'm looking for a way to save and restart a thread, either from inside that thread's context or from outside the thread, possibly from within another process. (Any of these options will work.) I am aware of the difficulty of hibernating entire processes, and I'm pretty sure that the same difficulties apply to threads. However, I'm asking anyway in the hope that someone has some insight.

My goal is to pause a running thread, save it to a file, and restart it from its exact context with no modification to that thread's code, or rather with modification in only a small area; i.e., I can't go writing serialization functions throughout the code. The main block of code must be unmodified and will not hold any global/system handles (file handles, sockets, mutexes, etc.). Really down-and-dirty details like CPU registers do not need to be saved, but the heap, stack, and program counter should be, along with anything else required to get the thread running again logically correctly from its save point. The resulting state of the program should be the same whether or not it was saved.

This is for a debugging program for high-reliability software; the goal is to run simulations of the software with various scripts for input, and be able to pause a running simulation and then restart it again later - or get the sim to a branch point, save it, make lots of copies and then run further simulations from the common starting point. This is why the main program cannot be modified.

The main thread is written in C++ and should run on Windows and Linux; however, if there is a way to do this on only one of those systems, that's acceptable too.

Thanks in advance.

A: 

As the whole logical address space of the program is part of the thread's context, you would have to hibernate the whole process.

If you can guarantee that the thread only uses local variables, you could save its stack. It is easy to suspend a thread with pthreads, but I don't see how you could access its stack from outside then.

ypnos
+2  A: 

Threads run in the context of a process. So if you want to do anything like persist a thread state to disk, you need to "hibernate" the entire process.

You will need to serialise the process's entire set of data, and you'll need to store the current thread execution point. I think serialising the process is doable (check out Boost.Serialization), but the thread stop point is a lot more difficult. I would put places where it can be stopped throughout the code, but as you say, you cannot modify the code.

Given that problem, you're looking at virtualising the platform the app is running on and using its suspend functionality to pause the entire thing. You might find more information about how to do this in the virtualisation vendor's features, e.g. Xen.

gbjbaanb
+2  A: 

I think what you're asking is much more complicated than you think. I am not too familiar with Windows programming but here are some of the difficulties you'll face in Linux.

A saved thread can only be restored from the root process that originally spawned the thread; otherwise the dynamic libraries would be broken, because dynamic libraries are loaded at a different address each time they're loaded. Because of this, saving to disk is essentially meaningless. The only way around this would be to take complete control of dynamic linking, which is no small feat. It's possible, but pretty scary.

The suspended thread will have variables in the heap. You'd need to be able to find all the allocations 'owned' by the thread, but the 'owned' state of any piece of the heap cannot be determined. In the future it may be possible with C++0x's garbage collection ABI. You can't just assume the whole heap belongs to the thread to be paused: the main thread uses the heap when creating threads, so blowing away the heap when deserializing the paused thread would break the main thread.

You need to address the issues with globals, and not just the globals created in the threads. Globals (or statics) can be, and often are, created in dynamic libraries.

There are more resources to a program than just memory. You have file handles, network sockets, database connections, etc. A file handle is just a number. Serializing its memory is completely meaningless without the context of the process the file was opened in.
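To make the "a file handle is just a number" point concrete, here is a minimal POSIX-only sketch. The helper names `descriptor_is_open` and `saved_descriptor_survives_close` are hypothetical, and `/dev/null` is used only because it exists on any Linux system. It shows that the integer a memory snapshot would capture stops referring to anything once the kernel-side object is gone.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Returns true if `fd` currently names an open file in *this* process.
// fcntl(F_GETFD) fails with -1 when the descriptor is not open.
bool descriptor_is_open(int fd) {
    return fcntl(fd, F_GETFD) != -1;
}

// Opens a file, remembers the descriptor number the way a naive memory
// snapshot would, closes the file, and reports whether the remembered
// number still refers to an open file.
bool saved_descriptor_survives_close() {
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0) {
        return false;  // could not open the file at all
    }
    int saved_number = fd;  // this integer is all that memory contains
    close(fd);              // the kernel-side state is now gone
    return descriptor_is_open(saved_number);
}
```

In a single-threaded program `saved_descriptor_survives_close()` returns `false`: the number was restored faithfully, but the open file it stood for was not. The same holds, only more so, when the number is read back in a different process.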

All that said, I don't think the core problem is impossible, just that you should consider a different approach.

Anyway, to try to implement this, the thread to be paused needs to be in a known state. I imagine the thread to be stopped would call a library function meant to halt the process so it could be resumed.
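One possible shape for such a library function is sketched below, assuming the "small area" of permitted modification can include a single call at each safe point. The class name `PauseGate` and its methods are hypothetical, not part of any existing library. It only parks the thread in a known state; it does not save anything to disk.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical pause point: the simulated thread calls checkpoint() at
// places where its state is known to be consistent. A controller calls
// pause() and resume() from another thread.
class PauseGate {
public:
    // Ask the worker to stop at its next checkpoint.
    void pause() {
        std::lock_guard<std::mutex> lock(mutex_);
        paused_ = true;
    }

    // Let a worker blocked in checkpoint() continue.
    void resume() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            paused_ = false;
        }
        released_.notify_all();
    }

    // Called by the worker; returns immediately unless paused.
    void checkpoint() {
        std::unique_lock<std::mutex> lock(mutex_);
        released_.wait(lock, [this] { return !paused_; });
    }

    bool is_paused() {
        std::lock_guard<std::mutex> lock(mutex_);
        return paused_;
    }

private:
    std::mutex mutex_;
    std::condition_variable released_;
    bool paused_ = false;
};
```

While the worker is blocked inside `checkpoint()`, the controller knows exactly where it is, which is the precondition for anything that follows, such as forking the process.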

I think the Linux system call fork is your friend. Fork duplicates a process's address space (in a multithreaded program, only the calling thread continues in the child). Have the system run to the desired point and fork. The first fork waits so that it can fork others. The second fork runs one set of input.

Once it completes, the first fork can fork again, and the new second fork can run another set of input.

Continue ad infinitum.
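The fork-per-input loop described above could be sketched roughly as follows (Linux/POSIX only). `SimState`, `continue_simulation`, and `run_from_branch_point` are hypothetical stand-ins for the real simulation, and passing the result back through the exit status is just a convenient assumption for a small example; real code would more likely write results to a file or pipe.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical stand-in for the simulation's in-memory state.
struct SimState {
    int counter;
};

// Hypothetical stand-in for "run the rest of the simulation with this
// input". It mutates the state, as real code would, and returns a small
// result code.
static int continue_simulation(SimState& state, int input) {
    state.counter += input;
    return state.counter & 0xFF;  // an exit status carries only 8 bits
}

// The parent holds the state at the branch point. For each input it forks
// a child that continues from that state, waits for the child to finish,
// and then forks again for the next input.
std::vector<int> run_from_branch_point(SimState& state,
                                       const std::vector<int>& inputs) {
    std::vector<int> results;
    for (int input : inputs) {
        pid_t pid = fork();
        if (pid == 0) {
            // Child: works on a private copy of the parent's memory.
            _exit(continue_simulation(state, input));
        }
        int status = 0;
        if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
            results.push_back(WEXITSTATUS(status));
        } else {
            results.push_back(-1);  // fork or wait failed
        }
    }
    return results;
}
```

Because each child mutates only its own copy of the address space, the parent's `SimState` is untouched after every run, so every input starts from the same branch point without any serialization code.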

caspin
A: 

The way you would have to do this is via VM snapshots: get a copy of VMware Workstation, then you can write code to automate starting, stopping, and snapshotting the machine at different points. Any other approach is pretty untenable, because while you might be able to freeze and thaw a process, you can't reconstruct the system state it expects (all the stuff that Caspin mentions, like file handles).

Paul Betts