Is it possible to pause a process, save the memory contents to a file, and then later reload the file so you can continue the program?

Edit: I've been reading about this:

http://en.wikipedia.org/wiki/Setcontext

Is it possible to dump the contents of the context struct (ucontext_t), and somehow force malloc to allocate the same memory regions?

+6  A: 

Technically it is possible, but it would also require saving the state of all system-allocated resources, such as file descriptors, and then restoring it. So it's a challenging task.

The easiest way to achieve what you want is to use a virtual machine like VMware. When you pause it, you actually save the whole machine state together with all the programs running in it.

sharptooth
And VMware makes sure that other resources like network interfaces are restarted properly.
Aaron Digulla
It's not only challenging but, in the general case, impossible. See http://blogs.msdn.com/oldnewthing/archive/2004/04/20/116749.aspx
GSerg
@GSerg, that link presupposes that resources outside the process will be released when it's hibernated. That doesn't have to be the case in a situation where the process is just stashed on a do-not-run queue and its address space sent to disk without relinquishing exo-process resources.
paxdiablo
I can envisage this as a simple extension to the UNIX SIGSTOP/SIGCONT - it doesn't relinquish those resources now and it could be changed to swap whole address space to disk. It wouldn't be possible to survive a machine restart but you could achieve the desired effect.
paxdiablo
That effect being the ability to stop and start processes willy-nilly. Of course, you have the problem of a resource being locked while the process is stopped, but you could have tools to tell you that (and you would then restart the process to release it).
paxdiablo
A: 

Workflow Foundation in .NET 3.0 and higher allows for workflows to be stopped and restarted.

Jonathan Parker
+1  A: 

Well, Java has serialization, which comes somewhere near to it. You can't do it at the lowest level, though (CPU registers, memory addresses, etc.), since that would require the OS to be in the same state it was in when you 'paused' the process.

This could be a good project as a Linux kernel module :-)

Xolve
A: 

It's messy to the point of being impossible when dealing with native code, as sharptooth mentions.

However, some programs (IIRC Emacs, for instance) have used "dump my own memory" tricks to preserve configuration, instead of dealing with config files. This doesn't work on Windows, though, since executables are run in deny-write share mode. But it's a cute (albeit dangerous) trick on Linux or DOS :)

snemarch
+2  A: 

This is usually called a persistent continuation (http://en.wikipedia.org/wiki/Continuation). Some language environments, like Smalltalk and SBCL, have first-class support for persistent continuations. Most languages don't.

Guillaume
Continuations structure the control flow inside a program; they don't stop or start the program's execution.
sth
@sth Some Smalltalk runtimes and SBCL support persistent continuations, rather than just the transient ones you're thinking of.
Pete Kirkham
+1  A: 

You get a similar effect in UNIX when you send SIGSTOP to a process (CTRL-Z in a shell sends the closely related SIGTSTP, although the key depends on your stty settings). I believe it stops the process from executing totally, and the address space, while not swapped to disk en masse, can be paged out (or discarded if it's reloadable from the executable) due to non-use.

SIGCONT will restart the process.
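As a rough illustration of the stop/continue pair, here is a minimal Python sketch for a POSIX system. The helper names (`pause`, `resume`, `demo`) are made up for this example, and the sleeping child only stands in for whatever long-running program you actually care about.

```python
import os
import signal
import subprocess
import sys


def pause(pid):
    """Suspend a process. SIGSTOP cannot be caught or ignored."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Let a stopped process carry on from where it left off."""
    os.kill(pid, signal.SIGCONT)


def demo():
    # Hypothetical stand-in for the real program: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        pause(child.pid)
        # WUNTRACED makes waitpid report a child that stopped (not exited).
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)

        resume(child.pid)
        # WCONTINUED makes waitpid report a child that was resumed.
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        continued = os.WIFCONTINUED(status)
        return stopped, continued
    finally:
        # Clean up the stand-in child whatever happened above.
        child.kill()
        child.wait()
```

Note that nothing is written to disk here: the process stays in the process table with all its open files and other resources, it just isn't scheduled.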

One possible solution is something we implemented in UNIX a while back. Make the program responsible for its own start and stop, by passing a command-line parameter to it. Secondary copies of the program would not do any real work; they'd only communicate with the primary copy.

So there were four possibilities:

  • Run primary copy with 'resume' or no parameter. This would just start the program and let it run.
  • Run primary copy with 'pause'. This would start the program which would initialize then send a SIGSTOP signal to itself to halt.
  • Run secondary copy with 'resume'. This would send a SIGCONT signal to the primary copy to start it running again. Secondary copies with no parameter would simply complain that a primary was already running then exit.
  • Run secondary copy with 'pause'. This would send a SIGSTOP signal to the primary copy to make it pause.

We used shared memory for communicating with the primary since there were many other things a secondary could pass to the primary. But if all you want is stop and start instructions, I'd go for storing the PID in a known file and checking to make sure the process name is valid.

This gives you a nice idiom for a program that can pause and restart just by seemingly running it again with different parameters.
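A minimal Python sketch of the simpler PID-file variant might look like the following. All the function names and the PID-file layout are assumptions for illustration, not what we actually ran. It only checks that the PID exists; verifying the process name as well is platform-specific and is left out here.

```python
import os
import signal


def write_pid_file(path, pid=None):
    """Primary copy: record its PID (defaults to the current process)."""
    with open(path, "w") as f:
        f.write(str(os.getpid() if pid is None else pid))


def read_pid_file(path):
    """Secondary copy: find out which process is the primary."""
    with open(path) as f:
        return int(f.read().strip())


def is_running(pid):
    """Signal 0 delivers nothing; it only checks that the PID exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def control(path, command):
    """Secondary copy: send 'pause' or 'resume' to the primary."""
    signals = {"pause": signal.SIGSTOP, "resume": signal.SIGCONT}
    pid = read_pid_file(path)
    if not is_running(pid):
        raise RuntimeError("no primary copy with PID %d" % pid)
    os.kill(pid, signals[command])
    return pid
```

The primary would call `write_pid_file` once at startup; a secondary invoked with 'pause' or 'resume' would call `control` with that word and then exit.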

paxdiablo
Is there a way to do this in a programming language without the OS?
Unknown
I've never heard of one, but there's an awful lot of languages around that I know little about.
paxdiablo
@unknown: On Unix, a process can send itself SIGSTOP. The question is how do you wake it later? It can't send itself SIGCONT for obvious reasons...
Aaron Digulla
+2  A: 

Depending on your requirements and OS, you could try forcing a core dump.

I have never tried actually loading a core-dumped program back up other than in gdb. It seems like any files you have open, or any other state that is not in your program's memory, would be lost, as sharptooth pointed out.

Another approach would be simply serializing the state you need to disk in your program. It sucks, but it is probably the most reliable way unless you are content with just suspending execution of the program. That could be done with your operating system's thread library or, as one poster pointed out, with your shell.
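To make the serialization idea concrete, here is a small Python sketch using `pickle`. The state layout (`i`, `total_sum`), the `stop_at` parameter and the function names are all invented for the example; a real program would have to decide for itself what counts as "the state you need".

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write the state to a temp file, then rename it into place,
    so an interrupted save never leaves half a checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path, default):
    """Return the saved state, or the default on a fresh start."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, stop_at=None, total=10):
    """Sum the integers below `total`, resuming from a checkpoint if
    one exists. Returns None when it pauses at step `stop_at`."""
    state = load_checkpoint(path, {"i": 0, "total_sum": 0})
    while state["i"] < total:
        if stop_at is not None and state["i"] == stop_at:
            save_checkpoint(state, path)
            return None  # paused; a later run() picks up from here
        state["total_sum"] += state["i"]
        state["i"] += 1
    return state["total_sum"]
```

Only what you put in the state object survives. Open files, sockets and the like have to be reopened by the program itself after loading the checkpoint.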

fuzzy-waffle
A: 

Raymond Chen explains why it's impossible. Obviously, not all Microsoft engineers read this, because the Visual Studio compiler does this when precompiling headers. It dumps its own state after compiling the headers for the first time, and restores itself to continue.

MSalters
Mr Chen is right and wrong. It's only impossible if the outside-process resources disappear. This doesn't have to happen. You can stop a process totally while preserving the out-of-process resources - obviously that won't survive a reboot but that's not necessarily what's needed here.
paxdiablo
The ability to stop a process totally for an hour while you do some CPU-intensive work is still valuable. Then you just restart that process. SIGSTOP/SIGCONT already does this on UNIX et al.
paxdiablo
Actually, for that it's sufficient to lower the priority to idle. There's no harm in using the last few % of the last core; many parallel algorithms can't sustain full parallelism over the entire operation, especially with trees.
MSalters