views:

777

answers:

8

I was wondering if it is possible to generate a "core" file, copy if to another machine and then continue execution of the a core file on that machine?

I have seen the gcore utility that will make a core file from a running process. But I do not think gdb can continue execution based on a core file.

Is there any way to just dump the heap/stack and and restore those at a later point?

+5  A: 

it's called process migration.

mosix and OpenMosix used to be able to do that. nowadays it's easiest to migrate a whole VM.

Javier
A: 

I don't believe this is possible. However, you might want to look into virtualization software - e.g. Xen - which make it possible to freeze and move entire system images fromone machine to another.

Avdi
+1  A: 

This won't, in general, be sufficient to let an arbitrary process continue on another machine. In addition to the heap and stack state, there may also also open I/O handles, allocated hardware resources, etc. etc.

Your options are either to explicitly write your software in a way that lets it dump state on a signal and later resume from the dumped state, or to run your software in a virtual machine and migrate that to the alternate host - Xen and Vmware both support freeze/restore as well as live migration.

That said, CryoPID attempts to do precisely this and occasionally succeeds.

moonshadow
+3  A: 

On modern systems, not from a core file, no you can't. For freezing and restoring an individual process on Linux, CryoPID and the new Kernel-based checkpoint and restart are in the works, but their abilities are currently quite limited. OpenVZ and other virtualization-like softwares can freeze and restore an entire system.

ephemient
A: 

In some cases, this can be done. For example, part of the Emacs build process is to load up all the Lisp libraries and then dump the memory image on disk for quick loading. Some other language interpreters do that too (I'm thinking of Lisp and Scheme implementations, mostly). However, they're specially designed for that kind of use, so I don't know what special things they have to do to allow that to work.

I think this would be very hard to do for a random program, but if you wrote a framework where all objects supported serialisation/deserialisation, you can then serialise all objects used by your program, and then ship that elsewhere, and deserialise them at the other end.

The other people's answers about virtualisation are on the spot, too.

Chris Jester-Young
A: 

Depends on the machine. It's very doable in a very small embedded system, for instance. I think it's also implemented somewhat in Beowulf clusters and other supercomputeresque apps.

Paul Nathan
+3  A: 

Also checkout out the Condor project. Condor can do that with parallel jobs as well. Condor also include monitors that can automatically migrate your process when some, for example, starts using their workstation again. It's really designed for utilizing spare cycles in networked environments.

Pat Notz
A: 

There are lots of reasons you can't do what you want very easily. For example, when you restore the core file on the other machine how do you resolve file descriptors that you process had open? What about sockets, named pipes, semaphores, or any other OS-level resource? Basically unless your system is specifically designed to handle such an operation you can't naively dump a core file and move it to another machine.