I'm surprised nobody has mentioned Java hardware. It should be an inspiration for us to further the evolution of hardware by creating an even higher level processor.
There's another project I just found called "Pycorn".
If there were a Python bytecode processor, it would be feasible to write a fast operating system in 100% Python. The processor could implement the entire CPython bytecode, or anything that is compatible with the Python language (but not C modules!). The processor would have to handle reference counting, classes, and objects. Native hashing for dicts would be very helpful; all the complex data structure manipulation that Python currently does in software should be done purely in hardware. There would be no concept of pointers, which I see as a prime motivation for building such a processor, as it would be impossible to smash the stack.
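To get a feel for what such a processor would have to implement, the standard `dis` module can list the bytecode CPython generates for a small function. This is only an illustration: the function `bump` is made up for the example, and the exact opcodes differ between CPython versions, so a real design would have to pin one version.

```python
import dis


def bump(counts, key):
    # A dict lookup, an addition, and a dict store: each becomes one or more
    # bytecode instructions that the hypothetical processor would execute.
    counts[key] = counts.get(key, 0) + 1
    return counts


# Collect the opcode names instead of relying on a fixed listing,
# because CPython bytecode is not stable across releases.
opnames = [ins.opname for ins in dis.get_instructions(bump)]
print(sorted(set(opnames)))
```

Every name printed is an operation that would need a hardware (or microcode) implementation, including the dict store that motivates native hashing.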
Everything would be an object! The kernel itself would call methods on the memory object, although you wouldn't need to touch it much, since the hardware would handle allocation and garbage collection anyway. Interrupt handlers could simply be set to Python methods. MSRs, caches, debug registers, and I/O ports would all be objects.
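A rough sketch of what "interrupt handlers are just Python methods" could look like from the kernel's side. Everything here is hypothetical: `InterruptController`, `raise_irq`, `Keyboard`, and the vector number are invented for illustration, and the dispatch that the hardware would perform is simulated with an ordinary method call.

```python
class InterruptController:
    """Hypothetical stand-in for the processor's interrupt table object."""

    def __init__(self):
        self._handlers = {}

    def __setitem__(self, vector, handler):
        # Installing a handler is plain item assignment.
        if not callable(handler):
            raise TypeError("interrupt handler must be callable")
        self._handlers[vector] = handler

    def raise_irq(self, vector, *args):
        # On the imagined hardware the processor would do this dispatch;
        # here it is triggered by hand.
        return self._handlers[vector](*args)


class Keyboard:
    def __init__(self):
        self.buffer = []

    def on_key(self, scancode):
        self.buffer.append(scancode)


interrupts = InterruptController()
kbd = Keyboard()
interrupts[0x21] = kbd.on_key       # a bound method as the handler (vector chosen arbitrarily)
interrupts.raise_irq(0x21, 0x1E)    # simulate the device raising the interrupt
```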
There's an interesting discussion about implementing Python on an FPGA here.
On another note (pertaining to a Python O/S on a non-Python processor), to the people claiming you can't make inline assembly Pythonic: it's pretty simple to just emit assembly from an abstraction, e.g.:
asm = MetaASM()
asm.r1 = 1234
asm.r2 = asm.r1 + 5
asm.io.out(asm.r1)
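A minimal sketch of how such a `MetaASM` object could be built, assuming registers are created on demand through `__getattr__` and assignments are recorded through `__setattr__`. The helper classes (`Reg`, `Expr`, `IO`) and the generic `mov`/`add`/`out` mnemonics are assumptions made for this example, and only `+` is supported.

```python
class Reg:
    """Symbolic register: adding to it builds an expression instead of computing."""

    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        return Expr(self, other)

    def __str__(self):
        return self.name


class Expr:
    """A pending 'left + right' that is lowered when it gets assigned."""

    def __init__(self, left, right):
        self.left, self.right = left, right


class IO:
    def __init__(self, lines):
        self._lines = lines

    def out(self, reg):
        self._lines.append(f"out {reg}")


class MetaASM:
    def __init__(self):
        # Bypass our own __setattr__ for internal state.
        object.__setattr__(self, "lines", [])
        object.__setattr__(self, "io", IO(self.lines))

    def __getattr__(self, name):
        # Only called when normal lookup fails: treat unknown names as registers.
        if name.startswith("_"):
            raise AttributeError(name)
        return Reg(name)

    def __setattr__(self, name, value):
        if isinstance(value, Expr):
            # Lower 'dst = a + b' into two instructions.
            self.lines.append(f"mov {name}, {value.left}")
            self.lines.append(f"add {name}, {value.right}")
        else:
            self.lines.append(f"mov {name}, {value}")


asm = MetaASM()
asm.r1 = 1234
asm.r2 = asm.r1 + 5
asm.io.out(asm.r1)
print("\n".join(asm.lines))
```

The object only records text here; a real backend would map the recorded operations onto a target instruction set and handle register allocation.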
You could switch to architecture-specific assembly for performance, or for architecture-specific operations and registers when necessary:
asm = ASM("IA-32")
asm.xor(asm.eax, asm.eax)
asm.cr0 = asm.eax
asm.invtlb()
asm.fs[0x00123456] = asm.eax # mov fs:[0x00123456], eax
asm.al = 123
asm.dword.ptr.eax = 1234 # mov dword ptr [eax], 1234
asm.push(asm.eax)
CorePy is also of interest on this topic.