
My friend produced a small proof-of-concept assembler that worked on x86. I decided to port it to x86_64 as well, but I immediately hit a problem.

I wrote a small C function, compiled it and disassembled it with objdump. I then pasted the bytes into my Python script, so the x86_64 code itself should be correct:

from ctypes import cast, CFUNCTYPE, c_char_p, c_long

buffer = bytes(bytearray([ #0000000000000000 <add>:
  0x55,                     # push   %rbp
  0x48, 0x89, 0xe5,         # mov    %rsp,%rbp
  0x48, 0x89, 0x7d, 0xf8,   # mov    %rdi,-0x8(%rbp)
  0x48, 0x8b, 0x45, 0xf8,   # mov    -0x8(%rbp),%rax
  0x48, 0x83, 0xc0, 0x0a,   # add    $0xa,%rax
  0xc9,                     # leaveq
  0xc3,                     # retq
]))

fptr = cast(c_char_p(buffer), CFUNCTYPE(c_long, c_long))
print(fptr(1234))

Now, why does this script crash with a segmentation fault every time I run it?

I also have a question about mprotect and the no-execute flag. It is said to protect against the most basic security exploits, such as buffer overruns. But what is the real reason it's used? You could just keep writing until you reach .text and then inject your instructions into a nice PROT_EXEC area. Unless, of course, .text is write-protected.

But then, why have the PROT_EXEC distinction everywhere anyway? Wouldn't it help tremendously just to write-protect the .text section?
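Copying the bytes out of an objdump listing by hand is error-prone. Below is a minimal Python 3 sketch that collects them automatically. It assumes GNU objdump's usual tab-separated instruction lines (`<offset>:\t<hex bytes>\t<mnemonic>`). The helper name `bytes_from_objdump` is hypothetical.

```python
def bytes_from_objdump(listing):
    """Collect the opcode bytes from GNU `objdump -d` output.

    Assumes objdump's usual tab-separated layout:
    "<hex offset>:\t<hex bytes>\t<mnemonic ...>".
    """
    code = bytearray()
    for line in listing.splitlines():
        fields = line.split("\t")
        # Instruction lines start with "<hex offset>:"; section and symbol
        # headers such as "0000000000000000 <add>:" contain no tab.
        if len(fields) < 2 or not fields[0].strip().endswith(":"):
            continue
        try:
            int(fields[0].strip()[:-1], 16)
        except ValueError:
            continue
        # The second field holds the space-separated hex bytes.
        code.extend(int(byte, 16) for byte in fields[1].split())
    return bytes(code)
```

The helper collects every instruction in the listing, so give it the disassembly of a single function only.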

A: 

Does Python even allow such usage? I should learn it then...

I think the interpreter doesn't expect any register to be changed. Try saving the registers that you use inside the function if you plan to use your assembler output like this.

By the way, the x86_64 calling convention is different from that of regular x86. You may have trouble if you lose stack pointer alignment and mix in external objects generated with other tools.

artificialidiot
ctypes makes sure the calling convention is correct; it is enough that the code was generated by gcc. As for changing registers, I thought the x86_64 calling convention says a subroutine can freely change most registers.
Cheery
A:

I don't think you can execute arbitrary allocated memory without first marking it as executable. I've never tried it myself, but you might want to look at the Unix function mprotect:

http://linux.about.com/library/cmd/blcmdl2_mprotect.htm

VirtualProtect seems to do the same thing on Windows:

http://msdn.microsoft.com/en-us/library/aa366898(VS.85).aspx
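Below is a minimal Python 3 sketch of calling mprotect through ctypes on Linux (it should also work on other Unixes, but that is untested). The important detail is declaring `argtypes`. Without them, ctypes passes Python ints as a C `int`, which cannot carry a 64-bit address. The wrapper name `set_protection` is hypothetical, and the PROT_* values are taken from the standard `mmap` module.

```python
import ctypes
import mmap
import os

# dlopen(NULL): the running process, whose symbols include libc's mprotect.
libc = ctypes.CDLL(None, use_errno=True)
mprotect = libc.mprotect
# int mprotect(void *addr, size_t len, int prot);
mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
mprotect.restype = ctypes.c_int


def set_protection(addr, length, prot):
    """Call mprotect on [addr, addr + length); addr must be page aligned."""
    if mprotect(addr, length, prot) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


# Example: an anonymous mapping always starts on a page boundary.
page = mmap.mmap(-1, mmap.PAGESIZE)
view = ctypes.c_char.from_buffer(page)      # lets ctypes see the mapping
set_protection(ctypes.addressof(view), mmap.PAGESIZE,
               mmap.PROT_READ | mmap.PROT_WRITE)
del view                                    # drop the export before closing
page.close()
```

The wrapper raises `OSError` with the C errno rather than returning -1. This makes a rejected call, such as an unaligned address (EINVAL on Linux), hard to miss.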

vincent
Even though I had found this elsewhere before, it is indeed correct, with a slight variation that I explain in my own answer.
Cheery
A:

My friend and I did some research and found that this is a platform-specific issue. We suspect that on some platforms malloc mmaps memory without PROT_EXEC, while on others it does not.

Therefore it is necessary to change the protection level with mprotect afterwards.

A lame thing; it took a while to figure out what to do.

from ctypes import (
    cast, CFUNCTYPE, c_int, c_long, c_size_t, c_void_p,
    sizeof, addressof, create_string_buffer, pythonapi
)

PROT_NONE, PROT_READ, PROT_WRITE, PROT_EXEC = 0, 1, 2, 4
mprotect = pythonapi.mprotect
# int mprotect(void *addr, size_t len, int prot): declare the prototype so
# 64-bit addresses are not squeezed into a C int
mprotect.argtypes = [c_void_p, c_size_t, c_int]
mprotect.restype = c_int

buffer = bytes(bytearray([ #0000000000000000 <add>:
    0x55,                     # push   %rbp
    0x48, 0x89, 0xe5,         # mov    %rsp,%rbp
    0x48, 0x89, 0x7d, 0xf8,   # mov    %rdi,-0x8(%rbp)
    0x48, 0x8b, 0x45, 0xf8,   # mov    -0x8(%rbp),%rax
    0x48, 0x83, 0xc0, 0x0a,   # add    $0xa,%rax
    0xc9,                     # leaveq
    0xc3,                     # retq
]))

pagesize = pythonapi.getpagesize()
cbuffer = create_string_buffer(buffer)
addr = addressof(cbuffer)
size = sizeof(cbuffer)
mask = pagesize - 1
# mprotect needs a page-aligned start: round addr down to its page and
# extend the length by addr's offset into that page
start = addr & ~mask
length = (addr & mask) + size
if mprotect(start, length, PROT_READ | PROT_WRITE | PROT_EXEC) < 0:
    print("mprotect failed?")
else:
    fptr = cast(cbuffer, CFUNCTYPE(c_long, c_long))
    print(repr(fptr(1234)))
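The range handed to mprotect has to start on a page boundary and cover the whole buffer. A small sketch of that rounding with a hypothetical helper `page_range` makes it easy to check on its own. It assumes the page size is a power of two, which holds on common systems.

```python
def page_range(addr, size, pagesize):
    """Return (start, length) of the whole pages covering [addr, addr + size)."""
    mask = pagesize - 1                    # valid because pagesize is 2**k
    start = addr & ~mask                   # round the start down
    end = (addr + size + mask) & ~mask     # round the end up
    return start, end - start
```

For example, a 0x20-byte buffer at 0x1ff0 straddles two 4 KiB pages, so `page_range(0x1ff0, 0x20, 0x1000)` gives `(0x1000, 0x2000)`. In Python `+` binds tighter than `&`, so an expression like `mask & addr + size` means `mask & (addr + size)`. Parenthesize such expressions explicitly.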
Cheery
The absolute best example I've ever seen on this topic!
mtasic
A:

As vincent mentioned, this is because the allocated page is marked as non-executable. Newer processors support this functionality, and OSes that support it use it as an added layer of security. The idea is to protect against certain buffer overflow attacks. For example, a common attack is to overflow a stack variable, rewriting the return address to point to code you have inserted. With a non-executable stack this now only produces a segfault rather than giving control of the process. Similar attacks also exist for heap memory.
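On Linux this marking is visible directly: `/proc/self/maps` lists every mapping of the process with its permissions. The interpreter's own code is typically `r-xp`, while heap pages holding data are typically `rw-p`, with no `x`. Below is a minimal Linux-only sketch that looks up the permissions of the mapping containing a given address. The helper name `mapping_permissions` is hypothetical.

```python
def mapping_permissions(address, maps_text=None):
    """Return the permission string (e.g. 'rw-p') of the mapping that
    contains address, or None if no mapping does.

    Reads /proc/self/maps (Linux-only) unless maps_text is supplied.
    """
    if maps_text is None:
        with open("/proc/self/maps") as maps:
            maps_text = maps.read()
    for line in maps_text.splitlines():
        # Each line: "start-end perms offset dev inode [path]"
        fields = line.split()
        start, end = (int(x, 16) for x in fields[0].split("-"))
        if start <= address < end:
            return fields[1]
    return None
```

For example, `mapping_permissions(ctypes.addressof(ctypes.create_string_buffer(b"\xc3")))` typically reports a writable, non-executable mapping. That is exactly why jumping into such a buffer faults.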

To get around it, you need to alter the protection. This can only be done on page-aligned memory, so you'll probably need to change your code to something like the following:

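import ctypes.util
from ctypes import CDLL, CFUNCTYPE, c_int, c_long, c_size_t, c_void_p, cast, memmove

libc = CDLL(ctypes.util.find_library('c'))
# Declare prototypes so 64-bit pointers are not truncated to a C int
libc.valloc.argtypes = [c_size_t]
libc.valloc.restype = c_void_p
libc.mprotect.argtypes = [c_void_p, c_size_t, c_int]
libc.mprotect.restype = c_int
libc.free.argtypes = [c_void_p]
libc.free.restype = None

# Some constants
PROT_READ = 1
PROT_WRITE = 2
PROT_EXEC = 4

# The machine code from the question (returns its argument + 10)
buffer = bytes(bytearray([
    0x55, 0x48, 0x89, 0xe5, 0x48, 0x89, 0x7d, 0xf8, 0x48,
    0x8b, 0x45, 0xf8, 0x48, 0x83, 0xc0, 0x0a, 0xc9, 0xc3,
]))

def executable_code(buffer):
    """Return a pointer to a page-aligned executable buffer filled in with the data of the string provided.
    The pointer should be freed with libc.free() when finished"""

    size = len(buffer)
    # Need to align to a page boundary, so use valloc
    addr = libc.valloc(size)
    if not addr:  # valloc returned NULL (None with restype c_void_p)
        raise MemoryError("Failed to allocate memory")

    memmove(addr, buffer, size)
    if 0 != libc.mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC):
        raise OSError("Failed to set protection on buffer")
    return addr

code_ptr = executable_code(buffer)
fptr = cast(code_ptr, CFUNCTYPE(c_long, c_long))
print(fptr(1234))
libc.free(code_ptr)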

Note: It may be a good idea to unset the executable flag before freeing the page. Most C libraries don't actually return the memory to the OS when done, but keep it in their own pool. This could mean they will reuse the page elsewhere without clearing the EXEC bit, bypassing the security benefit.
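Below is a sketch of a release helper that follows that note: it drops PROT_EXEC before handing the memory back to free(). It assumes memory obtained from valloc as above; valloc is obsolete but still provided by glibc. `release_code` is a hypothetical name, and the libc prototypes are redeclared so the snippet stands alone.

```python
import ctypes
import ctypes.util
import os

PROT_READ, PROT_WRITE = 1, 2  # the usual Linux values

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.valloc.argtypes = [ctypes.c_size_t]
libc.valloc.restype = ctypes.c_void_p
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.mprotect.restype = ctypes.c_int
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def release_code(addr, size):
    """Make the pages read/write only again, then return them to malloc."""
    if libc.mprotect(addr, size, PROT_READ | PROT_WRITE) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    libc.free(addr)
```

It would be called in place of the final `libc.free(code_ptr)`, e.g. `release_code(code_ptr, len(buffer))`.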

Also note that this is fairly non-portable. I've tested it on Linux, but not on any other OS. It won't work on Windows, but it may work on other Unixes (BSD, OS X?).

Brian
An even better answer. valloc is useful, as is the note that the EXEC bit doesn't get cleared afterwards. But I'm probably not interested in either aspect.
Cheery
A: 

There's a simpler approach, which I only figured out recently, that doesn't involve mprotect: simply mmap the executable space for the program directly. These days Python has a module for doing exactly this (mmap), though I didn't find a way to get the address of the code. In short, you allocate the memory by calling mmap instead of using string buffers and setting the execute flag indirectly. This is easier and safer, since you can be sure that only your code can be executed.
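Below is a minimal Python 3 sketch of that mmap approach. It assumes a Unix system that permits an anonymous read/write/execute mapping; hardened kernels or macOS may refuse one. The address can be obtained by wrapping the mmap object in a ctypes object with `from_buffer`, and a `CFUNCTYPE` prototype can be instantiated from an integer address. The name `load_function` is hypothetical.

```python
import ctypes
import mmap


def load_function(code, functype):
    """Copy machine code into an anonymous read/write/execute mapping.

    Returns (mapping, function); the function is only valid while the
    mapping stays open, so keep a reference to it.
    """
    mem = mmap.mmap(-1, len(code),
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    mem.write(code)
    # from_buffer gives a ctypes view of the mapping so its address can be
    # taken; the temporary view is released right after addressof().
    address = ctypes.addressof(ctypes.c_char.from_buffer(mem))
    return mem, functype(address)
```

On x86_64 Linux, passing the question's bytes as `code` and `ctypes.CFUNCTYPE(ctypes.c_long, ctypes.c_long)` as `functype` should give a callable for which `add10(1234)` returns 1244. Call `mem.close()` only once the function is no longer needed.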

Cheery