I have a PHP application that needs to output a Python script, more specifically a series of variable assignment statements, e.g.:

subject_prefix = 'This String From User Input'
msg_footer = """This one too."""

The values of subject_prefix and the others come from user input, so I need to escape the contents of the strings. Writing something like the following isn't going to cut it; we're stuffed as soon as someone uses a quote, a newline, or anything else hazardous that I'm not aware of:

echo "subject_prefix = '".$subject_prefix."'\n";

So. Any ideas?

(Rewriting the app in Python isn't possible due to time constraints. :P )

Edit, years later:

This was for integration between a web-app (written in PHP) and Mailman (written in Python). I couldn't modify the install of the latter, so I needed to come up with a way to talk in its language to manage its configuration.

This was also a really bad idea.

A: 

I suggest writing a function that takes two arguments: the text to be escaped and the type of quotes the string is in. Then, for example, if the string is single-quoted, the function escapes the single quotes in the text along with any other characters that need escaping (backslashes, for instance).

function escape_string($text, $type) {
    // Escape backslashes for all types of strings?
    $text = str_replace('\\', '\\\\', $text);

    switch($type) {
        case 'single':
            $text = str_replace("'", "\\'", $text);
            break;
        case 'double':
            $text = str_replace('"', '\\"', $text);
            break;
        // etc...
    }

    return $text;
}

I'm assuming that for single-quoted strings you want to escape the single quotes, and with double-quoted strings you want to escape the double quotes...

yjerem
Is there anything else that can "break" a python string? New-lines?
Rob Howard
I'm not sure, I actually don't know Python very well.
yjerem
A: 

I'd start by standardizing the string type used in the Python output on triple-quoted strings ("""). This should reduce the incidence of problems from stray quotes in the input. You'll still need to escape the input, of course, but there are fewer cases to worry about.

How I'd escape the strings depends somewhat on what I'm worried about getting slipped in, and on the context in which they are printed out again. If you're just worried about quotes causing problems, you could simply check for any occurrences of """ and escape them. On the other hand, if you're worried about the input itself being malicious (and it's user input, so you probably should be), then I would look at options like strip_tags() or similar functions.
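One way to see that triple quotes alone are not enough is to compile a generated assignment in Python. This is only a sketch: naive_assignment and compiles are hypothetical helpers that stand in for the PHP echo, and nothing here comes from the original application.

```python
def naive_assignment(name, value):
    # Hypothetical helper: mimics pasting raw input between triple quotes.
    return name + ' = """' + value + '"""\n'


def compiles(source):
    # True if Python accepts the generated source as valid code.
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


# Single quotes and newlines are harmless inside triple quotes...
ok_source = naive_assignment("msg_footer", "it's fine\nacross lines")
# ...but an embedded triple quote ends the string early.
broken_source = naive_assignment("msg_footer", 'oops """ here')

print(compiles(ok_source), compiles(broken_source))
```

The second input only produces a syntax error here, but the same gap is what a deliberate injection would use, so the input still needs real escaping.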

acrosman
A: 

Another option may be to export the data as an array or object encoded as a JSON string, and to modify the Python code slightly to handle the new input. While escaping via JSON is not 100% bulletproof, it will still be better than writing your own escaping routines.

And you'll be able to handle errors if the JSON string is malformed.

There's a package for Python to encode and decode JSON: python-json 3.4
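On the Python side, reading such a file might look like the sketch below. It assumes the json module from the standard library (available since Python 2.6) rather than the package named above, and load_settings is a hypothetical helper; json.dumps merely stands in for PHP's json_encode() so the example is self-contained.

```python
import json


def load_settings(text):
    # Parse the JSON document; malformed input raises ValueError
    # instead of being executed as code.
    settings = json.loads(text)
    if not isinstance(settings, dict):
        raise ValueError("expected a JSON object of settings")
    return settings


# Stand-in for the PHP side: json.dumps plays the role of json_encode here.
payload = json.dumps({
    "subject_prefix": "It's a \"quoted\" prefix",
    "msg_footer": "line one\nline two",
})

settings = load_settings(payload)
subject_prefix = settings["subject_prefix"]
msg_footer = settings["msg_footer"]
```

The values arrive as plain data, so quotes and newlines in the input never become part of Python source.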

Joe Scylla
+2  A: 

Do not try to write this function in PHP. You will inevitably get it wrong, and your application will inevitably end up with an arbitrary remote code execution vulnerability.

First, consider what problem you are actually solving. I presume you are just trying to get data from PHP to Python. You might try to write a .ini file rather than a .py file. Python has an excellent ini syntax parser, ConfigParser. You can write the obvious, and potentially incorrect, quoting function in PHP and nothing serious will happen if (read: when) you get it wrong.
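As a rough sketch of the reading side, assuming Python 3, where the module is spelled configparser: the section name "mailman" and the sample values are made up for illustration.

```python
import configparser

# Assumed example of an .ini file the PHP side might write.
ini_text = """\
[mailman]
subject_prefix = [It's "my" list]
msg_footer = import os; os.system("nothing happens")
"""

# interpolation=None keeps "%" characters in user input literal.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(ini_text)

subject_prefix = parser.get("mailman", "subject_prefix")
msg_footer = parser.get("mailman", "msg_footer")
```

Whatever ends up in the values is returned as an ordinary string; nothing in the file is evaluated as Python.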

You could also write an XML file. There are too many XML parsers and emitters for PHP and Python for me to even list here.

If I really can't convince you that this is a terrible, terrible idea, then you can at least use the pre-existing function that Python has for doing such a thing: repr().
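To illustrate what repr() gives you, here is a small Python sketch with a made-up input string. It uses ast.literal_eval from the standard library to check that the literal parses back to the original value.

```python
import ast

user_input = "it's \"tricky\"\\ \r\n\tand ends with a backslash\\"

# repr() produces a Python string literal with quotes and escapes applied.
literal = repr(user_input)
generated = "subject_prefix = " + literal + "\n"

# The literal stays on one line, so it cannot spill into the next statement.
# ast.literal_eval parses the literal without executing arbitrary code.
round_tripped = ast.literal_eval(literal)

# Executing the generated assignment yields the original string again.
namespace = {}
exec(generated, namespace)
```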

Here's a handy PHP function which will run a Python script to do this for you:

<?php

function py_escape($input) {
    $descriptorspec = array(
        0 => array("pipe", "r"),
        1 => array("pipe", "w")
        );
    $process = proc_open(
        "python -c 'import sys; sys.stdout.write(repr(sys.stdin.read()))'",
        $descriptorspec, $pipes);
    fwrite($pipes[0], $input);
    fclose($pipes[0]);
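    $chunk_size = 8192;
    // stream_get_contents() keeps reading until EOF or $chunk_size bytes,
    // whereas a single fread() on a pipe may return early with less.
    $escaped = stream_get_contents($pipes[1], $chunk_size);
    // Close the remaining pipe before proc_close() to avoid a deadlock.
    fclose($pipes[1]);
    proc_close($process);
    if (strlen($escaped) == $chunk_size) {
        // This is important for security.
        die("That string's too big.\n");
    }
    return $escaped;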
}

// Example usage:
$x = "string \rfull \nof\t crappy stuff";
print py_escape($x);

The chunk_size check is intended to prevent an attack whereby your input ends up being two really long strings, which look like ("hello " + ("." * chunk_size)) and '; os.system("do bad stuff") respectively. Now, that naive attack won't work exactly, because Python won't let a single-quoted string end in the middle of a line, and those quotes in the system() call will themselves be quoted, but if the attacker manages to get a line continuation ("\") into the right place and use something like os.system(map(chr, ...)) then they can inject some code that will run.

I opted to simply read one chunk and give up if there was more output, rather than continuing to read and accumulate, because there are also limits on Python source file line length; for all I know, that could be another attack vector. Python is not intended to be secure against arbitrary people writing arbitrary source code on your system so this area is unlikely to be audited.

The fact that I had to think through all of this for such a trivial case is just another illustration of why you shouldn't use Python source code as a data interchange format.

Glyph
Nono, you've convinced me. :) This approach should've raised all sorts of red flags in my head when I first thought of it. I'm curious as to why the function above checks that the chunk size is equal to the length of the string; could you explain that part a little further?
Rob Howard
I've edited it to add a nice long explanation; hope that helps.
Glyph
A: 

I needed to code this to escape a string in the "ntriples" format, which uses python escaping.

The following function takes a UTF-8 string and returns it escaped for Python (or the ntriples format). It may do odd things if given invalid UTF-8 data. It does not handle Unicode characters beyond U+FFFF. It does not (currently) wrap the string in double quotes.

The uniord function comes from a comment on php.net.

    function python_string_escape( $string )
    {
            # str_replace is used for the literal cases: in a preg_replace
            # replacement, "\\\\" collapses to a single backslash.
            $string = str_replace( "\\", "\\\\", $string ); # \\ (first to avoid string re-escaping)
            $string = str_replace( "\n", "\\n", $string ); # \n
            $string = str_replace( "\r", "\\r", $string ); # \r
            $string = str_replace( "\t", "\\t", $string ); # \t
            $string = str_replace( "\"", "\\\"", $string ); # \"
            # The /e modifier was removed in PHP 7, so use a callback instead.
            $string = preg_replace_callback(
                            '/[\x{00}-\x{1F}\x{7F}-\x{FFFF}]/u',
                            function ( $m ) { return sprintf( "\\u%04X", uniord( $m[0] ) ); },
                            $string );
            return $string;
    }

    function uniord($c) {
            # Decode the first UTF-8 character of $c to its code point.
            # $c[0] replaces the $c{0} offset syntax, which PHP 8 removed.
            $h = ord($c[0]);
            if ($h <= 0x7F) {
                    return $h;
            } else if ($h < 0xC2) {
                    return false;
            } else if ($h <= 0xDF) {
                    return ($h & 0x1F) << 6 | (ord($c[1]) & 0x3F);
            } else if ($h <= 0xEF) {
                    return ($h & 0x0F) << 12 | (ord($c[1]) & 0x3F) << 6
                                             | (ord($c[2]) & 0x3F);
            } else if ($h <= 0xF4) {
                    return ($h & 0x0F) << 18 | (ord($c[1]) & 0x3F) << 12
                                             | (ord($c[2]) & 0x3F) << 6
                                             | (ord($c[3]) & 0x3F);
            } else {
                    return false;
            }
    }
Christopher Gutteridge