Do not try to write this function in PHP. You will inevitably get it wrong, and your application will inevitably end up with a remote arbitrary-code-execution vulnerability.
First, consider what problem you are actually solving. I presume you are just trying to get data from PHP to Python. You might try to write a .ini file rather than a .py file. Python has an excellent ini syntax parser, ConfigParser. You can write the obvious, and potentially incorrect, quoting function in PHP and nothing serious will happen if (read: when) you get it wrong.
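To make the .ini suggestion concrete, here is a minimal sketch of the Python side. It assumes Python 3, where the module is spelled configparser; the section name, keys, and the load_settings_ini helper are made up for illustration and are not part of the original question.

```python
import configparser

# Hypothetical file contents, as a PHP script might have written them.
INI_TEXT = """\
[settings]
greeting = hello world
retries = 3
"""


def load_settings_ini(text):
    """Parse INI text and return the [settings] section as a plain dict."""
    # interpolation=None keeps '%' characters in values as literal data.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["settings"])


settings = load_settings_ini(INI_TEXT)
print(settings["greeting"])
print(int(settings["retries"]))
```

Every value comes back as a string and nothing in the file is ever executed, so a quoting mistake on the PHP side produces wrong data or a parse error rather than running code.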
You could also write an XML file. There are too many XML parsers and emitters for PHP and Python for me to even list here.
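As one possible illustration on the Python side, the standard library's xml.etree.ElementTree can read a small settings document. The element names and the load_settings_xml helper below are assumptions for the sake of the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, as a PHP XML emitter might have produced it.
XML_TEXT = (
    "<settings>"
    "<greeting>line one\nline two</greeting>"
    "<retries>3</retries>"
    "</settings>"
)


def load_settings_xml(xml_text):
    """Return a dict mapping each child element's tag to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


settings_xml = load_settings_xml(XML_TEXT)
print(settings_xml["retries"])
```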
If I really can't convince you that this is a terrible, terrible idea, then you can at least use the pre-existing function that Python has for doing such a thing: repr().
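For reference, this is roughly what repr() gives you when called directly in Python. The py_literal wrapper and the sample string are illustrative only; ast.literal_eval is used just to check the round trip, since it parses literals without running code.

```python
import ast


def py_literal(value):
    """Return Python source text for a string literal equal to `value`."""
    return repr(value)


tricky = "string \rfull \nof\t crappy stuff, plus ' and \" quotes"
literal = py_literal(tricky)
print(literal)

# The escaped form contains no raw newline, and it parses back
# to exactly the original string.
assert "\n" not in literal
assert ast.literal_eval(literal) == tricky
```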
Here's a handy PHP function which will run a Python script to do this for you:
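    <?php
    function py_escape($input) {
        $descriptorspec = array(
            0 => array("pipe", "r"),  // child's stdin: we write the raw string here
            1 => array("pipe", "w")   // child's stdout: we read the repr() output here
        );
        $process = proc_open(
            "python -c 'import sys; sys.stdout.write(repr(sys.stdin.read()))'",
            $descriptorspec, $pipes);
        if (!is_resource($process)) {
            die("Could not start python.\n");
        }
        fwrite($pipes[0], $input);
        fclose($pipes[0]);
        $chunk_size = 8192;
        // Read at most $chunk_size bytes. A single fread() on a pipe may
        // return less than is available, which would defeat the check below.
        $escaped = stream_get_contents($pipes[1], $chunk_size);
        fclose($pipes[1]);
        proc_close($process);
        if (strlen($escaped) >= $chunk_size) {
            // This is important for security.
            die("That string's too big.\n");
        }
        return $escaped;
    }
    // Example usage:
    $x = "string \rfull \nof\t crappy stuff";
    print py_escape($x);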
The chunk_size check is intended to prevent an attack whereby your input ends up being two really long strings, which look like ("hello " + ("." * chunk_size)) and '; os.system("do bad stuff") respectively. Now, that naive attack won't work exactly, because Python won't let a single-quoted string run past the end of a line, and those quotes in the system() call will themselves be quoted, but if the attacker manages to get a line continuation ("\") into the right place and use something like os.system("".join(map(chr, ...))) then they can inject some code that will run.
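The truncation hazard behind that check can be seen from the Python side. This is only a sketch: is_complete_literal is a hypothetical helper, and CHUNK_SIZE simply mirrors the value in the PHP function.

```python
import ast

CHUNK_SIZE = 8192  # same limit as in the PHP function


def is_complete_literal(text):
    """Return True if `text` parses as one complete Python string literal."""
    try:
        return isinstance(ast.literal_eval(text), str)
    except (SyntaxError, ValueError):
        return False


full = repr("hello " + "." * CHUNK_SIZE)
truncated = full[:CHUNK_SIZE]  # what a read capped at CHUNK_SIZE would return

print(is_complete_literal(full))
print(is_complete_literal(truncated))
```

The truncated text has lost its closing quote, so anything written after it in a generated .py file would be read in the wrong quoting context. That is the situation the size check refuses to create.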
I opted to simply read one chunk and give up if there was more output, rather than continuing to read and accumulate, because there are also limits on Python source file line length; for all I know, that could be another attack vector. Python is not designed to be secure against arbitrary people writing arbitrary source code on your system, so this area is unlikely to have been audited.
The fact that I had to think through all of this for such a trivial example is one more reason why you shouldn't use Python source code as a data interchange format.