views:

310

answers:

7

I've the following method which allows me to protect MySQL entities:

public function Tick($string)
{
    $string = explode('.', str_replace('`', '', $string));

    foreach ($string as $key => $value)
    {
     if ($value != '*')
     {
      $string[$key] = '`' . trim($value) . '`';
     }
    }

    return implode('.', $string);
}

This works fairly well for the use that I make of it.

It protects database, table, field names and even the * operator, however now I also want it to protect function calls, ie:

AVG(database.employees.salary)

Should become:

AVG(`database`.`employees`.`salary`) and not `AVG(database`.`employees`.`salary)`

How should I go about this? Should I use regular expressions?

Also, how can I support more advanced stuff, from:

MAX(AVG(database.table.field1), MAX(database.table.field2))

To:

MAX(AVG(`database`.`table`.`field1`), MAX(`database`.`table`.`field2`))

Please keep in mind that I want to keep this method as simple/fast as possible, since it pretty much iterates over all the entity names in my database.

A: 

You could use preg_replace_callback() in conjunction with your Tick() method to skip at least one level of parens:

public function tick($str) 
{
  return preg_replace_callback('/[^()]*/', array($this, '_tick_replace_callback'), $str);
}

protected function _tick_replace_callback($str) {
  $string = explode('.', str_replace('`', '', $string));
  foreach ($string as $key => $value)
  {
    if ($value != '*')
    {
      $string[$key] = '`' . trim($value) . '`';
    }
  }
  return implode('.', $string);
}
pix0r
Returns "\`Array\`\`Array\`)\`Array\`". =\
Alix Axel
+4  A: 

If this is quoting parts of an SQL statement, and they have only complexity that you descibe, a RegEx is a great approach. On the other hand, if you need to do this to full SQL statements, or simply more complicated components of statements (such as "MAX(AVG(val),MAX(val2))"), you will need to tokenize or parse the string and have a more sophisticated understanding of it to do this quoting accurately.

Given the regular expression approach, you may find it easier to break the function name out as one step, and then use your current code to quote the database/table/column names. This can be done in one RE, but it will be tricker to get right.

Either way, I'd highly recommend writing a few unit test cases. In fact, this is an ideal situation for this approach: it's easy to write the tests, you have some existing cases that work (which you don't want to break), and you have just one more case to add.

Your test can start as simply as:

assert '`ticked`' == Tick('ticked');
assert '`table`.`ticked`' == Tick('table.ticked');
assert 'db`.`table`.`ticked`' == Tick('db.table.ticked');

And then add:

assert 'FN(`ticked`)' == Tick('FN(ticked)');
etc.
ndp
+2  A: 

Before you explode your string on periods, check if the last character is a parenthesis. If so, this call is a function.

<?php
$string = str_replace('`', '', $string)
$function = "";
if (substr($string,-1) == ")") {
  // Strip off function call first
  $opening = strpos($string, "(");
  $function = substr($string, 0, $opening+1);
  $string = substr($string, $opening+1, -1);
}

// Do your existing parsing to $string

if ($function == "") {
  // Put function back on string
  $string = $function . $string . ")";
}
?>

If you need to cover more advanced situations, like using nested functions, or multiple functions in sequence in one "$string" variable, this would become a much more advanced function, and you'd best ask yourself why these elements aren't being properly ticked in the first place, and not need any further parsing.

EDIT: Updating for nested functions, as per original post edit To have the above function deal with multiple nested functions, you likely need something that will 'unwrap' your nested functions. I haven't tested this, but the following function might get you on the right track.

<?php
function unwrap($str) {
  $pos = strpos($str, "(");
  if ($pos === false) return $str; // There's no function call here

  $last_close = 0;
  $cur_offset = 0; // Start at the beginning
  while ($cur_offset <= strlen($str)) {
    $first_close = strpos($str, ")", $offset); // Find first deep function
    $pos = strrpos($str, "(", $first_close-1); // Find associated opening
    if ($pos > $last_close) {
      // This function is entirely after the previous function
      $ticked = Tick(substr($str, $pos+1, $first_close-$pos)); // Tick the string inside
      $str = substr($str, 0, $pos)."{".$ticked."}".substr($str,$first_close); // Replace parenthesis by curly braces temporarily
      $first_close += strlen($ticked)-($first_close-$pos); // Shift parenthesis location due to new ticks being added
    } else {
      // This function wraps other functions; don't tick it
      $str = substr($str, 0, $pos)."{".substr($str,$pos+1, $first_close-$pos)."}".substr($str,$first_close);
    }
    $last_close = $first_close;
    $offset = $first_close+1;
  }
  // Replace the curly braces with parenthesis again
  $str = str_replace(array("{","}"), array("(",")"), $str);
}
MidnightLightning
+1 This is the solution I thought of ...
too much php
+2  A: 

It's generally a bad idea to pass the whole SQL to the function. That way, you'll always find a case when it doesn't work, unless you fully parse the SQL syntax.

Put the ticks to the names on some previous abstraction level, which makes up the SQL.

Ondra Žižka
I'm not passing the whole SQL, normally just the field name, however I need to detect when a field name also contains a function. I have no other way around this, believe me.
Alix Axel
And will it always be a function? What about expressions?CASE WHEN `db`.`tab`.`col` BETWEEN `db`.`othertab`.`col2` THEN 10 ELSE 20
Ondra Žižka
A custom ORM is indeed a formidable opponent.
too much php
A: 

Are you generating the SQL Query or is it being passed to you? If you generating the query I wouldn't pass the whole query string just the parms/values you want to wrap in the backticks or what ever else you need.

EXAMPLE:

   function addTick($var) {
      return '`' . $var . '`';
   }

   $condition = addTick($condition);

   $SQL = 'SELECT' . $what . ' 
      FROM ' . $table . ' 
      WHERE ' . $condition . ' = ' . $constraint;

This is just a mock but you get the idea that you can pass or loop through your code and build the query string rather than parsing the query string and adding your backticks.

Phill Pafford
I'm generating the SQL, but not in the way your example is, the tick function normally just handles column names however I also need it to handle functions that act on column names.
Alix Axel
understandable but if your generating the query this should be no problem.
Phill Pafford
I'm using something like http://codeigniter.com/user_guide/database/active_record.html to generate the SQL.
Alix Axel
+1  A: 

If you are adding the function calls in your code, as opposed to passing them in through a string-only interface, you can replace the string parsing with type checking:

function Tick($value) {
    if (is_object($value)) {
        $result = $value->value;
    } else {
        $result = '`'.str_replace(array('`', '.'), array('', '`.`'), $value).'`';
    }

    return $result;
}

class SqlFunction {
    var $value;
    function SqlFunction($function, $params) {
        $sane = implode(', ', array_map('Tick', $params));
        $this->value = "$function($sane)";
    }
}

function Maximum($column) {
    return new SqlFunction('MAX', array($column));
}

function Avg($column) {
    return new SqlFunction('AVG', array($column));
}

function Greatest() {
    $params = func_get_args();
    return new SqlFunction('GREATEST', $params);
}

$cases = array(
    "'simple'" => Tick('simple'),
    "'table.field'" => Tick('table.field'),
    "'table.*'" => Tick('table.*'),
    "'evil`hack'" => Tick('evil`hack'),
    "Avg('database.table.field')" => Tick(Avg('database.table.field')),
    "Greatest(Avg('table.field1'), Maximum('table.field2'))" => Tick(Greatest(Avg('table.field1'), Maximum('table.field2'))),
);

echo "<table>";
foreach ($cases as $case => $result) {
    echo "<tr><td>$case</td><td>$result</td></tr>";
}

echo "</table>";

This avoids any possible SQL injection while remaining legible to future readers of your code.

eswald
+2  A: 

Using the test case ndp gave I created a regex to do the hard work for you. The following regex will replace all word boundaries around words that are not followed by an opening parenthesis.

\b(\w+)\b(?!\()

The Tick() functionality would then be implemented in PHP as follows:

function Tick($string) 
{
    return preg_replace( '/\b(\w+)\b(?!\()/', '`\1`', $string );
}
Huppie
MAX(AVG(database.table.field1), MAX(database.table.*)) comes up as MAX(AVG(`database`.`table`.`field1`), MAX(`database`.`table`.*)) just what I needed, thanks!
Alix Axel