tags:

views:

348

answers:

3

So I'm trying to build a script that automagically prepends valid column names with its appropriate table prefix (e.g. "t." or "r.")

$t_columns = array('id', 'name', 'label');
$r_columns = array('related_value');

INPUT:

id > 1 AND (name = 'Hello' OR label IN ('World', 'Planet name AND label')) AND (related_value > 1 AND related_value < 50)

OUTPUT:

t.id > 1 AND (t.name = 'Hello' OR t.label IN ('World', 'Planet name AND label')) AND (r.related_value > 1 AND r.related_value < 50)

Notice how you can't do a normal str_replace. What would be the simplest code (I'm guessing preg_replace) to ensure that all table names are properly prepended?

+1  A: 

After a few seconds' thinking, here's how I'd tackle it:

Walk through the string, char by char, looking for single quotes, but skipping over escaped characters. The stuff between two unescaped single quotes (i.e. the strings) would be replaced with an unique token, and put into an associative array, with that token as the key, and the original string as the value.

Now that we have the strings out of the way, do a str_replace() (or preg_replace(), if you insist) for the known column names. I'd probably construct the column names into an associative array, with the table's alias as the key, and the value as an enumerated array containing the column names. This way, the replacement could be automated.

Once the table names are filled in, just do a str_replace() for the tokens, to replace the original strings back to their places, and you're done.

I'm sure someone might whip an super-awesome (and probably next to unmaintainable) regexp to do all this in one hit. But I prefer to use regexps only in situations where regular expressions are actually the right tool, not where a CFL would be more suitable.

Henrik Paul
So basically, A) replace all text within single-quotes with some kind of random key, B) replace the table names using str_replace, C) replace the key with the original text...
Matt
This approach won't work if the SQL contains a comment with an unmatched and unescaped single quote character.
Pourquoi Litytestdata
Matt, that's an excellent briefing :). Pourquoi: Comments could be stripped altogether from the parsed query.
Henrik Paul
A: 

I don't know if a regular expression is a good idea here. I'd say it'd be worth the minimal amount of increased computational complexity to perform the validation yourself in PHP. Then, should your database require any changes, you won't have to pull your hair out worrying about how to increase the robustness of your regular expression.

As Jamie Zawinski once said, "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems. "

In terms of creating a workflow that will make sure you're working on the right database, I'd consider taking an object-oriented approach. Classes for T and R tables can be instantiated children of a model class responsible for building the correct table-type object using conditional logic within the same method, rather than a regex.

Robert Elwell
Congratulations, your last paragraph won the "Most Technical Jargon" award. Judging from the first 2 paragraphs, I'll take it that you agree with wolfie...
Matt
"Make a thing that can make one thing or the other thing for you so you don't screw it up yourself". Better?
Robert Elwell
A: 

This can be done in a lot of ways, and also using regex. I'd personally use an array approach. First of all, I'd define the mangling table this way:

$table = array(
    'id' => 't.id',
    'name' => 't.name',
    'label' => 't.label',
    'related_value' => 'r.related_value'
);

This will make a lot easier the str_replace() call:

function mangling(&$v, $k, $table)
{
    if (($k & 1) == 0)
        $v = str_replace(array_keys($table), array_values($table), $v);
}

$spans = explode("'", ' ' . $input);
array_walk($spans, 'mangling', $table);
$output = implode("'", $spans);
ntd