For my senior thesis, I developed a program that automatically detects SQL injection vulnerabilities in PHP code and suggests fixes that use prepared statements, specifically those of the mysqli extension. My question for the SO community is this: what would your preferred approach be for detecting the SQL in PHP source code?

I used an enum containing the SQL keywords (SELECT, INSERT, ...) and parsed the source line by line, iterating over the enum to determine whether any SQL was present. Additionally, I had to make sure that the parser was not erroneously detecting HTML (for example, a <select> tag).
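A minimal sketch of that line-by-line keyword scan, written in Python for brevity; the lookbehind, word boundary, and alternation used here should port to C#'s Regex class. The keyword list and the function name find_sql_lines are assumptions for illustration, since the original enum and parser are not shown.

```python
import re

# Hypothetical keyword list; the original program's enum is not shown.
SQL_KEYWORDS = ("SELECT", "INSERT", "UPDATE", "DELETE")

# A keyword counts only as a whole word that is not directly preceded by
# "<", "/", or another word character, which skips <select> and </select>.
KEYWORD_RE = re.compile(
    r"(?<![<\w/])(?:" + "|".join(SQL_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def find_sql_lines(source):
    """Return (line_number, stripped_line) pairs that appear to contain SQL."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if KEYWORD_RE.search(line):
            hits.append((number, line.strip()))
    return hits
```

This still flags ordinary prose such as "Please select an option", so it is only a first filter.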

This solution worked fine for me, but I now have a little more time on my hands and have thought about refactoring the code to use a more elegant (and efficient) solution. Please limit your solutions to C#, as that is what I wrote my program in.

+1  A: 

Your solution seems fine to me. The other way would be to parse the PHP file with a Lex/Yacc-style parser built from a grammar for PHP. One good parser-generator tool that targets C# is Coco/R: http://www.ssw.uni-linz.ac.at/coco/.

However, I believe that if you do parse the language, you will end up spending too much time (in development and in computing) for no additional results.

I would stick with your opportunistic approach, but test it against various PHP code and tweak it to cover all possible cases.

Vincent
+1  A: 

Maybe there's some mileage in parsing text lines against the BNF for, say, SQL92, and scoring each line on how closely the fragments match the grammar.

That sounds like some heavy lifting, though. Your simple approach will probably already catch a large percentage of real-world cases.
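As a much cruder stand-in for the scoring idea (not a real SQL92 grammar match), one could score a line by how many of a statement's expected clause words appear in order. The clause lists and the name sql_score below are assumptions made for this sketch, written in Python for brevity.

```python
import re

# Assumed clause sequences; a loose stand-in for grammar rules, not SQL92 itself.
CLAUSE_PATTERNS = {
    "SELECT": [r"\bSELECT\b", r"\bFROM\b", r"\bWHERE\b"],
    "INSERT": [r"\bINSERT\b", r"\bINTO\b", r"\bVALUES\b"],
    "UPDATE": [r"\bUPDATE\b", r"\bSET\b", r"\bWHERE\b"],
    "DELETE": [r"\bDELETE\b", r"\bFROM\b", r"\bWHERE\b"],
}


def sql_score(line):
    """Return the best fraction (0.0 to 1.0) of clause words found in order."""
    best = 0.0
    for patterns in CLAUSE_PATTERNS.values():
        position = 0
        matched = 0
        for pattern in patterns:
            found = re.compile(pattern, re.IGNORECASE).search(line, position)
            if found is None:
                break  # clause words must appear in sequence
            matched += 1
            position = found.end()
        best = max(best, matched / len(patterns))
    return best
```

A caller would pick a threshold, for instance treating only lines that score above one third as SQL, so that a lone "select" in prose is ignored while "SELECT ... FROM ..." is kept.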

Cheekysoft
+1  A: 

I do not know the specifics of variables in C#, so you will have to forgive or down-vote me for using PHP, but about 70% of the time my SQL query goes into a variable like so:

$sql = "SELECT * FROM table;";

Beyond that I am unable to think of anything you can do to improve on what you already have.

Do you take into account statements that are created over several lines and use variables within the string? (Example below)

$sql = "SELECT * FROM table WHERE fname = $fname OR sname = $sname";
Teifion
A: 

"I do not know the specifics of variables in C#, so you will have to forgive or down-vote me for using PHP, but about 70% of the time my SQL query goes into a variable like so ..."

Yeah, my original approach was to just look for the $sql vars since that is what most people use, but after testing against a few PHP apps I quickly threw that solution out because some developers use some funky variable names ...

"Do you take into account statements that are created over several lines and use variables within the string? (Example below)"

Yep. I also attempted to handle statements that were generated conditionally, but that didn't always work so well. ;)

Magic Hat
A: 

A simple regex to detect CRUD SQL statements passed to functions (assuming $script contains the whole PHP script):

preg_match_all('/\(\s*?"(?:SELECT|INSERT|UPDATE|DELETE) .*?"\s*?\)\s*?;/is', 
               $script, $matches);

It should match SELECT, INSERT, UPDATE, and DELETE statements that are placed within parentheses and double quotes. It's case-insensitive and should also match statements that span multiple lines.

edit #1: Regex for matching CRUD statements in string assignments:

preg_match_all('/\$\w+\s*?=\s*?"(?:SELECT|INSERT|UPDATE|DELETE) .*?"\s*?;/is', 
               $script, $matches);

edit #2:

// Version of the first (function-call) regex that only matches
// statements containing at least one $variable
preg_match_all('/\(\s*?"(?:SELECT|INSERT|UPDATE|DELETE) .*?\$\w+.*?"\s*?\)\s*?;/is', 
               $script, $matches);
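Since the question asks for a solution outside PHP, here is a hedged sketch of the same idea in Python, combining the assignment form from edit #1 with the $variable check from edit #2. The name interpolated_queries is hypothetical, and the pattern deviates from the ones above by using [^"]* instead of .*? so that a match cannot run past the closing quote.

```python
import re

# Assignment of a double-quoted CRUD statement to a PHP variable.
# Captures the string contents; escaped quotes (\") are not handled.
ASSIGNMENT_RE = re.compile(
    r'\$\w+\s*=\s*"((?:SELECT|INSERT|UPDATE|DELETE)\s[^"]*)"\s*;',
    re.IGNORECASE | re.DOTALL,
)

# A PHP variable interpolated inside the string.
VARIABLE_RE = re.compile(r"\$\w+")


def interpolated_queries(script):
    """Return assigned SQL string literals that embed at least one $variable."""
    return [
        sql
        for sql in ASSIGNMENT_RE.findall(script)
        if VARIABLE_RE.search(sql)
    ]
```

Applied to the whole script text, this returns only the queries that interpolate a variable, which are the likely injection candidates; constant queries are skipped.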
Imran
+1  A: 

I would say it would be best to look for function calls instead of looking for SQL itself. Possibly modify the PHP parser to look for function calls that result in running an SQL query which is not a prepared query.
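Short of modifying a PHP parser, a rough approximation of this idea is to scan for the mysqli calls that execute a query directly. The sketch below is in Python for brevity and assumes the code base uses mysqli_query, mysqli_real_query, mysqli_multi_query, or the matching ->query style methods; the function name find_query_calls is hypothetical.

```python
import re

# Assumed set of query-executing mysqli calls, procedural and object style.
# Calls to prepare()/mysqli_prepare() are deliberately not matched.
QUERY_CALL_RE = re.compile(
    r"(?:\bmysqli_(?:real_|multi_)?query\s*\(|->\s*(?:real_|multi_)?query\s*\()",
    re.IGNORECASE,
)


def find_query_calls(source):
    """Return the 1-based line numbers that call a non-prepared query function."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if QUERY_CALL_RE.search(line):
            hits.append(number)
    return hits
```

Each reported line is only a starting point: the tool would still have to trace the argument back to the string that builds the SQL.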

Kibbee