I feel a little silly for asking this since I seem to be the only person in the world who doesn't get it, but here goes anyway. I'm going to use Python as an example. When I use raw SQL queries (I usually use ORMs) I use parameterisation, like this example using SQLite:

Method A:

username = "wayne"
query_params = (username,)  # the trailing comma makes this a one-element tuple
cursor.execute("SELECT * FROM mytable WHERE user=?", query_params)

I know this works and I know this is the generally recommended way to do it. A SQL injection-vulnerable way to do the same thing would be something like this:

Method B:

username = "wayne"
cursor.execute("SELECT * FROM mytable WHERE user='%s'" % username)

As far as I can tell I understand SQL injection, as explained in this Wikipedia article. My question is simply: How is method A really different to method B? Why is the end result of method A not the same as method B? I assume that the cursor.execute() method (part of Python's DB-API specification) takes care of correctly escaping and type-checking the input, but this is never explicitly stated anywhere. Is that all that parameterisation in this context is? To me, when we say "parameterisation", all that means is "string substitution", like %-formatting. Is that incorrect?

+13  A: 

A parameterized query doesn't actually do string replacement. If you use string substitution, then the SQL engine actually sees a query that looks like

SELECT * FROM mytable WHERE user='wayne'

If you use a ? parameter, then the SQL engine sees a query that looks like

SELECT * FROM mytable WHERE user=<some value>

Which means that before it even sees the string "wayne", it can fully parse the query and understand, generally, what the query does. It sticks "wayne" into its own representation of the query, not the SQL string that describes the query. Thus, SQL injection is impossible, since we've already passed the SQL stage of the process.

(The above is generalized, but it more or less conveys the idea.)
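
To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table contents and the payload string are made up for illustration. It runs the same hostile input through Method B and Method A and compares what comes back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE mytable (user TEXT, secret TEXT)")
cur.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("wayne", "w-secret"), ("alice", "a-secret")],
)

# Illustrative hostile input (an assumption, not from the question).
payload = "nobody' OR '1'='1"

# Method B: the payload becomes part of the SQL text, so the parser sees
# an extra OR clause and the WHERE condition matches every row.
rows_b = cur.execute(
    "SELECT * FROM mytable WHERE user='%s'" % payload
).fetchall()

# Method A: the SQL text is fixed; the payload is bound as a plain value
# and is only ever compared against the user column.
rows_a = cur.execute(
    "SELECT * FROM mytable WHERE user=?", (payload,)
).fetchall()

print(len(rows_b))  # 2
print(rows_a)       # []
```

With Method A the whole payload is just an unusual user name that nobody happens to have, so the result is empty.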

John Calsbeek
So if he had wayne;drop table accounts, what kind of error would it give? Just no results?
johnny
@johnny: It would find everything from `mytable` where `user` was `wayne;drop table accounts`. It'd pull back Little Bobby Tables's actual record.
Eric
@johnny: Right, no results, because that's a valid value, even if it's ugly. The binary-safe protocol between client and server doesn't care about quotes, semicolons or anything like that.
Javier
Thanks John, that's the explanation I was looking for! You gave me an "ooooooooooooh I seeeeeeeeeeeeee" moment where the paradigm shift occurred :) I didn't realise that I shouldn't be thinking in terms of strings, but of what's happening further down the line.
Wayne Koorts
A: 

When you submit a query to SQL Server, it first checks the procedure cache. If it finds a query that is EXACTLY equal, it reuses the same plan instead of recompiling the query; it only substitutes the placeholders (variables), and it does so on the server (DB) side.

Check the system table master.dbo.syscacheobjects and run some tests to learn a bit more about this topic.

Jhonny D. Cano -Leftware-
While this is SQL Server specific, most database engines do stuff like this. Not sure if SQLite (the mentioned engine) does or not, though.
John Calsbeek
This is a complete aside from the concept of understanding what parameterized queries do and why they provide a security advantage.
Cheekysoft
Sorry for any misunderstanding. That's how I first came to understand why I had to replace my string-replacement queries: by watching this system table and tracing whether my queries were thrashing the procedure cache on the server.
Jhonny D. Cano -Leftware-
+1  A: 

When you do text replacement (like your method B), you have to be wary of quotes and such, because what the server gets is a single piece of text, and it has to determine where the value ends.

With parameterized statements, OTOH, the DB server gets the statement as is, without the parameter. The value is sent to the server as a separate piece of data, using a simple binary-safe protocol. Therefore, your program doesn't have to put quotes around the value, and of course it doesn't matter if there were already quotes in the value itself.

An analogy is about source and compiled code: in your method B, you're building the source code of a procedure, so you have to be sure to strictly follow the language syntax. With Method A, you first build and compile a procedure, then (immediately after, in your example), you call that procedure with your value as a parameter. And of course, in-memory values aren't subject to syntax limitations.

hum... that wasn't really an analogy, it's really what is happening under the hood (roughly).
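
A small sketch of the "quotes in the value don't matter" point, assuming Python's sqlite3 module and an in-memory table; the sample names are invented. The statement text stays constant and only the bound value changes, so values containing quotes or semicolons are stored exactly as given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE mytable (user TEXT)")

# The statement text never changes; only the bound value does.
insert_sql = "INSERT INTO mytable (user) VALUES (?)"
tricky_names = ["O'Brien", 'say "hi"', "wayne; drop table accounts"]
for name in tricky_names:
    cur.execute(insert_sql, (name,))

# Read the values back in insertion order and compare them byte for byte.
stored = [row[0] for row in cur.execute("SELECT user FROM mytable ORDER BY rowid")]
print(stored == tricky_names)  # True
```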

Javier
+1  A: 

Using parameterized queries is a good way to punt the task of escaping and preventing injections to the DB client library. It escapes the value before substituting it for the "?". This is done in the client library, before the query reaches the DB server.

If you have MySQL running, turn on the SQL log and try a few parameterized queries, and you will see that the MySQL server is receiving fully substituted queries with no "?" in them, but the MySQL client library has already escaped any quotes in your "parameter" for you.

If you use method B with just string replacement, quotes are not automatically escaped.

On top of that, with MySQL you can prepare a parameterized query ahead of time and then use the prepared statement repeatedly later. When you prepare a query, MySQL parses it and gives you back a prepared statement -- some parsed representation MySQL understands. Each time you use the prepared statement, not only are you guarded against injection, but you also avoid the cost of parsing the query again.
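
The reuse pattern can be sketched in Python, with the caveat that this swaps in SQLite for MySQL: the sqlite3 module has no explicit "prepare" call, but as far as I know it keeps an internal cache of compiled statements keyed by the SQL text (sized by the `cached_statements` argument to `sqlite3.connect`), so running the same parameterized string with different values gives a similar effect. The table and data below are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE mytable (user TEXT, visits INTEGER)")

# One statement text, many parameter sets.
cur.executemany(
    "INSERT INTO mytable (user, visits) VALUES (?, ?)",
    [("wayne", 3), ("alice", 5), ("bob", 1)],
)

# The same SQL string is reused for every lookup; only the value differs.
lookup_sql = "SELECT visits FROM mytable WHERE user=?"
visits = {
    name: cur.execute(lookup_sql, (name,)).fetchone()[0]
    for name in ("wayne", "alice", "bob")
}
print(visits)  # {'wayne': 3, 'alice': 5, 'bob': 1}
```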

And, if you really want to be secure, you can modify your DB access/ORM layer so that 1) web server code can only use prepared statements, and 2) you can only prepare statements before your web server starts. Then, even if your web app is hacked into (say via a buffer overrun exploit), the hacker can still use only the prepared statements and nothing more. For this you need to jail your web app and only allow access to the database via your DB access/ORM layer.
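
A rough sketch of what such a layer might look like. `PreparedOnlyDB` and its methods are hypothetical names invented for this example, and sqlite3 stands in for the real database. An in-process wrapper like this only shows the shape of the idea; the actual protection comes from the jailing described above, with the layer living outside the web app's process.

```python
import sqlite3


class PreparedOnlyDB:
    """Hypothetical wrapper: only statements registered before sealing can run."""

    def __init__(self, conn):
        self._conn = conn
        self._statements = {}
        self._sealed = False

    def register(self, name, sql):
        # Allowed only during startup, before seal() is called.
        if self._sealed:
            raise RuntimeError("registry is sealed; no new statements allowed")
        self._statements[name] = sql

    def seal(self):
        self._sealed = True

    def run(self, name, params=()):
        # Callers pick a statement by name and supply values only.
        sql = self._statements[name]  # KeyError for unknown names
        return self._conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (user TEXT)")
conn.execute("INSERT INTO mytable VALUES ('wayne')")

db = PreparedOnlyDB(conn)
db.register("find_user", "SELECT user FROM mytable WHERE user=?")
db.seal()  # from here on, no new SQL text can be introduced

print(db.run("find_user", ("wayne",)))  # [('wayne',)]
```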

OverClocked