tags:

views:

6189

answers:

8

For a web application I want to build a WHERE clause AND submit it to the server. There I will append it to a query. the clause will be something like

LASTNAME LIKE 'Pep%' AND (DOB BETWEEN '19600101' AND '19601231 OR SALARY<35000)

Can you propose a regular expression to validate the clause before submitting it to SQL Server?

(Yes, of course I would like a regex for the ORDER clause)

+12  A: 

This is a monumentally bad idea. I would suggest you build a filter system instead where the user can select all kinds of options in a form, and then you build the correct sql server-side instead of opening yourself up to all kinds of injection attacks.

As an example of what might go wrong, consider this:

LASTNAME LIKE 'Pep%'--
DROP TABLE People
--

This will inject a DROP TABLE command into your SQL, which will be hard to detect. You can of course remove things like -- and /*, but I guarantee that someone can find a way in if you do this.

Lasse V. Karlsen
This is exactly what I do, typically using table based functions on SQL server to make it more efficient.
tvanfosson
Good answer -- we must constantly dissuade programmers from creating more SQL injection flaws!
Bill Karwin
A: 

Yes I know.That is exactly what I am trying to avoid.

I will come to build a generic WHERE class and a generic ORDERBY class before coming out of beta.

Till then (and after then) I would like to have a server side validation before append to my SELECT QUERY.

At the moment the user does not compose the WHERE query, but selects options from a form.

The clause is composed client side.

I validate the referer property of the request and must come from a logged in user.

pkario
It doesn't matter if the user doesn't compose the query. They can still inject statements if you're build the query dynamically rather than use parameters.
Joel Coehoorn
@Joel Coehoorn is right!
Bill Karwin
A: 

The number of possibilities of elements in a where clause is immense. Obviously you've got your ANDs and ORs and BETWEENs and IN lists and other operators, plus parens, but you could also be calling system procedures, user-defined functions, and depending on the RDBMS you're working with, entire subqueries. Then there are queries that may be syntactically correct but are still illegal.

A single regular expression to cover this would (a) be insanely big and (b) probably not cover all the cases. Not only do you really not want to do this, but it's likely not even possible.

Graeme Perrow
A: 

As has already been suggested, a regex is the wrong tool for the job, what you really need is an SQL parser. I don't know of any .Net SQL parsers, but I'm sure a Google search will turn up a few.

Don
+4  A: 

You don't build

LASTNAME LIKE 'Pep%' AND (DOB BETWEEN '19600101' AND '19601231 OR SALARY<35000)

you build

LASTNAME LIKE @LastName AND (DOB BETWEEN @dobStart AND @dobEnd OR SALARY<@MaxSalary)

and pass in those guys as parameters. No Regex, no fuss.

Jimmy
A: 

You might want to take a look at Subsonic. Its designed to generate a data layer for you and let you use objects to build your where clauses.

Slapout
+2  A: 

I want to expand on Jimmy's answer a bit.

LASTNAME LIKE 'Pep%'

That's just EVIL. NEVER do it. The SQL string should look like this instead:

LASTNAME LIKE @LastName + '%'

Now the problem is that in your case you don't know if you need to do a lastname check at all. All you have are SELECT and FROM clauses and a textbox for the lastname column that may or may not have a value in it. Fine. That's still no excuse for doing it like in the first example. What you need to do instead is build your query like this (using C# for now since you didn't supply a client langauge):

//create a place to keep parameters until we can construct the SqlCommand object
List<SqlParameter> params = new List<SqlParameter>();
SqlParameter p;

// the StringBuilder is MUCH more efficient the concatenating strings
// the 1=1 is a placeholder so you can always just append " AND whatever"
StringBuilder sql = new StringBuilder("SELECT ... \nFROM .... \nWHERE 1=1\n");

// Check and add a parameter for the LastName column if needed
if (!String.IsNullOrEmpty(txtLastName.Text))
{
   sql.AppendLine("AND LASTNAME LIKE @LastName + '%'");
   p = new SqlParameter("@LastName", SqlDbType.VarChar, 50);  // use the actual datatype here
   p.Value = txtLastName.Text;
   params.Add(p);      
}

// Check and add a parameter for another field if needed
if (!String.IsNullOrEmpty(txtSomeOtherField.Text))
{
   sql.AppendLine("AND OtherField LIKE @OtherParam + '%'");
   p = new SqlParameter("@OtherParam", SqlDbType.VarChar, 255);
   p.Value = txtSomeOtherField.Text;
   params.Add(p);
}

// ...  You could also write a method to abstract the code in the if blocks ...

// you haven't told us _how_ the user will specify the order, so I'm leaving that implementation detail out for now
sql.Append(" ORDER BY LastName, OtherField");  

// now we can finally get our SQL String and build the (SAFE!) SqlCommand object:
SqlCommand cmd = new SqlCommand(sql.ToString(), YourSqlConnectionObjectHere);
cmd.Parameters.AddRange(params.ToArray());

Now you have a dynamically generated where clause with no possibility for injection. It works because every part of the string sent to the database is an exact literal in your code, even if those literals are assembled over a number of steps. The values used in the parameters are never substituted into the string, but instead sent to the server separately as data.

Of course this was C# (.Net), but just about every modern platform has some form of parameterized query/prepared statement feature you should be using.

Joel Coehoorn
+2  A: 

Several other commenters have addressed the point that allowing user input to determine SQL syntax is a bad idea and created SQL injection vulnerabilities.

Please take note that the commenters are unanimous in this opinion and have a total StackOverflow "reputation" exceeding 14,000!

But leaving aside that point, you asked how to write a regular expression to match arbitrary SQL syntax. The answer is that you can't do that with regular expressions.

"Regular" in this context means the expression can match a certain class of input language, which can also be represented by a deterministic finite state-machine (sort of like a flow-chart).

For example, one simple thing you'd need to match an SQL expression is the ability to match nested parentheses. You can't design a finite state-machine to match nested parentheses, because it would require the ability to count how many levels deep in nested parentheses you are at any given point. For that you need the parser to implement a stack, but this is something a regular expression can't do.

Bill Karwin
With your response, we're now much closer to 18,000 (and rising).
Joel Coehoorn
In his defense, he only wants the expression to match one portion of a query. However, it's the portion that's most likely to require your nested parentheses example.
Joel Coehoorn