I need a regular expression (run in C#) that takes a string containing a SQL UPDATE statement as input and returns a list of the columns being updated. It should handle column names whether or not they are surrounded by brackets.

```sql
-- Example SQL statement
Update Employees
Set FirstName = 'Jim', [LastName] = 'Smith', CodeNum = codes.Num
From Employees as em
Join CodeNumbers as codes on codes.EmployeeID = em.EmployeeID
```

In the end, I want to return an IEnumerable<string> or List<string> containing:

  1. FirstName
  2. LastName
  3. CodeNum

Anyone have any good suggestions on implementation?

Update: The SQL is user-generated, so I have to parse it exactly as it is given. I am extracting the column names in order to validate that the user has permission to update every column the query touches.
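For concreteness, here is a minimal sketch of the regex approach the question asks for (the class name `NaiveUpdateColumns` is hypothetical, not from the thread). It grabs the text after SET up to the first FROM or WHERE, then takes each optionally bracketed identifier that sits at the start of that text or after a comma and is followed by `=`. It handles the example above, but it is not a correct parser, and the usage note shows where it breaks.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NaiveUpdateColumns
{
    // Captures the text between SET and the first FROM/WHERE keyword (or end of input).
    // Being lazy, it stops at the FIRST such keyword, even one inside a subquery.
    private static readonly Regex SetClause = new Regex(
        @"\bSET\b(?<body>.*?)(?:\bFROM\b|\bWHERE\b|$)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // An identifier, optionally in [brackets], at the start or after a comma, followed by '='.
    // It knows nothing about string literals, so commas inside quotes fool it.
    private static readonly Regex Assignment = new Regex(
        @"(?:^|,)\s*\[?(?<col>[A-Za-z_][A-Za-z0-9_]*)\]?\s*=");

    public static List<string> Extract(string sql)
    {
        Match set = SetClause.Match(sql);
        if (!set.Success) return new List<string>();

        string body = set.Groups["body"].Value;
        return Assignment.Matches(body)
            .Cast<Match>()
            .Select(m => m.Groups["col"].Value)
            .ToList();
    }
}
```

On the example statement this returns `FirstName`, `LastName`, `CodeNum`. However, `UPDATE T SET A = 1, B = (SELECT y FROM U), C = 2` yields only `A` and `B`, because the lazy match stops at the subquery's FROM. `UPDATE T SET A = 'x, B = y'` reports a nonexistent column `B` taken from inside the string literal.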

+3  A: 

You're doing it backwards. Store the data in a broken-out form, with the table to be updated, the column names, and the expressions that generate the new values all kept separate. From this canonical representation, generate both the SQL (when you need it) and the list of columns being updated (when you need that instead).
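A minimal sketch of that canonical representation, assuming a hypothetical `UpdateSpec` class and a single key column identifying the row. As a simplification it stores literal values (bound as parameters) rather than arbitrary value expressions. The column list is read straight from the model, so nothing ever has to be parsed back out of SQL.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical canonical form of an UPDATE: table, key column, and column/value
// pairs are stored separately; SQL is generated from them only when needed.
public sealed class UpdateSpec
{
    private readonly List<KeyValuePair<string, object>> assignments =
        new List<KeyValuePair<string, object>>();

    public string Table { get; }
    public string KeyColumn { get; }

    public UpdateSpec(string table, string keyColumn)
    {
        Table = table;
        KeyColumn = keyColumn;
    }

    // Records "column = value". Identifiers are assumed to come from the
    // application, never from user text, so bracket-quoting them is enough here.
    public UpdateSpec Set(string column, object value)
    {
        assignments.Add(new KeyValuePair<string, object>(column, value));
        return this;
    }

    // A permission check reads this list directly; no parsing involved.
    public IReadOnlyList<string> Columns => assignments.Select(a => a.Key).ToList();

    // One named parameter per value; the caller binds @key to the row's key.
    public string ToSql()
    {
        IEnumerable<string> sets = assignments.Select((a, i) => $"[{a.Key}] = @p{i}");
        return $"UPDATE [{Table}] SET {string.Join(", ", sets)} WHERE [{KeyColumn}] = @key";
    }

    // Parameter values keyed by the names used in ToSql().
    public IReadOnlyDictionary<string, object> Parameters =>
        assignments.Select((a, i) => new KeyValuePair<string, object>("@p" + i, a.Value))
                   .ToDictionary(kv => kv.Key, kv => kv.Value);
}
```

For example, `new UpdateSpec("Employees", "EmployeeID").Set("FirstName", "Jim").Set("LastName", "Smith")` reports the columns `FirstName` and `LastName` and generates `UPDATE [Employees] SET [FirstName] = @p0, [LastName] = @p1 WHERE [EmployeeID] = @key`.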

If you absolutely must pull the column names out of a SQL statement, I don't think regular expressions are the right way to go. For example, in the general case you may need to skip over new-value expressions that contain arbitrarily nested parentheses. You will probably want a full SQL parser. The book Lex & Yacc by Levine, Mason, and Brown has a chapter on parsing SQL.
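To make the nested-parentheses point concrete, here is a minimal hand-written sketch (the class `SetClauseSplitter` and its methods are hypothetical). It splits a SET clause at commas that lie outside string literals, bracketed identifiers, and parentheses, then reads the target column off each piece. The integer depth counter is precisely what a classical regular expression has no way to keep for arbitrary nesting. The sketch assumes the caller has already isolated the SET clause text, and it does not handle comments or double-quoted identifiers. Finding where the SET clause ends needs the same depth tracking, because subqueries contain their own FROM and WHERE.

```csharp
using System.Collections.Generic;
using System.Text;

public static class SetClauseSplitter
{
    // Splits "a = 1, b = f(x, y), [c] = 'p, q'" at top-level commas only.
    public static List<string> SplitTopLevel(string setBody)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        int depth = 0;        // current parenthesis nesting level
        char closer = '\0';   // quote char that ends the current quoted run, or '\0'

        for (int i = 0; i < setBody.Length; i++)
        {
            char c = setBody[i];
            current.Append(c);

            if (closer != '\0')
            {
                if (c == closer)
                {
                    // A doubled closer ('' or ]]) is an escape, not the end of the quote.
                    if (i + 1 < setBody.Length && setBody[i + 1] == closer)
                    {
                        current.Append(closer);
                        i++;
                    }
                    else
                    {
                        closer = '\0';
                    }
                }
                continue;
            }

            if (c == '\'') closer = '\'';
            else if (c == '[') closer = ']';
            else if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == ',' && depth == 0)
            {
                current.Length--;                      // drop the separator comma
                parts.Add(current.ToString().Trim());
                current.Clear();
            }
        }
        parts.Add(current.ToString().Trim());
        return parts;
    }

    // Returns the target column of one "column = expression" item, without brackets.
    public static string ColumnOf(string assignment)
    {
        string s = assignment.TrimStart();
        if (s.StartsWith("["))
        {
            var name = new StringBuilder();
            for (int i = 1; i < s.Length; i++)
            {
                if (s[i] == ']')
                {
                    if (i + 1 < s.Length && s[i + 1] == ']') { name.Append(']'); i++; continue; }
                    break;
                }
                name.Append(s[i]);
            }
            return name.ToString();
        }
        int end = 0;
        while (end < s.Length && (char.IsLetterOrDigit(s[end]) || s[end] == '_')) end++;
        return s.Substring(0, end);
    }
}
```

Given `A = 1, B = (SELECT MAX(y) FROM U WHERE z IN (1, 2)), [C, D] = 'x, y'`, it produces three items whose columns are `A`, `B`, and `C, D`, where the regex sketch would split inside the parentheses and the string literal.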

Response to update: You are in for a world of hurt. The only way to do what you want is to fully parse the SQL, because you also need to make sure that you don't have any subexpressions that perform unauthorized actions.

I very, very strongly recommend that you come up with another way to do whatever it is that you are doing. Maybe break out the modifiable fields into a separate table and use access controls? Maybe come up with another interface for them to use in specifying what they want done? Whatever it is that you're doing, there is almost certainly a better way to do it. Down that path there be dragons.

Glomek
I absolutely must pull the column names out of a SQL statement (see the update above). Why don't you think regex is the way to go? Aren't I just matching the left-hand side of each comma-delimited "=" assignment after the word SET in the query?
Yaakov Ellis
My response was too long for a comment, so I put it in as an edit above.
Glomek
I am trying to prevent unauthorized actions in the ways described here: http://tinyurl.com/a4tvs2. Unfortunately, due to the project requirements, I need to have one textbox that will accept the user-generated queries. I am trying to work within those parameters as safely as possible.
Yaakov Ellis
You need to fully parse the SQL and walk the parse tree, verifying that every function and operator is allowed and that all row and column accesses are allowed. Is it possible that you could restrict this functionality to "administrative" users so that imperfect security would be acceptable?
Glomek
+2  A: 

Regular expressions cannot do this task, because SQL is not a regular language.

You can do this, but not with a regular expression. You need a full-blown parser.

You can use ANTLR to generate parsers in C#, and there are free grammars available for parsing SQL in ANTLR.

However, I agree with Glomek that allowing user-supplied SQL to be run against your system, even after you have tried to validate that it includes no "unauthorized actions," is foolish. There are too many cases that may circumvent your validation.

Instead, if you have only a single text field, you should define a simplified Domain-Specific Language that permits users to specify only actions that they are authorized to do. From this input, you can build the SQL yourself.
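A minimal sketch of such a language, assuming a hypothetical syntax of `Column = value; Column = value` and a hypothetical `SimpleUpdateDsl` class. The grammar is small enough that no parser generator is needed. Every value is treated as a literal and bound as a parameter, the column whitelist is checked before any SQL exists, and the table and key column come from the application rather than the user. One limitation of this sketch: values cannot contain `;`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mini-language: "Column = value; Column = value".
// Users cannot write SQL expressions at all; every value is a literal.
public sealed class SimpleUpdateDsl
{
    // Maps each updatable column (case-insensitively) to its canonical name.
    private readonly Dictionary<string, string> allowed =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public SimpleUpdateDsl(IEnumerable<string> allowedColumns)
    {
        foreach (string c in allowedColumns) allowed[c] = c;
    }

    // Builds parameterized SQL from user input, or throws if the input is
    // malformed or names a column the user may not update.
    public (string Sql, Dictionary<string, string> Parameters) Build(
        string table, string keyColumn, string keyValue, string input)
    {
        var sets = new List<string>();
        var parameters = new Dictionary<string, string>();

        foreach (string item in input.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (item.Trim().Length == 0) continue;   // tolerate stray or trailing ';'
            int eq = item.IndexOf('=');
            if (eq < 0)
                throw new FormatException($"Expected 'Column = value' but got '{item.Trim()}'.");

            string column = item.Substring(0, eq).Trim();
            if (!allowed.TryGetValue(column, out string canonical))
                throw new UnauthorizedAccessException($"Column '{column}' may not be updated.");

            // Only whitelisted names reach the SQL text; values go in parameters.
            string name = "@p" + sets.Count;
            sets.Add($"[{canonical}] = {name}");
            parameters[name] = item.Substring(eq + 1).Trim();
        }

        if (sets.Count == 0)
            throw new FormatException("No assignments were given.");

        parameters["@key"] = keyValue;
        return ($"UPDATE [{table}] SET {string.Join(", ", sets)} WHERE [{keyColumn}] = @key", parameters);
    }
}
```

With `FirstName` and `LastName` allowed, the input `firstname = Jim; LastName = Smith` becomes `UPDATE [Employees] SET [FirstName] = @p0, [LastName] = @p1 WHERE [EmployeeID] = @key`, while `Salary = 1000000` is rejected. The application would then pass the SQL and values to its data-access layer, for example as `SqlParameter`s on a `SqlCommand`.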

Bill Karwin
I am not attempting to parse and validate the entire query. I am merely trying to extract specific information from one type of query, which, as far as I understand, always follows a fixed pattern.
Yaakov Ellis
Regular expressions are not the right tool to do this.
Bill Karwin