I need to parse partial SQL queries (it's for a SQL injection auditing tool). For example,

'1' AND 1=1--

should be broken down into tokens like:

[0] => [SQL_STRING, '1']
[1] => [SQL_AND]
[2] => [SQL_INT, 1]
[3] => [SQL_EQ]
[4] => [SQL_INT, 1]
[5] => [SQL_COMMENT]
[6] => [SQL_QUERY_END]

Are there any existing SQL lexers I could base mine on, or any good tools like bison for C#? (I'd rather not write my own grammar, since I need to support most, if not all, of the MySQL 5 grammar.)
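The token stream above can be produced by a small hand-written, regex-driven lexer. What follows is a minimal Python sketch, not a MySQL 5 grammar and not any of the parsers mentioned below; the same technique should carry over to C#. The token names follow the question's example and, like `tokenize`, `KEYWORDS`, `OPERATORS` and `SQL_QUOTE`, are hypothetical. It emits a separate equality token (`SQL_EQ`) for `=`, knows only a handful of keywords, and treats any `--` as a comment even though MySQL itself requires whitespace or a control character after the second dash.

```python
import re

# Hypothetical token names modeled on the question's example; no library defines these.
KEYWORDS = {
    "AND": "SQL_AND", "OR": "SQL_OR", "NOT": "SQL_NOT",
    "SELECT": "SQL_SELECT", "UNION": "SQL_UNION", "FROM": "SQL_FROM", "WHERE": "SQL_WHERE",
}
OPERATORS = {
    "<=": "SQL_LE", ">=": "SQL_GE", "<>": "SQL_NE", "!=": "SQL_NE",
    "=": "SQL_EQ", "<": "SQL_LT", ">": "SQL_GT",
    "+": "SQL_PLUS", "-": "SQL_MINUS", "*": "SQL_STAR", "/": "SQL_SLASH",
}

# Alternatives are tried left to right, so comments must come before "-" and "/".
TOKEN_SPEC = [
    ("WS", r"\s+"),
    # MySQL comment styles "-- ...", "# ..." and "/* ... */".
    # Simplification: real MySQL requires whitespace or a control char after "--".
    ("COMMENT", r"--[^\n]*|#[^\n]*|/\*.*?\*/"),
    # Single-quoted literal; '' and backslash escapes are kept undecoded in the value.
    ("STRING", r"'(?:[^'\\]|\\.|'')*'"),
    # An unpaired quote, e.g. the one that closes the host query's string literal.
    ("QUOTE", r"'"),
    ("INT", r"\d+"),
    # Longest operators first so "<=" is not split into "<" and "=".
    ("OP", "|".join(re.escape(op) for op in sorted(OPERATORS, key=len, reverse=True))),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("PUNCT", r"[(),;.]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC), re.DOTALL
)


def tokenize(fragment: str) -> list:
    """Split a (possibly partial) MySQL fragment into (kind,) or (kind, value) tuples."""
    tokens = []
    pos = 0
    while pos < len(fragment):
        match = MASTER.match(fragment, pos)
        if match is None:
            raise ValueError(f"unexpected character {fragment[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "WS":
            continue
        elif kind == "COMMENT":
            tokens.append(("SQL_COMMENT",))
        elif kind == "STRING":
            tokens.append(("SQL_STRING", text[1:-1]))  # strip the surrounding quotes
        elif kind == "QUOTE":
            tokens.append(("SQL_QUOTE",))
        elif kind == "INT":
            tokens.append(("SQL_INT", int(text)))
        elif kind == "OP":
            tokens.append((OPERATORS[text],))
        elif kind == "IDENT":
            keyword = KEYWORDS.get(text.upper())  # keywords are case-insensitive
            tokens.append((keyword,) if keyword else ("SQL_IDENT", text))
        else:  # PUNCT
            tokens.append(("SQL_PUNCT", text))
    tokens.append(("SQL_QUERY_END",))
    return tokens


print(tokenize("'1' AND 1=1--"))
# [('SQL_STRING', '1'), ('SQL_AND',), ('SQL_INT', 1), ('SQL_EQ',), ('SQL_INT', 1), ('SQL_COMMENT',), ('SQL_QUERY_END',)]
```

One caveat for injection auditing: a payload usually starts inside the host query, so the same fragment can lex differently depending on context. Lexed on its own, `' OR '1'='1` comes out as the string ` OR `, then `1`, the string `=`, and `1`, so an auditing tool may need to tokenize the fragment together with whatever the host query had already opened.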

A:

It seems there are a few good parsers out there.

This Stack Overflow question has a sample using Microsoft's Entity Framework:
http://stackoverflow.com/questions/589096/parsing-sql-code-in-c

Someone else seems to have rolled their own and posted it on Code Project:
http://www.codeproject.com/KB/dotnet/SQL_parser.aspx

Personally, I'd lean toward the Entity Framework solution, since it is created and maintained by Microsoft; for the same reason, though, it is probably tightly coupled to SQL Server. Since you're targeting MySQL, you may be better off with the custom solution on Code Project, because you can extend it yourself as the MySQL grammar requires.

I'll be using this soon (for Oracle, not MySQL), so please let the community know how the solution works out!

UPDATE:
I just came back to this and read the comments. On further reflection, I'd really recommend ANTLR, since it is a general-purpose parser generator and can work from grammars for different SQL dialects, MySQL included. Again, I haven't used it myself, so it would be good to hear how it works out; the decision is up to you.
http://stackoverflow.com/questions/76083/parsing-sql-in-net/76151

mattdekrey
Also, I Googled "C# sql parser" to find these answers.
mattdekrey