Are there any open source libraries (any language, Python/PHP preferred) that will tokenize/parse an ANSI SQL string into its various components?

That is, if I had the following string:

 SELECT a.foo, b.baz, a.bar
 FROM TABLE_A a
 LEFT JOIN TABLE_B b
 ON a.id = b.id
 WHERE baz = 'snafu';

I'd get back a data structure/object something like:

 // fake PHP-ish
 $results['select-columns'] = ['a.foo', 'b.baz', 'a.bar'];
 $results['tables']         = ['TABLE_A', 'TABLE_B'];
 $results['table-aliases']  = ['a' => 'TABLE_A', 'b' => 'TABLE_B'];
 // etc...

Restated, I'm looking for the code in a database package that teases the SQL command apart so that the engine knows what to do with it. Searching the internet turns up a lot of results on how to parse a string WITH SQL. That's not what I want.

I realize I could glop through an open source database's code to find what I want, but I was hoping for something a little more ready-made (although if you know where in the MySQL, PostgreSQL, or SQLite source to look, feel free to pass it along).

Thanks!

+1  A: 

The SQLite source has a file named parse.y that contains its SQL grammar. You can pass that file to the Lemon parser generator to generate C code for a parser that implements that grammar.
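If a full generated parser is more than you need, here is a rough Python sketch of the tokenize-then-collect idea, aimed at the kind of result the question describes. It is not derived from parse.y or from SQLite's tokenizer. The token pattern, the keyword list, and the names `tokenize` and `summarize` are all assumptions made up for illustration. It only copes with flat SELECT statements like the one in the question: unqualified table names, no subqueries, no column aliases.

```python
import re

# Hypothetical token pattern for a tiny SQL subset (an assumption, not SQLite's rules).
TOKEN_RE = re.compile(r"""
    (?P<string>'(?:[^']|'')*')          # single-quoted literal
  | (?P<number>\d+(?:\.\d+)?)           # integer or decimal
  | (?P<name>[A-Za-z_][A-Za-z0-9_]*)    # identifier or keyword
  | (?P<op><=|>=|<>|!=|[=<>.,;()*])     # operators and punctuation
  | (?P<space>\s+)                      # whitespace, discarded below
""", re.VERBOSE)

# Deliberately small keyword list; a real grammar has many more.
KEYWORDS = {"SELECT", "FROM", "LEFT", "RIGHT", "INNER", "OUTER",
            "JOIN", "ON", "WHERE", "AS", "AND", "OR"}


def tokenize(sql):
    """Split sql into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at offset {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "space":
            continue
        if kind == "name" and text.upper() in KEYWORDS:
            kind, text = "keyword", text.upper()
        tokens.append((kind, text))
    return tokens


def summarize(tokens):
    """Collect select columns, tables, and table aliases from a flat SELECT."""
    result = {"select-columns": [], "tables": [], "table-aliases": {}}
    section = None   # most recent keyword seen
    column = ""      # column expression being assembled
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "keyword":
            # A keyword ends the select list, so flush any pending column.
            if section == "SELECT" and column:
                result["select-columns"].append(column)
                column = ""
            section = text
        elif section == "SELECT":
            if text == ",":
                result["select-columns"].append(column)
                column = ""
            else:
                column += text
        elif section in ("FROM", "JOIN") and kind == "name":
            result["tables"].append(text)
            # Look ahead for an optional alias: "TABLE_A a" or "TABLE_A AS a".
            nxt = i + 1
            if nxt < len(tokens) and tokens[nxt] == ("keyword", "AS"):
                nxt += 1
            if nxt < len(tokens) and tokens[nxt][0] == "name":
                result["table-aliases"][tokens[nxt][1]] = text
                i = nxt  # skip past the alias
        i += 1
    return result


QUERY = """
SELECT a.foo, b.baz, a.bar
FROM TABLE_A a
LEFT JOIN TABLE_B b
ON a.id = b.id
WHERE baz = 'snafu';
"""

print(summarize(tokenize(QUERY)))
```

Anything beyond this subset (nested queries, expressions in the select list, quoted identifiers) is where a grammar-driven parser like the one Lemon generates starts to pay off.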

ardsrk