I have a string representing a SQL query, and I need to extract the names of the tables from that string. For example:

SELECT * FROM Customers

Would return "Customers". Or

SELECT * FROM Customers c, Addresses a WHERE c.CustomerName='foo'

SELECT a.AddressZip FROM Customers c
INNER JOIN Addresses a ON c.AddressId=a.AddressId

Would return "Customers, Addresses". Getting more advanced:

(SELECT B FROM (SELECT C FROM (SELECT Element AS C FROM MyTable)))

Would simply return "MyTable".

(Note: I may have made typos in the queries, but you get the idea.)

What would be the best/most accurate way of accomplishing this?
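For comparison, here is a minimal hand-rolled sketch that uses no commercial parser. It is a keyword-scanning heuristic, not a SQL parser, and it assumes simple queries like the examples above. It collects the identifier after each `FROM` or `JOIN`, skips an optional alias (with or without `AS`), follows comma-separated `FROM` lists, and steps over `(` so that nested subqueries are picked up when the scan reaches their own `FROM`. The names `NaiveTableExtractor` and `ExtractTables` are hypothetical, and the keyword list is deliberately partial.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: a keyword-scanning heuristic, NOT a real SQL parser.
static class NaiveTableExtractor
{
    // Words that may follow a table name and must not be mistaken for an alias (partial list).
    static readonly HashSet<string> Keywords = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "SELECT", "FROM", "WHERE", "AS", "ON", "JOIN", "INNER", "LEFT", "RIGHT", "FULL",
        "OUTER", "CROSS", "GROUP", "ORDER", "HAVING", "UNION"
    };

    // A plain or dotted name (e.g. dbo.Customers) that is not one of the keywords above.
    static bool IsIdentifier(string token) =>
        Regex.IsMatch(token, @"^[A-Za-z_][\w.]*$") && !Keywords.Contains(token);

    public static List<string> ExtractTables(string sql)
    {
        // Tokens: 'string literals', words (possibly dotted), or any single non-space symbol.
        List<string> tokens = Regex.Matches(sql, @"'[^']*'|[\w.]+|\S")
                                   .Cast<Match>()
                                   .Select(m => m.Value)
                                   .ToList();
        var tables = new List<string>();

        for (int i = 0; i < tokens.Count; i++)
        {
            bool isFrom = tokens[i].Equals("FROM", StringComparison.OrdinalIgnoreCase);
            bool isJoin = tokens[i].Equals("JOIN", StringComparison.OrdinalIgnoreCase);
            if (!isFrom && !isJoin) continue;

            int j = i + 1;
            // A '(' here is a derived table; the subquery's own FROM is found later in the scan.
            while (j < tokens.Count && IsIdentifier(tokens[j]))
            {
                tables.Add(tokens[j++]);

                // Skip an optional alias, with or without AS.
                if (j < tokens.Count && tokens[j].Equals("AS", StringComparison.OrdinalIgnoreCase)) j++;
                if (j < tokens.Count && IsIdentifier(tokens[j])) j++;

                // Only a comma-separated FROM list continues with another table.
                if (isFrom && j < tokens.Count && tokens[j] == ",") j++;
                else break;
            }
        }
        return tables.Distinct(StringComparer.OrdinalIgnoreCase).ToList();
    }
}
```

For example, `NaiveTableExtractor.ExtractTables("SELECT * FROM Customers c, Addresses a WHERE c.CustomerName='foo'")` yields Customers and Addresses, and the nested query yields only MyTable. Because aliases are skipped by position rather than by length, longer aliases are not reported as tables. It will still go wrong on comments, quoted or bracketed identifiers containing spaces, CTE names (reported as tables), and UPDATE/INSERT targets (not found), so for accuracy a real parser remains the safer choice.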

A:

Here's one way to do it, using a commercial utility from sqlparser.com ($149, free trial available):

using System;
using gudusoft.gsqlparser;

namespace GeneralSqlParserTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Parser configured for the SQL Server dialect
            TGSqlParser sqlparser = new TGSqlParser(TDbVendor.DbVMssql);

            sqlparser.SqlText.Text = "SELECT * FROM Customers c, Addresses a WHERE c.CustomerName='foo'";
            // Have the parser call OnTableToken for each table token it finds
            sqlparser.OnTableToken += new TOnTableTokenEvent(OnTableToken);

            int result = sqlparser.Parse();
            Console.ReadLine(); // keep the console window open
        }

        // Event handler: print the text of each reported table token
        static void OnTableToken(object o, gudusoft.gsqlparser.TSourceToken st, gudusoft.gsqlparser.TCustomSqlStatement stmt)
        {
            Console.WriteLine("Table: {0}", st.AsText);
        }
    }
}

Note that it also reports the aliases 'c' and 'a' as tables, but it would be fairly simple to filter single-character names out of the results.
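A minimal sketch of that filter, assuming the names printed by the `OnTableToken` handler are first collected into a list (the handler above only writes them to the console). `TableNameFilter` and `DropSingleCharacterNames` are hypothetical names. The heuristic treats any one-character name as an alias, so it will not catch longer aliases.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for post-processing the names the parser reports.
static class TableNameFilter
{
    // Drops names of length 1 (assumed to be aliases such as 'c' and 'a')
    // and removes case-insensitive duplicates.
    public static List<string> DropSingleCharacterNames(IEnumerable<string> names)
    {
        return names.Where(n => n.Length > 1)
                    .Distinct(StringComparer.OrdinalIgnoreCase)
                    .ToList();
    }
}
```

Given Customers, c, Addresses, a, this keeps Customers and Addresses; an alias such as `cust` would survive the filter.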

I don't use or own this tool; it's just something I found after some searching.

Brian Vander Plaats
It's a good start, although picking up 'c' and 'a' is annoying, since some tables are aliased with longer names, so filtering by length won't catch them.
esac