ansaurus

Question

Regular expression to find all table names in a query

Answer 1

+1 A:

It's definitely not easy.

Consider subqueries.

select
  *
from
  A
  join (
    select
       top 5 *
    from
      B) as B
    on B.ID = A.ID
where
  A.ID in (
    select
      ID
    from
      C
    where C.DOB = A.DOB)

There are three tables used in this query.

Jason Lepack 2008-11-11 14:42:47

Answer 2

+1 A:

I think it would be easier to tokenize the string and look for SQL keywords that could bound the table names. You know the names will follow FROM, but they could be followed by WHERE, GROUP BY, HAVING, or no keyword at all if they're at the end of the query.
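To make the tokenizing idea concrete, here is a minimal sketch in Python. The keyword sets, the token pattern, and the names `tokenize` and `table_names` are my own assumptions for illustration, not part of any library. It ignores string literals, comments, and quoted identifiers, so treat it as a starting point only.

```python
import re

# Keywords that end a FROM list (an assumed, incomplete set for this sketch).
CLAUSE_KEYWORDS = {"SELECT", "WHERE", "GROUP", "HAVING", "ORDER",
                   "ON", "UNION", "SET", "VALUES", "LIMIT"}

def tokenize(sql):
    # Words (optionally dotted) or one of the punctuation marks we care about.
    return re.findall(r"[A-Za-z_][\w.]*|[(),]", sql)

def table_names(sql):
    names = []
    in_from = False      # inside a FROM clause, where commas separate tables
    expect_name = False  # the next word token should be a table name
    for tok in tokenize(sql):
        word = tok.upper()
        if word in ("FROM", "JOIN"):
            in_from = expect_name = True
        elif word in CLAUSE_KEYWORDS:
            in_from = expect_name = False
        elif tok == "(":
            expect_name = False  # a subquery follows, not a table name
        elif tok == ")":
            pass
        elif tok == ",":
            expect_name = in_from  # a comma in a FROM list starts another table
        elif expect_name:
            names.append(tok)
            expect_name = False  # anything after the name is treated as an alias
    return names

EXAMPLE = """
select * from A
  join (select top 5 * from B) on B.ID = A.ID
where A.ID in (select ID from C where C.DOB = A.DOB)
"""
print(table_names(EXAMPLE))
```

Even this small state machine already needs special cases for parentheses, commas, and aliases, which is a fair hint at how quickly the problem grows.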

Bill the Lizard 2008-11-11 14:44:10

Answer 3

+11 A:

RegEx isn't very good at this, as it's a lot more complicated than it appears:

What if they use LEFT/RIGHT INNER/OUTER/CROSS/MERGE/NATURAL joins instead of the a,b syntax? The a,b syntax should be avoided anyway.
What about nested queries?
What if there is no table (selecting a constant)?
What about line breaks and other whitespace formatting?
Alias names?

I could go on.

What you can do is look for an sql parser, and run your query through that.

Joel Coehoorn 2008-11-11 14:44:49

I think the real deal killer is going to be views. There is going to be no practical way to parse the underlying table names of any views included in the query.

JohnFx 2010-02-12 16:59:22

Answer 4

+1 A:

Everything said above about the limited usefulness of such a regex for SQL still applies. If you insist on a regex, and your SQL statements always look like the one you showed (that means no subqueries, joins, and so on), you could use

FROM\s+([^ ,]+)(?:\s*,\s*([^ ,]+))*\s+
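A quick way to see what this pattern does and does not capture is to try it in Python. This is only a sketch: the helper name `tables_in_simple_from` is made up, and I added the ignore-case flag as an assumption. Note that a repeated capture group keeps only its last repetition, so with three or more tables the middle ones are not in the groups. The helper works around that by splitting the whole matched text instead.

```python
import re

PATTERN = re.compile(r"FROM\s+([^ ,]+)(?:\s*,\s*([^ ,]+))*\s+", re.IGNORECASE)

def tables_in_simple_from(sql):
    """Return the comma-separated names in a simple FROM list (no aliases, joins, subqueries)."""
    m = PATTERN.search(sql)
    if m is None:
        return []
    # Drop the leading FROM keyword, then split the remaining list on commas.
    from_list = m.group(0)[len("FROM"):]
    return [name.strip() for name in from_list.split(",")]

sql = "select a, b from t1, t2, t3 where x = 1"
print(PATTERN.search(sql).groups())   # only the first and the last table
print(tables_in_simple_from(sql))     # all three tables
```

Two more limits worth knowing: the trailing `\s+` means a query that ends right after the table name does not match at all, and table aliases stop the match after the first table.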

Stefan Gehrig 2008-11-11 14:53:15

Answer 5

A:

I found this site that has a GREAT parser!

http://www.sqlparser.com/

Well worth it. Works a treat.

Jon 2008-11-11 15:36:28

Answer 6

+1 A:

Hi, I'm pretty late to the party, but I thought I would share a regex I am currently using to analyse all our database objects. I disagree with the sentiment that it is not possible to do this with one.

The regex makes a few assumptions:

1) You are not using the A,B join syntax style

2) Whatever regex parser you are using supports ignore case.

3) You're analyzing selects, joins, updates, deletes and truncates. It doesn't support the aforementioned MERGE/NATURAL joins because we don't use them, but I'm sure further support wouldn't be difficult to add.

I am keen to know what type of transaction the table is part of so I have included Named Capture groups to tell me.

I haven't used regex for a long time, so there are probably improvements to be made, but so far it has been accurate in all my testing.

\bjoin\s+(?<Retrieve>[a-zA-Z\._\d]+)\b|\bfrom\s+(?<Retrieve>[a-zA-Z\._\d]+)\b|\bupdate\s+(?<Update>[a-zA-Z\._\d]+)\b|\binsert\s+(?:\binto\b)?\s+(?<Insert>[a-zA-Z\._\d]+)\b|\btruncate\s+table\s+(?<Delete>[a-zA-Z\._\d]+)\b|\bdelete\s+(?:\bfrom\b)?\s+(?<Delete>[a-zA-Z\._\d]+)\b
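The pattern above reuses the same group name in several alternatives, which some regex engines (Python's `re` among them) do not accept. For anyone trying it in Python, here is a rough adaptation. The group names `g0`..`g5`, the `RULES` table, and the `classify` helper are my own inventions. I also changed the insert and delete prefixes to `(?:into\s+)?` and `(?:from\s+)?`, because the original form `\s+(?:\binto\b)?\s+` needs two separate runs of whitespace when the optional keyword is missing.

```python
import re

NAME = r"[a-zA-Z._\d]+"

# (kind, prefix) pairs, in the same order as the alternatives in the original regex.
RULES = [
    ("Retrieve", r"\bjoin\s+"),
    ("Retrieve", r"\bfrom\s+"),
    ("Update",   r"\bupdate\s+"),
    ("Insert",   r"\binsert\s+(?:into\s+)?"),
    ("Delete",   r"\btruncate\s+table\s+"),
    ("Delete",   r"\bdelete\s+(?:from\s+)?"),
]

# One alternation with a unique group name per rule.
PATTERN = re.compile(
    "|".join(rf"{prefix}(?P<g{i}>{NAME})\b" for i, (_, prefix) in enumerate(RULES)),
    re.IGNORECASE,
)
KINDS = {f"g{i}": kind for i, (kind, _) in enumerate(RULES)}

def classify(sql):
    """Return (kind, table) pairs in the order they appear in sql."""
    return [(KINDS[m.lastgroup], m.group(m.lastgroup)) for m in PATTERN.finditer(sql)]

script = ("update t1 set x = 1; insert into t2 values (1); delete from t3; "
          "truncate table t4; select * from a join b on a.id = b.id")
for kind, table in classify(script):
    print(kind, table)
```

Keeping everything in a single alternation matters: `delete from t3` is consumed as one match, so the `from` inside it is not reported a second time as a retrieve.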

MrEdmundo 2010-02-12 16:53:03

Answer 7

A:

Constructing a regular expression is going to be the least of your problems. Depending on the flavor of SQL you expect to support with this code, the number of ways you can reference a table in a SQL statement is staggering.

Plus, if the query references a view or UDF, the names of the underlying tables won't be in the string at all, so no amount of parsing will recover them. You'd also need to be smart about detecting temporary tables and excluding them from your results.

If you must do this, a better approach would be to make use of the APIs to the particular database engine that the SQL was intended for. For example you could create a view based on the query and then use the DB Server api to detect dependencies for that view. The DB engine is going to be able to parse it much more reliably than you ever will without an enormous effort to reverse engineer the query engine.

If, by chance, you are working with SQL Server, here is an article about detecting dependencies on that platform: Finding Dependencies in SQL Server 2005
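To illustrate the general idea of letting the engine do the parsing, here is a small sketch using SQLite through Python's `sqlite3` module. This is not the view-dependency technique described above. It swaps in SQLite's authorizer callback, which the engine calls for each column it reads while compiling a statement. The helper name `tables_read_by` and the sample schema are hypothetical, and the tables must already exist in the connection for the statement to compile.

```python
import sqlite3

def tables_read_by(conn, sql):
    """Return the set of table names SQLite reports reading for this statement."""
    seen = set()

    def authorizer(action, arg1, arg2, db_name, source):
        # For SQLITE_READ, arg1 is the table name and arg2 the column name.
        if action == sqlite3.SQLITE_READ:
            seen.add(arg1)
        return sqlite3.SQLITE_OK

    conn.set_authorizer(authorizer)
    try:
        conn.execute(sql).fetchall()
    finally:
        # Swap in a permissive callback so later statements are not recorded.
        conn.set_authorizer(lambda *args: sqlite3.SQLITE_OK)
    return seen

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table A (ID integer, DOB text);
    create table B (ID integer);
    create table C (ID integer, DOB text);
    create view V as select ID from C;
""")

query = """
select A.ID from A
  join (select ID from B limit 5) as B2 on B2.ID = A.ID
where A.ID in (select ID from C where C.DOB = A.DOB)
"""
print(sorted(tables_read_by(conn, query)))
print(sorted(tables_read_by(conn, "select ID from V")))
```

Because the engine resolves names itself, a query against the view `V` also reports the underlying table `C`, which is exactly the information a string-based approach cannot see.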

JohnFx 2010-02-12 17:05:21

Answer 8

A:

I have a similar problem, but I am not using nested queries and the like.

What I use is

Select a,b,c from table1 d,table2 e where

Select a,b,c from table1,table2 where

Select a,b,c from table1 where

(I can also replace a,b,c with *)

and simple forms of

insert into, replace into, delete from, update.

None of the following appear in my queries:

"What if they use LEFT/RIGHT INNER/OUTER/CROSS/MERGE/NATURAL joins instead of the a,b syntax? The a,b syntax should be avoided anyway.
What about nested queries?
What if there is no table (selecting a constant)?
What about line breaks and other whitespace formatting?
Alias names?"

faruk hakan 2010-05-13 14:47:42

You should really post this as its own question and, if you want, link to this one. Please don't post multiple questions on a single thread.

Jon 2010-05-17 10:04:52

Answer 9

A:

This will pull out the table name from an INSERT INTO query:

(?<=(INTO)\s)[^\s]*(?=\(())

The following does the same for a SELECT, including joins:

(?<=(from|join)\s)[^\s]*(?=\s(on|join|where))

Finally, going back to the insert: if you want to return just the values held in an INSERT query, use the following regex:

(?i)(?<=VALUES[ ]*\().*(?=\))
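For anyone who wants to try these in Python, here is a small sketch. The first two patterns are used as written, with an ignore-case flag added as an assumption. The third relies on a variable-width lookbehind, which Python's `re` module rejects, so `INSERT_VALUES` below is a capturing-group stand-in of my own, not the original pattern. The helper names are made up too.

```python
import re

INSERT_TABLE = re.compile(r"(?<=(INTO)\s)[^\s]*(?=\(())", re.IGNORECASE)
SELECT_TABLES = re.compile(r"(?<=(from|join)\s)[^\s]*(?=\s(on|join|where))", re.IGNORECASE)
# Stand-in for (?<=VALUES[ ]*\().*(?=\)): capture the text between the outer parentheses.
INSERT_VALUES = re.compile(r"VALUES\s*\((.*)\)", re.IGNORECASE)

def insert_table(sql):
    m = INSERT_TABLE.search(sql)
    return m.group(0) if m else None

def select_tables(sql):
    # group(0) is the table name; the numbered groups hold the surrounding keywords.
    return [m.group(0) for m in SELECT_TABLES.finditer(sql)]

def insert_values(sql):
    m = INSERT_VALUES.search(sql)
    return m.group(1) if m else None

insert_sql = "INSERT INTO users(id, name) VALUES (1, 'a')"
select_sql = "select * from A join B on B.ID = A.ID where A.x = 1"
print(insert_table(insert_sql))
print(select_tables(select_sql))
print(insert_values(insert_sql))
```

Some limits show up quickly: the insert pattern needs the opening parenthesis to follow the table name with no space in between, and the select pattern finds nothing once table aliases are used, because it expects `on`, `join`, or `where` directly after the name.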

I know this is an old thread but it may assist someone else looking around

Enjoy

Psymon25 2010-09-23 16:13:39
