views:

68

answers:

5

I need a regex that will determine if a given SQL statement has a WHERE clause. My problem is that the passed SQL statements will most likely be complex, so I can not rely on just the existence of the word WHERE in the statement.

For example this should match

SELECT Contacts.ID
     , CASE WHEN (Contacts.Firstname IS NULL) THEN ''
            ELSE CAST(Contacts.Firstname AS varchar)
       END AS Firstname
     , CASE WHEN (Contacts.Lastname IS NULL) THEN ''
            ELSE CAST(Contacts.Lastname AS varchar)
       END AS Lastname
     , CASE WHEN (tbl_ContactExtras.Prequalified=-1 OR
                  tbl_ContactExtras.Prequalified IS NULL) THEN ''
            WHEN tbl_ContactExtras.Prequalified=0 THEN 'No'
            WHEN tbl_ContactExtras.Prequalified=1 THEN 'Yes - Other'
            WHEN tbl_ContactExtras.Prequalified=2 THEN 'Yes'
            ELSE CAST(tbl_ContactExtras.Prequalified AS varchar)
       END AS Prequalified
FROM contacts
LEFT JOIN tbl_ContactExtras
ON tbl_ContactExtras.ContactID = Contacts.ID
WHERE (Contacts.Firstname LIKE 'Bob%')

and this should not match:

SELECT Contacts.ID
     , CASE WHEN (Contacts.Firstname IS NULL) THEN ''
            ELSE CAST(Contacts.Firstname AS varchar)
       END AS Firstname
     , CASE WHEN (Contacts.Lastname IS NULL) THEN ''
            ELSE CAST(Contacts.Lastname AS varchar)
       END AS Lastname
     , CASE WHEN (tbl_ContactExtras.Prequalified=-1 OR
                  tbl_ContactExtras.Prequalified IS NULL) THEN ''
            WHEN tbl_ContactExtras.Prequalified=0 THEN 'No'
            WHEN tbl_ContactExtras.Prequalified=1 THEN 'Yes - Other'
            WHEN tbl_ContactExtras.Prequalified=2 THEN 'Yes'
            ELSE CAST(tbl_ContactExtras.Prequalified AS varchar)
       END AS Prequalified
FROM contacts
LEFT JOIN tbl_ContactExtras
ON tbl_ContactExtras.ContactID = Contacts.ID

Those are examples of some of the simpler statements: a statement could have up to 30 CASE statements in it, or it could have none at all.

I need to programmatically add WHERE parameters, but doing this correctly requires knowing whether a WHERE clause is already present.

Any idea on a regex that would work for this? If not, any other ideas on how to tell the two apart?

Thanks,

A: 

Are all of the joins the same? If so you could find the index of all or part of the FROM statement (perhaps using a regex to be tolerant of slight differences in syntax and whitespace) and then look for the occurrence of the word WHERE after that index.

In general you would be better off using a parser. But if this is just a one off thing and the statements are all fairly similar then the above approach should be okay.

Michael Petito
+2  A: 

This is not possible, since a WHERE clause may be arbitrarily nested inside the FROM clause.

SLaks
"Regexes can't parse".
Ira Baxter
i didnt want to have to go through a parser, but lokos like i'll have to. thanks.
Justin808
A: 

Regex is not designed to do this. Parsing SQL properly requires matching balanced parentheses (and other matching pairs, such as quotes), something regex is not designed to do (and pure regex isn't even equipped to; PCRE can but it's not pretty).

Instead, just write a basic state machine or something to parse it.

Amber
Escaped quotes can be matched by a regex. Other than that, however, you are correct.
SLaks
+1  A: 

This may not catch all cases but you may find you can catch most of them just by finding the last from and the last where in the statement.

if the where is after the from, then it has a where clause. If the where is before the from (or there is no where at all), then no where clause exists.

Sometimes, it's okay to leave restrictions or limitations in your code, as long as they're properly documented.

For example, I've worked on a project before that parsed SQL and we discovered it didn't handle things like between:

where recdate between '2010-01-01' and '2010-12-31'

Rather than spend a bucket-load of money fixing the problem (and probably introducing bugs on the way), we simply published it as a restriction and told everyone they had to change it to:

where recdate >= '2010-01-01'
  and recdate <= '2010-12-31'

Problem solved. While it's good to keep customers happy, you don't have to cater to every whim :-)

Other than that, you need an SQL parser, and SQL is not a pretty language to parse, trust me on that one.

paxdiablo
A: 

What's the problem you're trying to solve? Are you trying to determine if it's safe to add constraints to these existing queries?

For example, if you've got this query

...
where foo = 'bar'

then you know it's safe to add

and bat = 'quux'

but if you don't have a WHERE clause already, then you have to do it as

where bat = 'quux'

Is that the problem you're trying to solve? If so, can you make every SQL query you're working with have a WHERE clause by adding a "WHERE 0=0" to those queries that don't have one? Then you know in your post-process phase that every query already has one.

This is just a guess, of course. Your question sounded like that might be the larger issue.

Andy Lester