views:

279

answers:

1

I am trying to write a simple sql parser to break a sql statement into its basic parts. However, I'm having a problem with nested queries. An example illustrates best:

sql = "select * from Customers where id in (select customer_id from Orders where 1=1)"
Set re = New RegExp
re.IgnoreCase = True
re.Pattern = "^(.*)\swhere\s(.*)$"
re.Global = True
Set matches = re.Execute( sql )


If matches.count > 0 Then
 Set submatches = matches(0).Submatches
 where_part = Trim(submatches(1))
 everything_else = Trim(submatches(0)) 
End If 

Response.Write where_part & "<br>"
Response.Write everything_else & "<br>"

Here, I want the where_part to contain "id in (select customer_id from Orders where 1=1)" and everything_else to contain "select * from Customers". That is, I want it to be as greedy as possible, for the second (.*) and selfless for the first. Or put another way, I want the everything_else to contain no where's.

However, I get everything_else = "select * from Customers where id in (select customer_id from Orders" and where_part = "1=1)"

How can I remedy this without resorting to messy non-regexp string parsing?

+4  A: 

On the contrary, your regexp is too greedy.

.* catches the longest possible matching string, which is until the second where in your example.

Try this instead:

^(.*?)\swhere\s(.*)

The ? makes .* to behave less greedily, meaning it will stop at the first where occurrence.

Vinko Vrsalovic
works perfectly!
Tristan Havelick