tags:

views:

42

answers:

4

I am attempting to split the ORDER BY statement of a SQL query into an array. First inclination is:

order_by.split(',')

but that does not work for order by statements like the following:

SUBSTRING('test',1,3) ASC, SUBSTRING('test2', 2,2 ) DESC

The desired output for the above statement would be:

["SUBSTRING('test',1,3) ASC", "SUBSTRING('test2', 2,2 ) DESC"]

I am fairly certain that it would work if I could match any comma that is not enclosed in parethesis, but I cannot find a way to do that in ruby regex because lookbehind is not supported.

A: 

While I suspect there probably is still a way of doing it with regex, you can do it with simple string parsing as well.

Find all commas in the string, then go forwards or backwards from that point (it won't matter which if the brackets are balanced correctly), adding one for an open brace and subtracting one for a closed brace. If you don't have 0 at the end, you're inside a brace, so it's not a comma you want.

Split on all other commas.

EDIT

Although the comment about this methodology failing in the case of parentheses enclosed in commas is valid, it may be the case that you're dealing with queries simple enough not to worry about that. If that is the case, this should work:

def in_brackets(str,pos)
  cnt = 0
  str[pos,str.length].each_char do |c|
    if c == '('
      cnt += 1
    elsif c == ')'
      cnt -= 1
    end
  end

  return cnt != 0
end

def split_on_some_commas(str) 
  offset = -1
  split_pts = []

  while (offset = str.index(",",offset+1))
    if !in_brackets(str,offset)
      split_pts << offset
    end
  end

  split_pts << str.length

  pos = 0

  ret = []
  split_pts.each do |pt|
    ret << str[pos..(pt-1)].strip
    pos = pt+1
  end

  return ret
end

puts split_on_some_commas("SUBSTRING('test',1,3) ASC, SUBSTRING('test2', 2,2 ) DESC, SUBSTRING('test2', 2,2 ) DESC").inspect
Jamie Wong
Well this does the job, I was hoping for something a bit more concise. But I believe this is the best way. Thanks!
Geoff Lanotte
+1  A: 

The easiest method would be to do a preg_replace_callback to replace the parentheses with a place holder, then explode the data, then loop through and put the parenthesis back.

Aaron Harun
A: 

In general, you can't use regular expressions to parse a non-regular language.

Try using Rockit to create a real grammar and parser for the subset of SQL you need.

Bill Karwin
A: 

You have the further problem that the strings could have parentheses in them, so you can't just match parentheses as suggested by Jamie Wong.

A straightforward solution would be to write a small state machine. Have a flag indicating whether you are inside of a string and a counter indicating the current parenthesis depth. Iterate over the string and for each character:

  • If it is a (non-escaped) quote mark, toggle the "inside of string" flag
  • If it is a left parenthesis and the "inside of string" flag is false, increment the nesting depth
  • If it is a right parenthesis and the "inside of string" flag is false, decrement the nesting depth
  • If it is a comma and the nesting depth is zero, you know that comma separates two "order by" clauses.

There may be an easier way to do this in Ruby; I'm not really sure because I don't use Ruby :-). I can say that this could be implemented in about eight lines or so of C++, so it's not a particularly onerous thing to implement.

(there may be bugs and other corner cases in the above algorithm description; it is untested)

James McNellis