Currently I have the following code:

String select = qry.substring("select ".length(), qry.indexOf(" from "));
String[] attrs = select.split(",");

This works for the most part but fails if given the following:

qry = "select a,b,c,DATETOSTRING(date_attr_name,'mm/dd/yyyy') from tbl_a";

What I'm looking for is the regex to feed to String.split() which will handle that situation, and for that matter any other special cases you can think of that I'm missing.

+1  A: 
[^,]+\([^\)]+\)|[^,]+,

Should do it nicely provided you always add a final ',' to your select string:

a,b,c,DATETOSTRING(date_attr_name,'mm/dd/yyyy'),f,gg,dr(tt,t,),fff

would fail to split off the last 'fff' attribute, but:

a,b,c,DATETOSTRING(date_attr_name,'mm/dd/yyyy'),f,gg,dr(tt,t,),fff,

would capture it. So a little pre-processing would smooth things out.

Caveat: this does not take into account an expression nested within another expression, such as:

EXP(arg1, EXP2(ARG11,ARG22), ARG2)

Tell me if that can happen in the queries you have to process.

Caveat bis: since this needs a true regexp rather than the simple separator expected by split(), you must use a Matcher based on the pattern [^,]+\([^\)]+\)|[^,]+, and iterate on Matcher.find() to fill the attrs array.

In short, with split() function, there is no single simple separator that might do the trick.
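The Matcher loop described above might look roughly like the sketch below. The class name RegexSelectSplitter and its split method are hypothetical, not part of the original answer. The sketch appends the trailing ',' the pattern relies on, then strips it from each match. It still breaks on nested calls such as EXP(arg1, EXP2(ARG11,ARG22), ARG2).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects select-list attributes via Matcher.find().
class RegexSelectSplitter {
    // Pattern from the answer: either "name(args)" or "name,".
    private static final Pattern ATTR =
            Pattern.compile("[^,]+\\([^\\)]+\\)|[^,]+,");

    static List<String> split(String select) {
        List<String> attrs = new ArrayList<>();
        // Pre-processing step: add the final ',' so the last attribute matches.
        Matcher m = ATTR.matcher(select + ",");
        while (m.find()) {
            String attr = m.group();
            // Plain attributes are matched together with their trailing comma.
            if (attr.endsWith(",")) {
                attr = attr.substring(0, attr.length() - 1);
            }
            attrs.add(attr.trim());
        }
        return attrs;
    }
}
```

Calling RegexSelectSplitter.split(select) on the select string from the question should yield four entries, with the DATETOSTRING(...) call kept whole.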

VonC
I don't think your regex can be fed to String#split to get the results asked for. String#split expects the "separator" regex, not the regex that matches the entire string.
bdumitriu
+2  A: 

Your answer in the form of a quote:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. — Jamie Zawinski

Your regex would have to take into account all possible functions, nested functions, nested strings, etc. Your solution probably isn't a regex, it is a lexer+parser.
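Short of a full lexer+parser, a small hand-written scanner that tracks parenthesis depth and quoted strings already copes with nested functions. The following is only a sketch under assumptions: the class name SelectListScanner is hypothetical, string literals are assumed to use single quotes, and comments, double-quoted identifiers and other dialect-specific syntax are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a select list on top-level commas only.
class SelectListScanner {
    static List<String> split(String select) {
        List<String> attrs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;           // current parenthesis nesting level
        boolean inQuote = false; // true while inside a '...' literal
        for (int i = 0; i < select.length(); i++) {
            char ch = select.charAt(i);
            if (ch == '\'') {
                // A doubled quote ('') toggles twice, so the state stays correct.
                inQuote = !inQuote;
            } else if (!inQuote) {
                if (ch == '(') {
                    depth++;
                } else if (ch == ')') {
                    depth--;
                } else if (ch == ',' && depth == 0) {
                    // Top-level comma: the current attribute is complete.
                    attrs.add(current.toString().trim());
                    current.setLength(0);
                    continue;
                }
            }
            current.append(ch);
        }
        attrs.add(current.toString().trim());
        return attrs;
    }
}
```

Unlike the regex approach, this keeps an expression like EXP(arg1, EXP2(ARG11,ARG22), ARG2) in one piece, but it is still nowhere near a real SQL parser.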

John Gardner
I started to downvote this answer because of the worn-out "now you have two problems" quote, but unfortunately the meat of the answer is correct: regex is the wrong tool for this job. Can we please give the quote a rest now?
Alan Moore
A regex is the wrong tool for the job if it needs to handle every valid SQL statement. It's the right tool for the job if only a specific, limited set of SQL statements needs to be split. The author of the question should be more specific about what he's really trying to do.
Jan Goyvaerts
The SQL query may be simple enough to match with regexes right now, but what's to stop it from becoming more complex in the future? It's already pushing the envelope.
Alan Moore
+1  A: 

You probably would have better luck with a SQL parser.

Bill the Lizard
+1  A: 

As others have mentioned, this is actually a lexer and parser problem, which is much more complicated than a simple string split or regex. You will also find that the version of SQL and the database you are using throw all sorts of wrenches into your parser, given the myriad of variations you could end up with in your SQL. The last thing you want is for maintaining this piece of code to become your full-time job as you find additional edge cases that break it.

Ask yourself the following questions:

  1. What are you trying to accomplish by this tokenizing? What problem are you trying to solve? There might be a simple solution that doesn't require parsing the statement.

  2. Do you need all the SQL or just the target columns/projection list?

Scanningcrew