tags:

views:

1091

answers:

9

Intro note: I'm hoping a library or routine exists to do this, but I haven't been able to find anything like this. I'm really looking for direction and advice on where to start...

Here is the situation: I have a block of SQL commands coming as plain text. It might be one SQL commands or several. I need a way to split multiple SQL commands so I can run them one at a time. Microsoft SQL Management Studio does has this behavior out of box.

I'm trying to add this functionality to a PHP5/MySQL5 application running on Apache (Debian).

Some important points:

  1. I really do need to run them one at a time. Seriously.
  2. I don't want to require the user to enter a semi-colon after each SQL statement.
  3. SQL statements can be on one or multiple lines, so I can't wrap on LBs/CRs
  4. It needs to support SELECT, UPDATE, INSERT, DELETE at least.
  5. It needs to support queries that are sub-selects
  6. Neatly tabbed SQL needs to work
  7. (In the interest of usable software) I do not want to force the user to enter in any kind of delimiter.

Here is an example block of SQL I need to split into two statements:

select sMessage, 
(
    SELECT COUNT(sTag) FROM Tags WHERE ixTicket = note.ixTicket
) FROM note
select * from ticket
    WHERE (SELECT MAX(nCount) FROM Counter WHERE ixTicket = ticket.ixTicket) > 5

I tried some RegEx attempts, but that doesn't seem to be powerful enough.

Any recommendation on an approach to tackle this problem?

A: 

Your best bet is to require the user to put some type of deliminator between statements. For example: require each statement be delineated with a line containing only the word GO, or a "\", or end each statement with a ";".

This way you can easily break the single string into separate SQL statements.

Jess
Added point 7. Forgot to mention I definitely don't want delimiter.
Justin
A: 

Maybe try this library. I have used it successfully for parsing sql in the past. http://www.sqlparser.com/

Craig
I'll check this out. I need to do this in code, so I'm not sure that would work, but I'll look tomorrow.
Justin
A: 

If you don't want your users to put in a delimiting character such as ';' or any thing else, you will need to parse the input yourself and have logic to determine where statements begin.

Your logic will need to deal with the obvious query starting keywords 'SELECT', 'UPDATE', 'INSERT', 'DELETE' and work forward to the next keyword (or end of input).

Nick Josevski
I have recently just worked on a SQL Parser. Initially I thought like you did that it should be relatively straight forward but don't be fooled. Even with the help of a SQL Parser third party component I still had to write 600 lines of code to do some pretty simple parsing.
Craig
Yah, I know I can write my own routine to do this. But it's hideous every time I try and fails.
Justin
A: 

Have you tried using the keywords 'SELECT', 'UPDATE', 'INSERT' and 'DELETE' combined with counting the number of opening '(' and closing braces ')' ?

This should allow you to determine avoid nested SELECT statements and find the correct end of the statement.

Craig T
Yes, I did. The code got hideous and long, and I constantly found use cases that broke it, so I figured there must be a more flexible way.
Justin
A: 

You need to require the semicolon delimiter. Technically, without it a SQL statement is completely invalid; anyone omitting it is writing malformed SQL. Requiring the semicolon solves all of your problems, in a standardized way, and makes the software easy to write.

Perhaps do the following: if the user enters a query not containing one or more semicolons (outside of quotes, of course), add a semicolon at the end and run it as a single query. Otherwise, split the entered queries at semicolons and run each one individually, perhaps tacking on a semicolon at the end of the final query if omitted.

This solution is easy to write, SQL standard compliant, and just plain works. Not requiring the delimiter is a sure path to madness.

kquinn
"Not requiring the delimiter is a sure path to madness." I totally disagree. That's one of the best features in MS SQL Management Studio.
Justin
If LuckyLindy's comment above is to be believed, even SQL Management Studio uses the approach I describe. Not requiring delimiters will require you to write a full SQL parser, as complex as the server's itself. Don't do it. 'Saving' users the 'trouble' of semicolons will only hurt in the long run.
kquinn
@kquinn - SQL Server Management Studio does _not_ require a delimiter.
Justin
You're still crazy for demanding this. The alternative is simple and easy. Your demand for no delimiters is difficult, error-prone, and fragile.
kquinn
@kquinn. Respectfully, I disagree. SQL Server Management Studio does not require delimiters and it is not an error-prone or fragile application. The idea is to design software for how people will tend to behave, not how I want them to behave. Look, I'm not saying this is easy, and I'm certainly not going to try to write my own SQL parser in PHP, but it's still worth trying to find a way.
Justin
The fact that this question is now one year old and not yet solved should tell you something: namely, your demand is inane. Just require a semicolon or a double-newline or something, and be done with it.
kquinn
A: 

You could parse it yourself I suppose. Look for the keywords SELECT, DELETE, UPDATE, INSERT, EXEC, etc.

As you parse, if you encounter a "(" increment a counter: nest_level++

If you encounter a ")" decrement nest_level--

Then when you come across a keyword, and nest_level == 0, then you've come to the next statement.

You'll also have to handle cases like

 INSERT ...
 SELECT ....

So for an INSERT you would have to look for either SELECT or VALUES...

And no doubt other cases.

Agree with kquinn you should just require the semicolon. I don't think there's anything "uncool" about that.

MikeW
Yah, these are all the traps that I caught in trying to write my own algorithm.
Justin
A: 

I'm not sure this is possible at all. You would certainly need an in-depth knowledge of the SQL syntax of your target DBMS. For example just off the top of my head this is a single MySQL statement:

INSERT INTO things
SELECT * FROM otherthings ON DUPLICATE KEY
UPDATE thingness=thingness+1

It is likely there are constructs in some DBMSs that, without a delimiter, could be ambiguous.

I don't want to require the user to enter a semi-colon after each SQL statement.

I think you may be forced to. It's totally the standard way to delimit SQL statements. Even if you can find a heuristic to detect probably-start-of-SQL-statement points, you risk disasters like an accidental “DELETE FROM things”-without-WHERE-clause.

SQL statements can be on one or multiple lines, so I can't wrap on LBs/CRs

Would double-newline-for-new-statement be acceptable?

I tried some RegEx attempts, but that doesn't seem to be powerful enough.

No, even with semicolon delimiters, regex is nowhere near powerful enough to parse SQL. Problem points would include:

';'
";"
`;`
'\';'
''';'
-- ;
#;
/*;*/

and any interposition of these structures. Eek!

bobince
All good points, but I don't want to require a delimiter. It's very possible and safe to parse it without, just look @ SQL Management Studio. I didn't say it was going to be easy.
Justin
+1  A: 

To add a quirk to the discussion that periodically causes issues:

DECLARE c CURSOR FOR
    SELECT * FROM SomeWhere ...
        FOR UPDATE

The trailing UPDATE tends to throw ad hoc parsers off their stride. It may well be that you don't have to worry about that because the DECLARE notation (which is really Embedded SQL, not plain SQL) is not permitted in the first place. But the FOR UPDATE clause can appear in some dialects of SQL even when not in a DECLARE statement, so beware.

Jonathan Leffler
+1  A: 

maybe with the following Java Regexp? check the test...

@Test
public void testRegexp() {
    String s = //
        "SELECT 'hello;world' \n" + //
        "FROM DUAL; \n" + //
        "\n" + //
        "SELECT 'hello;world' \n" + //
        "FROM DUAL; \n" + //
        "\n";

    String regexp = "([^;]*?('.*?')?)*?;\\s*";

    assertEquals("<statement><statement>", s.replaceAll(regexp, "<statement>"));
}
mhoms