tags:

views:

71

answers:

3

If all queries end with ; I can explode by this char, but what should I do when a ; appears in a field?

e.g.

[...]Select * From my_data where idk=';';\nSelect [...]  

or

[...]Select * From my_data where idk=';\n';Select [...]

My file consists of all kinds of queries, including INSERTs and can have syntax variations like the ones shown above where a ; is followed by a new line sometimes inside a field.

How can deal with this problem?

PHP functions like explode will fail, would eregi or preg_match work?

+2  A: 

I'd suggest writing a very simple parser. The parser works like a state machine, and the state machine will operate on characters. Basically, the following state machine will eat characters until it finds a ; that isn't inside a field delimited by single quotes.

// no guarantees this is a fast or efficient one-liner
// PHP isn't the the greatest language for this sort of thing
$chars = str_split(implode("\n", file('filename.txt')));
$state = 0; // 0 = not in field, 1 = in field, 2 = in field, escaped char    
$query = "";
// loop over all characters in the file
foreach($c in $chars){
    // no matter what, append character to current query
    $query .= $c;

    // now for the state machine
    switch( $state ){
        case 0:
            if( $c == "'" ){
               $state = 1;
            }else if( $c == ";" ){
               // have a full query, do something with it
               // say, write $query to file
               // now reset $query
               $query = "";
            }
            break;
        case 1:
            if( $c == "'" ){
                // if the current character is an unescaped single quote
                // we have exited this field (so back to state 0)
                $state = 0;
            }else if( $c == "\\" ){
                // we found an backslash and so must temporarily
                // sit in a different state (avoids the sequence \')
                // and deals appropriately with \\'
                $state = 2;
            }
            break;
        case 2:
            // we can escape any char, to get here we were in a field
            // so to a field we must return
            $state = 1;
    }
}
Mark E
fopen => fread => and hmm 'foreach( char c in file ){' <C/C++ like ^^ ?> ?what is c <read start position in file ?> ) $last_c is last char yeah ?btw. ty for help
Programmer
@Programmer, now it's all PHP.
Mark E
A: 

Relibly and generally? You would have to write a parser that understands not just SQL in general, but the intricacies of MySQL's non-standard variation of SQL in particular. For example it would have to cope with:

  • single-quoted strings having \-escapes, so \' doesn't end the string (but \\' does), unless the NO_BACKSLASH_ESCAPES sql_mode option is used;
  • double-quoted strings being string literals, with similar escaping rules, unless ANSI_QUOTES is on;
  • backquoted schema names (what double quotes should do);
  • wonky comments like /* ... */ and #, and -- not starting a comment unless there is preceding whitespace

This is an enormous pain, which is why it's generally best to avoid having multiple SQL statements stuck together.

bobince
Why do you need a general parser? something simple can deal with entering and exiting fields, `;` isn't legal anywhere else in the syntax and so it delimits just fine as long as you can adequately determine when you're in a string and when you're not.
Mark E
Determining whether you're in a string literal (or a comment or schema name literal, where `;` can also live) is tricky, for the reasons listed above. Your parser example would need a bunch of additional states to cope with this, and if it were to be reliable for the general case you'd also need switches to determine the sql_mode and the charset (to avoid the infamous multibyte character set issues).
bobince
A: 

There aren't that many actual statements so what if you explode by \s*statement but you add all the statements?

example...

$queries = <<<END
select * from table where text=";";


insert into table(adsf,asdf,fff) values(null,'text;','adfasdf');
update table set this="adasdf;";

alter table add column;
select * from table where text=";";


insert into table(adsf,asdf,fff) values(null,'text;','adfasdf');
update table set this="adasdf;";

alter table add column;
END;

$statements = preg_split( '~;\s*(select.*?)|;\s*(insert.*?)|;\s*(update.*?)|;\s*(alter.*?)~i', $queries, null, PREG_SPLIT_DELIM_CAPTURE );

$good_statements[] = array_shift( $statements );

foreach( $statements as $statement ) {

    if ( $statement == '' ) continue;
    if ( !($i % 2) ) {
        $statement_action = $statement;
        echo $statement;
        $i++;
    }
    else {
        $good_statements[] = sprintf( "%s %s", trim( $statement_action ), trim( $statement ) );
        $i++;
    }

}


print_r($good_statements);
Galen
hmm space after END: and <<<END makes some problems ^^btw. This script is optimize ?
Programmer
trim() it beofre
Galen