views:

42

answers:

1

Look at the following string:

SELECT
    column1 ,
    column2, column3
FROM
    table1
WHERE
    column1 = 'text, "FROM" \'from\\\' x' AND
    column2 = "sample text 'where' \"where\\\" " AND
    ( column3 = 5 )

I need to escape unnecessary white space characters from the string like:

  • removing white space from beginning and ending position of , ( ) etc
  • removing newline (\r\n) and tabs (\t)

But one thing. The remove process could not remove white spaces from the quoted string like:

  • 'text, "FROM" \'from\\' x'
  • "sample text 'where' \"where\\" "

etc.

i need to use the PHP function: preg_replace($pattern, $replacement, $string);

So what will be the value of $pattern and $replacement where the value of $string is the given SQL.

+1  A: 

A single regex-pattern and replacement string string will not work. Your first step could be to tokenize the input string: try first to match comments and string literals, and then try to match white space chars and lastly non-space chars. A quick demo:

$text = <<<BLOCK
SELECT
    column1 ,
    column2, column3
FROM
    table1
-- a comment line ' " ...
WHERE
    column1 = 'text, "FROM" \\'from\\\\\\' x' AND
    column2 = "sample text 'where' \\"where\\\\\\" " AND
    ( column3 = 5 )
BLOCK;

echo $text . "\n\n";

preg_match_all('/
    --[^\r\n]*                # a comment line
    |                         # OR
    \'(?:\\\\.|[^\'\\\\])*\'  # a single quoted string
    |                         # OR
    "(?:\\\\.|[^"\\\\])*"     # a double quoted string
    |                         # OR
    `[^`]*`                   # a string surrounded by backticks
    |                         # OR
    \s+                       # one or more space chars
    |                         # OR
    \S+                       # one or more non-space chars
/x', $text, $matches);

print_r($matches);

produces:

SELECT
    column1 ,
    column2, column3
FROM
    table1
-- a comment line ' " ...
WHERE
    column1 = 'text, "FROM" \'from\\\' x' AND
    column2 = "sample text 'where' \"where\\\" " AND
    ( column3 = 5 )

Array
(
    [0] => Array
        (
            [0] => SELECT
            [1] => 

            [2] => column1
            [3] =>  
            [4] => ,
            [5] => 

            [6] => column2,
            [7] =>  
            [8] => column3
            [9] => 

            [10] => FROM
            [11] => 

            [12] => table1
            [13] => 

            [14] => -- a comment line ' " ...
            [15] => 

            [16] => WHERE
            [17] => 

            [18] => column1
            [19] =>  
            [20] => =
            [21] =>  
            [22] => 'text, "FROM" \'from\\\' x'
            [23] =>  
            [24] => AND
            [25] => 

            [26] => column2
            [27] =>  
            [28] => =
            [29] =>  
            [30] => "sample text 'where' \"where\\\" "
            [31] =>  
            [32] => AND
            [33] => 

            [34] => (
            [35] =>  
            [36] => column3
            [37] =>  
            [38] => =
            [39] =>  
            [40] => 5
            [41] =>  
            [42] => )
        )

)

and then you can iterate over your tokenized $matches array and replace the space-matches where you see fit.

But as you might have read in my already deleted comment, a better option would be to use some dedicated SQL parser to perform this tokenizing: I am not fluent in SQL, but I am fairly sure my demo above can be easily broken.

Bart Kiers
great! thanks a lot... :)
Saiful
@Saiful, you're welcome.
Bart Kiers
@dear bart, if i want to include ```...``` type quotation in my list what extra pattern i've to add in the above list. note that there will be no escaping(\'\") within the ```...``` quote like the other quotes. i.e. 'sdfsa\'sadfsaf', "asfasf\"sdfsf"
Saiful
@Saiful, see my edit.
Bart Kiers
thanks a lot, you are great bro :)
Saiful