Hi folks,

I'm using two regular expressions to pull assignments out of MySQL queries and using them to create an audit trail. One of them is the 'picky' one that requires quoted column names/etc., the other one does not.

Both of them are tested and parse the values out correctly. The issue I'm having is that with certain queries the 'picky' regexp is actually just causing Apache to segfault.

I tried a variety of things to confirm this was the cause, up to and including leaving the regexp in the code and only changing the conditional so that it never ran (to rule out some sort of compile-time issue). No issues. The segfault only happens when the regexp actually runs against specific queries, and I can't find any obvious pattern to tell me why.

The code in question:

if ($picky)
    preg_match_all("/[`'\"]((?:[A-Z]|[a-z]|_|[0-9])+)[`'\"] *= *'((?:[^'\\\\]|\\\\.)*)'/", $sql, $matches);
else
    preg_match_all("/[`'\"]?((?:[A-Z]|[a-z]|_|[0-9])+)[`'\"]? *= *[`'\"]?([^`'\" ,]+)[`'\"]?/", $sql, $matches);

The only difference between the two is that the first one drops the question marks after the quotes, making them mandatory, and only allows single quotes around the value instead of any kind of quote. Replacing the first regexp with the second (for testing purposes) and using the same data removes the issue, so it is definitely something to do with the regexp.

The specific SQL that is causing me grief is available at:
http://stackoverflow.pastebin.com/m75c2a2a0

Interestingly enough, when I remove the highlighted section, it all works fine. Trying to submit the highlighted section by itself causes no error.

I'm pretty perplexed as to what's going on here. Can anyone offer any suggestions as to further debugging or a fix?

EDIT: Nothing terribly exciting, but for the sake of completeness here's the relevant log entry from Apache (/var/log/apache2/error.log - There's nothing in the site's error.log. Not even a mention of the request in the access log.)

[Thu Dec 10 10:08:03 2009] [notice] child pid 20835 exit signal Segmentation fault (11)

One of these for each request containing that query.

EDIT2: On the suggestion of Kuroki Kaze, I tried gibberish of the same length and got the same segfault. Sat and tried a bunch of different lengths and found the limit. 6035 characters works fine. 6036 segfaults.
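A length-dependent crash like this is consistent with PCRE recursing once per character inside the `(?:[^'\\]|\\.)*` group of the picky pattern. The following is a minimal sketch, not tested against the original query, of an "unrolled" variant that consumes runs of ordinary characters with a possessive quantifier, so any per-iteration recursion should grow with the number of backslash escapes rather than with the length of the value. The name `$pickyUnrolled` and the sample query are made up for illustration.

```php
<?php
// Sketch only: an "unrolled" form of the picky pattern ($pickyUnrolled is a hypothetical name).
// [^'\\]*+ consumes a run of plain characters possessively; (?:\\.[^'\\]*+)*+
// then handles each backslash escape followed by another plain run.
$pickyUnrolled = "/[`'\"]([A-Za-z0-9_]+)[`'\"] *= *'([^'\\\\]*+(?:\\\\.[^'\\\\]*+)*+)'/";

// Made-up query with one escaped quote and one long value.
$sql = "UPDATE `pages` SET `title`='It\\'s here', `page_content`='"
     . str_repeat('X', 10000) . "' WHERE `id`='7'";

preg_match_all($pickyUnrolled, $sql, $matches);

// $matches[1] holds the column names, $matches[2] the raw (still escaped) values.
foreach ($matches[1] as $i => $name) {
    echo $name, ' => ', strlen($matches[2][$i]), " chars\n";
}
```

Whether this avoids the segfault on a given PHP/PCRE build is an assumption that would need checking on the affected server.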

EDIT3: Changing the values of pcre.backtrack_limit and pcre.recursion_limit in php.ini mitigated the problem somewhat. Apache no longer segfaults, but my regexp no longer matches all of the matches in the string. Apparently this is a long-known (from 2007) bug in PHP/PCRE:
http://bugs.php.net/bug.php?id=40909
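Once the limits are lowered, the regexp can stop early without any visible error, which is how matches go missing. A minimal sketch of how that could be detected with `preg_last_error()` follows. The helper name `checked_match_all()` and the sample pattern and query are hypothetical, chosen only for illustration.

```php
<?php
// Sketch only: checked_match_all() is a hypothetical helper, not part of PHP.
function checked_match_all($pattern, $subject)
{
    $count = preg_match_all($pattern, $subject, $matches);

    // preg_last_error() reports why the most recent PCRE call stopped early.
    $error = preg_last_error();
    if ($count === false || $error !== PREG_NO_ERROR) {
        $names = array(
            PREG_BACKTRACK_LIMIT_ERROR => 'pcre.backtrack_limit reached',
            PREG_RECURSION_LIMIT_ERROR => 'pcre.recursion_limit reached',
        );
        $reason = isset($names[$error]) ? $names[$error] : 'PCRE error code ' . $error;
        throw new RuntimeException('Audit regexp failed: ' . $reason);
    }

    return $matches;
}

// Simplified pattern and query, just to show the call.
$matches = checked_match_all("/`([a-z_]+)` *= *'([^']*)'/", "UPDATE t SET `a`='1', `b`='2'");
```

This does not make the regexp match long fields again; it only lets the calling code notice the failure and fall back to something else.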

EDIT4: I posted the code in the answers below that I used to replace this specific regular expression as the workarounds weren't acceptable for my purpose (product for sale, can't guarantee php.ini changes and the regexp only partially working removed functionality we require). Code I posted is released into the public domain with no warranty or support of any kind. I hope it can help someone else. :)

Thank you everyone for the help!

Adam

+4  A: 

Interestingly enough, when I remove the highlighted section, it all works fine. Trying to submit the highlighted section by itself causes no error.

What about the size of the submission? What happens if you pass gibberish of equal length?

EDIT: splitting and merging will look something like this:

$strings = explode("\n", $sql);

$matches = array(array(), array(), array());

foreach ($strings AS $string) {
 preg_match_all("/[`'\"]?((?:[A-Z]|[a-z]|_|[0-9])+)[`'\"]? *= *[`'\"]?([^`'\" ,]+)[`'\"]?/", $string, $matches_temp);
 $matches[0] = array_merge($matches[0], $matches_temp[0]);
 $matches[1] = array_merge($matches[1], $matches_temp[1]);
 $matches[2] = array_merge($matches[2], $matches_temp[2]);
}
Kuroki Kaze
Also, aren't the `preg` functions deprecated now?
Kuroki Kaze
No, that's ereg.
ryeguy
Good call. A string of 'X's of the same length causes the same error. I played around with it and found that a query length of exactly 6035 characters works fine. 6036 segfaults.
NuclearDog
Okay. Now you can split it on some token that certainly isn't part of a match, get the matches from the split strings and merge them.
Kuroki Kaze
There aren't really any line breaks in the SQL, and it seems to me that this solution would still run into a problem with individual fields exceeding the length (which was the problem here) or containing new lines. Thank you, though :)
NuclearDog
+4  A: 

I have been hit with a similar preg_match-related issue, with the same Apache segfault. The only difference is that the preg_match that causes it is built into the CMS I'm using (WordPress).

The "workaround" that was offered was to change these settings in php.ini:

[Pcre]
;PCRE library backtracking limit.
;pcre.backtrack_limit=100000
pcre.recursion_limit=200000000
pcre.backtrack_limit=100000000
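Both directives are changeable at runtime (they are PHP_INI_ALL settings), so where php.ini is out of reach they can in principle be set per request with `ini_set()`. A minimal sketch follows; the numbers are placeholders rather than recommended values, and some hosts may disable `ini_set()` altogether.

```php
<?php
// Sketch only: the numbers below are placeholders, not recommended values.
$previousRecursion = ini_set('pcre.recursion_limit', '10000');
$previousBacktrack = ini_set('pcre.backtrack_limit', '1000000');

// ... run the preg_* calls here ...

// Restore whatever the server was configured with.
if ($previousRecursion !== false) {
    ini_set('pcre.recursion_limit', $previousRecursion);
}
if ($previousBacktrack !== false) {
    ini_set('pcre.backtrack_limit', $previousBacktrack);
}
```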

The trade-off is that when rendering larger pages (in my case, more than 200 rows, where one of the columns is a text description limited to 1500 characters), you'll get pretty high CPU utilization. I'm also still seeing the segfaults, just not as frequently.

My site's close to end-of-life, so I don't really have much need (or budget) to look for a real solution. But maybe this can mitigate the issue you're seeing.

NDP
Upping those values didn't mitigate the problem, but dropping them did. Unfortunately, the regexp no longer matches against the long field (page_content). Stopping the segfault is certainly a good temporary workaround for me though, thank you :) Further searching turned up that this seems to be a long-known bug in PHP/PCRE: http://bugs.php.net/bug.php?id=40909
NuclearDog
+1  A: 

Given that this only needs to match against the queries when saving pages or performing other infrequently executed operations, I felt the performance hit of the following code was acceptable. It walks the SQL query ($sql) one character at a time and places name=>value pairs into $data. It seems to be working well and handles large queries fine.

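$data = array();   // name => value pairs collected from the query
$quoted = '';      // quote character currently open, or '' when outside quotes
$escaped = false;  // true when the previous character was a backslash

$key = '';
$value = '';
$target = 'key';   // name of the variable currently being filled ('key' or 'value')

$length = strlen($sql);
for ($i = 0; $i < $length; $i++)
{
    $char = $sql[$i];

    if ($escaped)
    {
        // Character after a backslash: keep it literally.
        $$target .= $char;
        $escaped = false;
    }
    else if ($quoted != '')
    {
        // Inside a quoted name or value.
        if ($char == '\\')
            $escaped = true;
        else if ($char == $quoted)
            $quoted = '';
        else
            $$target .= $char;
    }
    else
    {
        // Outside quotes: only quotes, '=' and ',' matter; everything else is skipped.
        if ($char == '\'' || $char == '`')
        {
            $quoted = $char;
            $$target = '';
        }
        else if ($char == '=')
            $target = 'value';
        else if ($char == ',')
        {
            $target = 'key';
            $data[$key] = $value;
            $key = '';
            $value = '';
        }
    }
}

// Store the final pair, which has no trailing comma (skipped if its value is empty).
if ($value != '')
    $data[$key] = $value;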

Thank you everyone for the help and direction!

NuclearDog