I know this is a common question, but everything I found seems to remove whitespace.

I'm looking for a regular expression that will strip unprintable characters WITHOUT changing any whitespace. This is a function that all user input will be filtered through, which means all the characters you could normally type on a keyboard are valid. Ex: the accented characters used in Spanish are valid. Basically, anything you could display using the UTF-8 charset.

Because this is SQL Server, I don't think the "SET NAMES UTF8" approach will work.

Here's what I have.

function stripNonPrintable($input) 
{
    return preg_replace('/[\x00\x08\x0B\x0C\x0E-\x1F]/', '', $input);
}
A: 

You could always escape the whitespace first:

 function stripNonPrintable($input) 
 {
  // Swap whitespace for placeholders so the strip step cannot touch it.
  $input = preg_replace('/ /','%%%%SPACE%%%%%%', $input);
  $input = preg_replace('/\t/','%%%%TAB%%%%%%', $input);
  $input = preg_replace('/\n/','%%%%NEWLINE%%%%%%', $input);

  // Strip the unprintable control characters.
  $input = preg_replace('/[\x00\x08\x0B\x0C\x0E-\x1F]/', '', $input);

  // Put the original whitespace back.
  $input = str_replace('%%%%SPACE%%%%%%',    ' ', $input);
  $input = str_replace('%%%%TAB%%%%%%',     "\t", $input);
  $input = str_replace('%%%%NEWLINE%%%%%%', "\n", $input);

  return $input;
 }

Not elegant, but it works.
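
For what it's worth, the character class above contains neither space (\x20), tab (\x09), nor newline (\x0A), so the placeholder round-trip may not be strictly needed. Below is a minimal sketch of a single-pass version. The function name stripControlChars is made up for illustration, and it assumes you also want \x01-\x07 and \x7F (DEL) removed, which the original pattern leaves in place, and that tab, LF and CR are the only control characters worth keeping.

```php
<?php
// Hypothetical helper name; assumes only tab, LF and CR should survive.
function stripControlChars(string $input): string
{
    // \x09 (tab), \x0A (LF), \x0D (CR) and \x20 (space) are outside the class,
    // so they pass through untouched. Bytes >= 0x80 are outside it as well, so
    // multibyte UTF-8 characters such as "é" are left alone.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $input);
}

var_dump(stripControlChars("a\x00b\tc d\x07\né") === "ab\tc d\né"); // bool(true)
```

Because the pattern only lists single bytes below 0x80, it does not need the /u modifier and will not choke on input that is not valid UTF-8.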

Byron Whitlock
+1  A: 

Try something like this:

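function stripNonPrintable($input) {
   // str_replace() matches literal strings, not character classes or ranges,
   // and '\x00' in single quotes is not an escape, so each control character
   // is listed as its own double-quoted entry.
   $bad=array_merge(
      array("\x00", "\x08", "\x0B", "\x0C"),
      array_map('chr', range(0x0E, 0x1F))
   );
   return str_replace($bad, '', $input);
}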
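
If the input is known to be UTF-8, a Unicode-aware pattern is another option. The sketch below is only an illustration under that assumption: the name stripControlCharsUtf8 is hypothetical, and it treats everything in the Unicode control category (\p{Cc}) except tab, LF and CR as unprintable, which also covers the C1 range U+0080-U+009F.

```php
<?php
// Hypothetical helper name; assumes $input is valid UTF-8.
function stripControlCharsUtf8(string $input): string
{
    // \p{Cc} is the Unicode "control" category (U+0000-U+001F, U+007F-U+009F).
    // The negative lookahead exempts tab, LF and CR from removal.
    $clean = preg_replace('/(?![\t\n\r])\p{Cc}/u', '', $input);

    // With the /u modifier, preg_replace() returns null on malformed UTF-8.
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}
```

Throwing on malformed input is a design choice for the sketch; returning the input unchanged or an empty string would work too, depending on how strict the filter should be.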
Phill Pafford
Simple, but effective. Thanks.
iddqd