views:

1646

answers:

4

Hi all,

This seems trivial but is giving me a hard time:

Does anyone have a hint for me on how to remove control characters like STX from a PHP string? I played around with

preg_replace("/[^a-zA-Z0-9 .\-_;!:?äÄöÖüÜß<>='\"]/","",$pString)

but found that it removed way too much. Is there a way to remove only control chars?

tia

K

+2  A: 

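PHP does support POSIX character classes, so you can use [[:cntrl:]] instead of some fancy character-magic-stuff (note the double brackets: the class name has to sit inside a bracket expression, otherwise the pattern just matches the literal characters ":", "c", "n", "t", "r" and "l"):

ereg_replace("[[:cntrl:]]", "", $pString);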
Bobby
PHP does support POSIX, using the ereg functions instead of preg: http://nl2.php.net/manual/en/book.regex.php
Duroth
Tested this one, POSIX classes seem not to work. Thx for the hint anyway!
KB22
I have to correct myself, to be more precise: ereg does indeed work.
KB22
@Duroth, thanks for testing and the info!
Bobby
+5  A: 

If by control characters you mean the first 32 ASCII characters and \x7F (that includes the carriage return, etc!), then this will work:

preg_replace('/[\x00-\x1F\x7F]/', '', $input);

(Note the single quotes: with double quotes PHP itself expands \x00 into a literal NUL byte before the pattern reaches the regex engine, which then rejects the pattern.)

The line feed and carriage return (often written \n and \r) may be saved from removal like so:

preg_replace('/[\x00-\x09\x0B\x0C\x0E-\x1F\x7F]/', '', $input);
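
As a quick check of the two patterns above, here is a minimal sketch. The sample string and the variable names are made up for illustration:

```php
<?php
// Hypothetical sample: STX, a tab, CR+LF and DEL mixed into normal text.
$input = "\x02Hello\tWorld\r\n\x7F";

// Strip every ASCII control character (0x00-0x1F and 0x7F).
$all = preg_replace('/[\x00-\x1F\x7F]/', '', $input);

// Same idea, but keep line feed (0x0A) and carriage return (0x0D).
$keepNewlines = preg_replace('/[\x00-\x09\x0B\x0C\x0E-\x1F\x7F]/', '', $input);

var_dump($all === "HelloWorld");              // bool(true)
var_dump($keepNewlines === "HelloWorld\r\n"); // bool(true)
```

Note that the tab (0x09) is still removed by the second pattern; leave \x09 out of the first range if tabs should survive as well.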

I must say that I think Bobby's answer is better, in the sense that [[:cntrl:]] better conveys what the code does than [\x00-\x1F\x7F]. So, using ereg_replace instead of preg_replace (the POSIX class must be wrapped in a bracket expression, hence the double brackets):

ereg_replace('[[:cntrl:]]', '', $input);
Stephan202
THX Stephan, this one worked out for me.
KB22
Thanks KB22. Note that my regex was incorrect when you accepted my answer. Please see the updated version.
Stephan202
Sadly, ereg_replace is deprecated in PHP 5.3 and the mb version is slower than preg_replace. There is a slightly cleaner way to do this with preg_replace, and in my testing it is very slightly faster (1% faster when dealing with hundreds of thousands of items) than the one above: preg_replace('/[\p{Cc}]/', '', $input);
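
A small sketch of the \p{Cc} approach on UTF-8 input. The sample string is invented, and the /u modifier is an addition that the comment above does not use: it tells PCRE to treat pattern and subject as UTF-8, so the property is tested per code point rather than per byte. This assumes PHP 7 or later for the "\u{...}" escapes.

```php
<?php
// Hypothetical UTF-8 sample: STX, the word "Größe", the C1 control
// U+0085 (NEL) and DEL.
$input = "\x02Gr\u{00F6}\u{00DF}e\u{0085}\x7F";

// \p{Cc} is the Unicode "control" category (C0 and C1 controls plus DEL).
// With /u, multi-byte characters such as "ö" and "ß" are matched as whole
// code points and left intact.
$clean = preg_replace('/\p{Cc}/u', '', $input);

var_dump($clean === "Gr\u{00F6}\u{00DF}e"); // bool(true)
```

With /u, preg_replace returns null when the subject is not valid UTF-8, so the result is worth checking before use.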
Jay Paroline
+1  A: 

Regex-free method

If you are only zapping the control characters I'm familiar with (those under 32 and 127), try this out:

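 // Remove the control characters with codes 0 through 31.
 for ($control = 0; $control < 32; $control++) {
     $pString = str_replace(chr($control), "", $pString);
 }

 // DEL (127) sits outside that range, so remove it separately.
 $pString = str_replace(chr(127), "", $pString);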

The loop gets rid of all but DEL, which we just add to the end.

I'm thinking this will be a lot less stressful on you and the script than dealing with regex and the regex library.
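
The same idea can probably be condensed into a single call, since str_replace also accepts an array of search strings. A minimal sketch, with a made-up sample string and a hypothetical $controlChars variable:

```php
<?php
// Build the list of characters to drop: codes 0-31 plus DEL (127).
$controlChars = array_map('chr', array_merge(range(0, 31), [127]));

// Hypothetical sample string containing STX, US (0x1F) and DEL.
$pString = "\x02abc\x1Fdef\x7F";

// One call replaces every entry of the array with the empty string.
$pString = str_replace($controlChars, '', $pString);

var_dump($pString === "abcdef"); // bool(true)
```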

Anthony
How is this less "stressful" than ereg_replace("[:cntrl:]", "", $pString); ? Using ereg, the PHP interpreter will probably compile more efficient intermediate code than it would using that for loop anyway.
ithcy
A: 

It's less stressful because ereg and ereg_replace are now deprecated:

http://php.net/manual/en/function.ereg-replace.php
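
For anyone who wants the readable POSIX class without ereg (which was removed entirely in PHP 7.0), the preg functions accept POSIX classes inside a bracket expression too. A minimal sketch with an invented sample string:

```php
<?php
// Hypothetical sample: STX and DEL around ordinary text.
$pString = "\x02Hello\x7F";

// [[:cntrl:]] is the POSIX control-character class; the outer brackets
// are the bracket expression, the inner [:cntrl:] names the class.
$result = preg_replace('/[[:cntrl:]]/', '', $pString);

var_dump($result === "Hello"); // bool(true)
```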

HdotNET