I've got text from which I want to remove all characters that ARE NOT the following.

desired_characters =

0123456789!&',-./abcdefghijklmnopqrstuvwxyz\n

The last is a \n (newline) that I do want to keep.

A: 
$string = 'This is anexample $tring! :)';
$string = preg_replace('/[^0-9!&\',\-.\/a-z\n]/', '', $string);

echo $string; // hisisanexampletring!

The pattern above is case-sensitive, so the capital T is removed from the string. To allow capital letters as well, add A-Z to the character class: $string = preg_replace('/[^0-9!&\',\-.\/A-Za-z\n]/', '', $string);
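As a rough side-by-side of the two patterns in this answer, here is a minimal sketch. The input string is made up for illustration; it contains a newline to show that \n survives in both cases.

```php
<?php
// Hypothetical input with capitals, punctuation, spaces, and a line break.
$input = "Hello, World!\nIt's 5 o'clock :)";

// Lowercase-only whitelist: capitals, spaces, ':' and ')' are stripped.
$lower = preg_replace('/[^0-9!&\',\-.\/a-z\n]/', '', $input);

// Same whitelist plus A-Z: capitals are kept as well.
$mixed = preg_replace('/[^0-9!&\',\-.\/A-Za-z\n]/', '', $input);

var_dump($lower); // "ello,orld!" + newline + "t's5o'clock"
var_dump($mixed); // "Hello,World!" + newline + "It's5o'clock"
```

The newline is kept in both results because \n is part of the negated class.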

chigley
The dash in a character class needs to be the first or last character, or be escaped. Otherwise it will indicate a range.
kemp
@kemp - don't know why I missed that! Have edited, thanks for pointing it out :)
chigley
@kemp: That’s not quite right. There are several other cases in which escaping the range indicator is not necessary as it’s not part of a character range: `[-]` (only `-`), `[a-b-c]` (`a`–`b`, `-`, `c`), or `[\d-.]` (`\d`, `-`, `.`).
Gumbo
Ah yes Gumbo, you're right
kemp
A: 

To match all characters except the listed ones, use an inverted character set [^…]:

$chars = "0123456789!&',-./abcdefghijklmnopqrstuvwxyz\n";
$pattern = "/[^".preg_quote($chars, "/")."]/";

Here preg_quote is used to escape certain special characters so that they are interpreted as literal characters.

You could also use character ranges to express the listed characters:

$pattern = "/[^0-9!&',-.\\/a-z\n]/";

In this case it doesn’t matter whether the literal - in ,-. is escaped or not, because ,-. is interpreted as a character range from , (0x2C) to . (0x2E), which already contains the - (0x2D) in between.
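To illustrate the point about ,-. being a harmless range, here is a small sketch. The sample strings and variable names are made up for the demonstration. The last pattern shows a contrasting case where an unescaped hyphen between two other characters would quietly widen the class.

```php
<?php
// Both classes accept ',', '-' and '.': the first as three literals,
// the second as the range 0x2C..0x2E.
$escaped = "/[^,\\-.]/";
$range   = "/[^,-.]/";

$sample = "a,b-c.d";

$viaEscaped = preg_replace($escaped, "", $sample);
$viaRange   = preg_replace($range, "", $sample);

var_dump($viaEscaped === $viaRange); // bool(true)
var_dump($viaRange);                 // string(3) ",-."

// Contrast: "!-." is the range 0x21..0x2E, which also covers '#' and '$'.
$wide = preg_replace("/[^!-.]/", "", 'a$b#c');
var_dump($wide);                     // string(2) "$#"
```

So leaving the hyphen unescaped is only safe here because the neighbouring characters happen to be adjacent code points.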

Then you can remove those characters that are matched with preg_replace:

$output = preg_replace($pattern, "", $str);
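Putting the preg_quote variant together, a minimal end-to-end sketch might look like this. The helper name keep_only and the sample text are assumptions for illustration, not part of the original answer, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: strip every character that is not in $allowed.
function keep_only(string $allowed, string $text): string
{
    // preg_quote escapes regex metacharacters and the "/" delimiter,
    // so each whitelisted character is treated literally inside [^...].
    $pattern = "/[^" . preg_quote($allowed, "/") . "]/";
    return preg_replace($pattern, "", $text);
}

$chars = "0123456789!&',-./abcdefghijklmnopqrstuvwxyz\n";

// Made-up multi-line input; the "\n" in double quotes is a real newline.
$str = "Line One: a/b & c-d!\nline two (2.5)";

$output = keep_only($chars, $str);
var_dump($output); // "inenea/b&c-d!" + newline + "linetwo2.5"
```

Building the pattern from the character list means the whitelist only has to be maintained in one place.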
Gumbo
works perfectly - thanks for the explanations as well
matt_tm
@matt_tm: You’re welcome!
Gumbo