Hello. I would like to replace substrings in $text that start with "[id", have an unknown middle part, and end with "]". I know how to replace strings that start with "[id" and end with "]", but I can't figure out how to include the unknown middle part in the replacement rule.

Any ideas on how to do a replacement like this?

Thanks.

+3  A: 

The following will remove all occurrences of [idsomething], where something matches one or more characters other than ].

$newText = preg_replace('#\[id[^\]]+\]#', '', $subject);
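To see what this pattern removes, you can run it on a sample string. This is a minimal sketch; the sample text is made up for illustration. Because of the +, the middle part must contain at least one character, so a bare [id] is left untouched.

```php
<?php
// Hypothetical sample input containing several [id...] tags.
$subject = 'a [id12] b [id=foo] c [id] d';

// [^\]]+ means one or more characters that are not ], so "[id]" (empty middle) does not match.
$newText = preg_replace('#\[id[^\]]+\]#', '', $subject);

echo $newText, "\n"; // a  b  c [id] d
```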

If you know that something always consists of digits, you could use something like this:

$newText = preg_replace('#\[id\d+\]#', '', $subject);
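A quick comparison on a made-up input shows the effect of \d+: only ids made entirely of digits are removed, and anything else between [id and ] is kept.

```php
<?php
// Hypothetical sample: one numeric id and one non-numeric id.
$subject = 'x [id42] y [idabc] z';

// \d+ means one or more digits, so only [id42] is removed; [idabc] survives.
$newText = preg_replace('#\[id\d+\]#', '', $subject);

echo $newText, "\n"; // x  y [idabc] z
```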

For more information about regular expressions, see this website: http://www.regular-expressions.info/.

Lekensteyn
+3  A: 

The plain string replacement functions (such as str_replace()) only work with fixed strings. If you have a pattern you want to match, you should use preg_replace(), which replaces based on regular expressions:

$text = preg_replace('/\[id[^\]]*\]/', $replacement, $text);
// $replacement is whatever string you want to replace with
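A small sketch contrasting the two approaches, with a made-up $text and '[removed]' as the example $replacement. str_replace() only finds the exact literal it is given, while preg_replace() matches the pattern. It also shows that the delimiter is just the character wrapping the pattern, so the /, # and @ versions used in the answers here behave the same.

```php
<?php
// Hypothetical sample input.
$text = 'Before [id=42] middle [id=abc] after';

// str_replace() looks for the exact text "[id]", which does not occur here, so nothing changes.
$literal = str_replace('[id]', 'X', $text);

// preg_replace() matches the pattern; only the delimiter differs between these three calls.
$replacement = '[removed]';
$slash = preg_replace('/\[id[^\]]*\]/', $replacement, $text);
$hash  = preg_replace('#\[id[^\]]*\]#', $replacement, $text);
$at    = preg_replace('@\[id[^\]]*\]@', $replacement, $text);

echo $slash, "\n"; // Before [removed] middle [removed] after
```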

/\[id[^\]]*\]/ is a regular expression (aka regex). The slashes on each end are delimiters which PHP requires to delineate a regex. The rest of the pattern can be described as follows:

\[     # match a literal [ character
id     # match the string "id"
[^     # open a negated character class
  \]   # match anything other than literal ] character (since it's in a negated class)
]*     # close the class, repeat it zero or more times
\]     # match a literal ]
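If you want to keep a breakdown like this inside your code, PCRE's x (extended) modifier lets the pattern carry it: unescaped whitespace outside character classes is ignored and # starts a comment. A sketch of the same pattern written that way; note that the character class has to stay on one line, because whitespace inside a class is still matched literally, and the comments must not contain the / delimiter, since PHP would treat it as the end of the pattern.

```php
<?php
// Same pattern as above, written with the x modifier so it can carry comments.
$pattern = '/
    \[        # a literal [
    id        # the letters i and d
    [^\]]*    # zero or more characters other than ]
    \]        # a literal ]
/x';

$text = 'keep [id=1] keep [id] keep';
$result = preg_replace($pattern, '', $text);

echo $result, "\n"; // keep  keep  keep
```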

Concepts:

  • Character classes - a character class is a way of describing that a character can be one of a series of possibilities. Character classes start with a [ and end with a ]. For example, [abc] matches a or b or c. Character classes can be negated if the first character within a class is ^: [^abc] matches any character that isn't a or b or c. In our pattern, [^\]] matches any character that isn't ]. Note that the ] within the class has to be escaped because ] generally means the end of the class but we want to specify a literal ] character.

  • Repetition using * - Parts of patterns can be repeated (which allows a pattern to specify that something can appear multiple times). There are three repetition operators: ? specifies that something may appear zero or one times (i.e. it makes part of your pattern optional); * specifies that something may appear zero or more times (i.e. it can be omitted, but it could also appear any number of times); + specifies that something must appear at least once.
    In our case, [^\]]* specifies that a character that is not ] can be matched zero or more times. This will match an empty string, or it will match abcdefg, because the negated character class matches 7 times (once for each character, none of which is ]).
    Note that by default, regexes are greedy, which means that they will match as much of the string as possible; for this reason [^\]]* when matched against abcdefg will match the entire string, as that is the largest match it can make (even though smaller substrings match the pattern).

  • Everything else in this pattern matches literally. As we saw above, [ and ] need to be escaped to match the literal characters, because they have a special meaning within a regex (i.e. to define a character class), but id simply matches an i followed immediately by a d.
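The reason for using the negated class rather than .* becomes visible once a string contains more than one tag. A small comparison on a made-up input: greedy .* runs on to the last ] in the string, lazy .*? stops at the first ], and [^\]]* cannot cross a ] at all.

```php
<?php
$text = '[id=1] keep [id=2]';

// Greedy .* grabs as much as possible, so the match runs from the first [id to the LAST ].
$greedy  = preg_replace('/\[id.*\]/', 'X', $text);
// Lazy .*? expands one character at a time and stops at the first ].
$lazy    = preg_replace('/\[id.*?\]/', 'X', $text);
// The negated class can never match a ], so each tag is matched separately.
$negated = preg_replace('/\[id[^\]]*\]/', 'X', $text);

echo $greedy, "\n";  // X
echo $lazy, "\n";    // X keep X
echo $negated, "\n"; // X keep X
```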

When you put that all together, you end up with a pattern that matches an opening bracket, followed by the letters id, followed by zero or more characters other than ], and then a closing bracket.
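Before replacing anything, it can help to list what the pattern actually matches. A sketch using preg_match_all() on a made-up string; note that [id] with an empty middle matches here because of the *, and spaces in the middle are fine too.

```php
<?php
$text = 'a [id] b [id=7] c [id foo bar] d [name=x]';

// preg_match_all() returns the number of matches and fills $matches[0] with every full match.
$count = preg_match_all('/\[id[^\]]*\]/', $text, $matches);

print_r($matches[0]); // [id], [id=7], [id foo bar]
```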

Note that if you want to make this pattern case-insensitive, you can add an i after the final slash: /\[id[^\]]*\]/i. The i modifier makes the entire pattern case-insensitive (so it would also match [ID=...]).
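A short sketch of the difference the i modifier makes, using a made-up input with mixed-case tags:

```php
<?php
$text = '[ID=1] [Id=2] [id=3] rest';

// Without i, only the lowercase [id=3] is removed.
$caseSensitive = preg_replace('/\[id[^\]]*\]/', '', $text);
// With i, all three tags match, whatever their case.
$caseInsensitive = preg_replace('/\[id[^\]]*\]/i', '', $text);

echo $caseSensitive, "\n";   // [ID=1] [Id=2]  rest
echo $caseInsensitive, "\n"; // "   rest" (three spaces left behind)
```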

I recommend reading through the tutorial on regular-expressions.info if you are not familiar with regexes, as it will give you a very good understanding of what they do and how to compose them.

Daniel Vandersluis
+1  A: 

Using preg_replace():

<?php

    $text = "[hi=hello] [id=hellomynameisjoe] [hello=hi]";
    $new = preg_replace('@\[id[^\]]+\]@', '[replaced!]', $text);
    echo $new;
?>
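If the unknown middle part should be kept rather than discarded, wrapping it in parentheses captures it, and $1 in the replacement string refers to that capture. A sketch based on the same sample string; the "[replaced: ...]" output format is only an example.

```php
<?php
$text = "[hi=hello] [id=hellomynameisjoe] [hello=hi]";

// ( ... ) captures the middle part; $1 in the replacement inserts it again.
// Single quotes keep PHP from treating $1 as a variable.
$new = preg_replace('@\[id([^\]]+)\]@', '[replaced: $1]', $text);

echo $new, "\n"; // [hi=hello] [replaced: =hellomynameisjoe] [hello=hi]
```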
Ruel
I like this syntax. Very readable.
Atømix
Note that lazy patterns are less efficient as they cause a lot of backtracking. See http://blog.stevenlevithan.com/archives/greedy-lazy-performance
Daniel Vandersluis
I wonder why I'm so used to that. Thanks for the info, editing my code.
Ruel
Edited, thanks.
Ruel