I want to replace word groups with links.

The word groups are defined in a multi-dimensional array. There will be thousands of terms to be replaced, so an unindexed, light-weight and multi-dimensional array is needed.

Nothing should be replaced when the term is followed by brackets, i.e. parentheses, or is inside square brackets.

Problem: The regex itself works fine, but the replacement breaks when the word groups include regex syntax characters like + ? / ( etc. So I need to escape them. I have tried every variation I can think of (see the commented-out lines below), but none works for all cases. I can't escape them in $text or $s.

<?php

$text = "<html><body><pre>
Replace all foo / bar / baz cases here:
Case 1: Text Foo text.
Case 2: Text 'Foo' Bar text Foo.
Case 3: Text Foobar (2) text.
Case 4: Text Bar & Baz.
Case 5: Text Bar Baz?
Case 6: Text Bar? & Baz?
Case 7: Text Bar-X.

Replace nothing here (text followed by brackets) or [inside square brackets]: 
Case 1: Text Foo (text).
Case 2: Text 'Foo' Bar (text) Foo (text).
Case 3: Text Foobar (2) (text).
Case 4: Text Bar & Baz (text).
Case 5: Text Bar Baz (text).
Case 6: Text Bar? & Baz (text).
Case 7: Text Bar-X (text).
Case 8: [Text Foo]
</pre></body></html>";

$s = array(
  array("t" => "Foo",         "u" => "http://www.foo.net"),
  array("t" => "'Foo' Bar",   "u" => "http://www.foo.net"),
  array("t" => "Foobar (2)",  "u" => "http://www.foo.net"),
  array("t" => "Bar & Baz",   "u" => "http://www.foo.net"),
  array("t" => "Bar Baz?",    "u" => "http://www.foo.net"),
  array("t" => "Bar? & Baz?", "u" => "http://www.foo.net"),
  array("t" => "Bar-X",       "u" => "http://www.foo.net")
 );

$replaced = $text;
foreach ($s as $i => $row) {
# $replaced = preg_replace('/(?='.preg_quote($row["t"]).'[^\]][^(]+$)\b'.preg_quote($row["t"]).'\b/mS',
# $replaced = preg_replace('/(?='.preg_quote($row["t"], '/').'[^\]][^(]+$)\b'.preg_quote($row["t"], '/').'\b/mS',
# $replaced = preg_replace('/(?=\Q'.$row["t"].'\E[^\]][^(]+$)\b\Q'.$row["t"].'\E\b/mS',
    $replaced = preg_replace('/(?='.$row["t"].'[^\]][^(])\b'.$row["t"].'\b/mS',
                           '<a href="'.$row["u"].'">'.$row["t"].'</a>',
                           $replaced);
 }
echo $replaced;

?>
A: 

I'm not entirely sure what you are trying to do, but I saw "breaks when the word groups include regex syntax characters", which makes me think that all you need to do is escape these characters, i.e. put a \ before them.
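As a rough sketch of what that escaping looks like in practice: PHP's preg_quote() adds the backslashes for you, and its optional second argument also escapes the pattern delimiter. The terms below come from the question's array, except "a/b", which is a made-up example to show the delimiter case.

```php
<?php
// Sample terms; "a/b" is hypothetical and only illustrates the delimiter.
$terms = array("Foobar (2)", "Bar Baz?", "a/b");

$quoted = array();
foreach ($terms as $term) {
    // Second argument: also escape "/" because it is the pattern delimiter.
    $quoted[$term] = preg_quote($term, '/');
}

// The quoted term matches itself literally.
$matchesLiteral = preg_match('/' . $quoted["Bar Baz?"] . '/', "Text Bar Baz?") === 1;

// Unquoted, "?" makes the preceding "z" optional, so "Bar Ba" matches too.
$unquotedTooLoose = preg_match('/^Bar Baz?$/', "Bar Ba") === 1;
```

Escaping alone only makes the term literal; it does not change how the rest of the pattern (such as \b) behaves around it.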

EDIT:

I'm getting pretty stuck with this as well, but if I show you what I've got, maybe it could help you out:

<?php

$text = "<html><body><pre>
Replace all foo / bar / baz cases here:
Case 1: Text Foo text.
Case 2: Text 'Foo' Bar text Foo.
Case 3: Text Foobar (2) text.
Case 4: Text Bar & Baz.
Case 5: Text Bar Baz?
Case 6: Text Bar? & Baz?
Case 7: Text Bar-X.

Replace nothing here (text followed by brackets) or [inside square brackets]: 
Case 1: Text Foo (text).
Case 2: Text 'Foo' Bar (text) Foo (text).
Case 3: Text Foobar (2) (text).
Case 4: Text Bar & Baz (text).
Case 5: Text Bar Baz (text).
Case 6: Text Bar? & Baz (text).
Case 7: Text Bar-X (text).
Case 8: [Text Foo]
</pre></body></html>";

function convertRegexChars($string)
{
    $converted = str_replace("?","&#63;",$string);
    $converted = str_replace(".","&#46;",$converted);
    $converted = str_replace("*","&#42;",$converted);
    $converted = str_replace("+","&#43;",$converted);
    return $converted;
}

$s = array(
  array("t" => "Foo",         "u" => "http://www.foo.net"),
  array("t" => "'Foo' Bar",   "u" => "http://www.foo.net"),
  array("t" => "Foobar (2)",  "u" => "http://www.foo.net"),
  array("t" => "Bar & Baz",   "u" => "http://www.foo.net"),
  array("t" => "Bar Baz?",    "u" => "http://www.foo.net"),
  array("t" => "Bar? & Baz?", "u" => "http://www.foo.net"),
  array("t" => "Bar-X",       "u" => "http://www.foo.net")
 );

$replaced = convertRegexChars($text);
foreach ($s as $i => $row) {
    $txt = convertRegexChars($row['t']);
    $replaced = preg_replace('/(?='.$txt.'[^\]][^(])\b'.$txt.'\b/mS',
                           '<a href="'.$row["u"].'">'.$txt.'</a>',
                           $replaced);
 }
echo $replaced;

?>
Chief17
Thanks for the fast reply. I thought that's what preg_quote() and \Q \E are there for. I can't replace them in $text and don't want to in $s. So far I have tried '.preg_quote($row["t"]).', '.preg_quote($row["t"], '/').' and \Q'.$row["t"].'\E, but the result is never as desired.
Martin
Chief17
Martin
Check my **Edit**
Chief17
Unfortunately that's not very useful for me. I need the regex because it shouldn't apply when the term is followed by brackets or inside square brackets. Also, there will be thousands of terms to be replaced, so an unindexed, light-weight and multi-dimensional array like the one I set up at the top is needed. The regex itself works fine, it just doesn't work on word groups containing regex characters like + ? / ( etc. So I need to escape them. I have tried every variation I can think of, but none works for all cases. And as mentioned, I can't escape them in $text or $s.
Martin
Ahh OK, I see what you're trying to do now :oP I've just run your script and it seems to work fine as far as I can see. Which particular array (`array("t" => "Bar-X", "u" => "http://www.foo.net")`) isn't doing what you expected? (Just read your edit above, gimme a min and I'll see.)
Chief17
Martin
I see your problem now. Tricky one, let me think about it for a bit and I'll get back to you in a min. (To code-format something in a reply, enclose the code in `, the key before the number 1 along the top.)
Chief17
Thanks, no hurry. I'm struggling with this for 2 days already ;)
Martin
Made another **EDIT**. It doesn't work, but maybe a different approach might help.
Chief17
I can't believe it's that hard. Are we overlooking something major?
Martin
Yeah, looks like we were. serg555 seems to have got it.
Chief17
I overlooked his reply. Thanks to you too :)
Martin
+1  A: 

This should work, at least for the provided test cases:

$replaced = preg_replace('/([.,\s!^]+)('.preg_quote($row["t"],'/').')([.,\s!$]+)(?!\()/mS',
                           '$1<a href="'.$row["u"].'">$2</a>$3',
                           $replaced);

\b doesn't work as expected when the term itself starts or ends with a non-word character (as in Foobar (2)), because \b only matches between a word character and a non-word character. So you should explicitly list the characters that are allowed around the term. I quickly put [.,\s!^] and [.,\s!$] there; you will probably have to add more allowed characters according to your specs (such as - or _).
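For illustration, here is a self-contained variant of the same idea, checked only against a shortened version of the question's sample text. It swaps the consumed groups for lookarounds, so a term at the very start or end of the text also qualifies (inside a character class, ^ and $ are literal characters rather than anchors). The function name linkTerms and the $allowed list are assumptions made for this sketch, not part of the original code.

```php
<?php
// "\b" needs a word character on one side, so it fails between "?" and " ".
$boundaryFails = preg_match('/\bBar Baz\?\b/', "Text Bar Baz? more") === 0;

// Assumption: the only characters allowed around a term. Extend as needed.
$allowed = '.,\s!';

// Hypothetical helper, not from the thread: link each term in $rows.
function linkTerms($text, array $rows, $allowed)
{
    foreach ($rows as $row) {
        $term = preg_quote($row["t"], '/');
        $pattern = '/(?<![^' . $allowed . '])'   // start of text or allowed char before
                 . '(' . $term . ')'
                 . '(?![^' . $allowed . '])'     // end of text or allowed char after
                 . '(?![' . $allowed . ']*\()/'; // no "(" after the allowed chars
        $text = preg_replace($pattern, '<a href="' . $row["u"] . '">$1</a>', $text);
    }
    return $text;
}

$rows = array(
    array("t" => "Foo",        "u" => "http://www.foo.net"),
    array("t" => "Foobar (2)", "u" => "http://www.foo.net"),
    array("t" => "Bar Baz?",   "u" => "http://www.foo.net"),
);

$sample = "Text Foo text. Text Foobar (2) text. Text Foo (text). [Text Foo] Text Bar Baz?";
$linked = linkTerms($sample, $rows, $allowed);
echo $linked;
```

Like the pattern above, this only skips a bracketed term when the closing ] directly follows it; a term in the middle of a bracketed phrase would still be linked, so a stricter rule would be needed for that case.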

serg
Nice one! You have an extra `)` where you are closing `preg_quote`, btw.
Chief17
You are right, thanks.
serg
Good job, I was running out of ideas :oP
Chief17
I overlooked this answer. Thanks so much!
Martin