I have the following:

$reg[0] = '`<a(\s[^>]*)href="([^"]*)"([^>]*)>`si';
$reg[1] = '`<a(\s[^>]*)href="([^"]*)"([^>]*)>`si';
$replace[0] = '<a$1href="http://www.yahoo.com"$3>';
$replace[1] = '<a$1href="http://www.live.com"$3>';
$string = 'Test <a href="http://www.google.com">Google!!</a>Test <a href="http://www.google.com">Google!!2</a>Test';
echo preg_replace($reg, $replace, $string);

Which results in:

Test <a href="http://www.live.com">Google!!</a>Test <a href="http://www.live.com">Google!!2</a>Test

I'm looking to end up with (the difference being in the first link):

Test <a href="http://www.yahoo.com">Google!!</a>Test <a href="http://www.live.com">Google!!2</a>Test

The idea is to replace each URL within a link within a string with a unique other URL. It's for a newsletter system where I want to track what people have clicked on, so each link will point to a "fake" URL that records the click and then redirects the reader to the real URL.

+2  A: 

The problem is that your first replace string is going to be matched by the second search pattern, effectively overwriting the first replace string with the second replace string.

Unless you can somehow differentiate "modified" links from the original ones so that they won't get caught by the other expression (perhaps by adding an extra HTML attribute?), I don't think you can really solve this with a single preg_replace() call. One possible solution (aside from the differentiation in the regular expression) that comes to mind would be to use preg_match_all(), since it will give you an array of matches to work with. You could probably then encode the matched URLs with your tracking URL by iterating over the array and running a str_replace() on each matched URL.
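
A minimal sketch of the preg_match_all() idea, with one variation: it uses the PREG_OFFSET_CAPTURE flag and substr_replace() rather than str_replace(), because str_replace() would rewrite every occurrence of a URL that appears more than once. The tracking address http://foo.com/out.php?id=N is an assumption made for illustration, not something from the question.

```php
<?php
$string = 'Test <a href="http://www.google.com">Google!!</a>Test <a href="http://www.google.com">Google!!2</a>Test';
$pattern = '`<a(\s[^>]*)href="([^"]*)"([^>]*)>`si';

// Each entry of $matches is one link; each group is array(text, byte offset).
preg_match_all($pattern, $string, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

// Walk backwards so that the offsets of earlier links stay valid
// while later parts of the string change length.
for ($i = count($matches) - 1; $i >= 0; $i--) {
    list($url, $offset) = $matches[$i][2];
    // Hypothetical tracking URL; the id would normally come from the database.
    $trackingUrl = 'http://foo.com/out.php?id=' . ($i + 1);
    $string = substr_replace($string, $trackingUrl, $offset, strlen($url));
}

echo $string;
```

Because each replacement is made at a recorded position, two links to the same address still end up with different tracking URLs.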

htw
How would you use preg_match to do the replacing?
Darryl Hein
Sorry, I forgot about that when I was writing my post initially—I edited my post to add a potential way of using preg_match() to achieve what you wanted. Hope it helps.
htw
Also, I accidentally said preg_match() when I actually meant preg_match_all()—sorry about that, it's been a while since I've used these functions.
htw
+1  A: 

I'm not good with regexps, but if what you're doing is just replacing external URLs (i.e. not part of your site/application) with an internal URL that will track click-thrus and redirect the user, then it should be easy to construct a regexp that will match only external URLs.

So let's say your domain is foo.com, then you just need to create a regexp that will only match a hyperlink that doesn't contain a URL starting with http://foo.com. Now, as I said, I'm pretty bad with regexps, but here's my best stab at it:

$reg[0] = '`<a(\s[^>]*)href="(?!http://foo.com)([^"]*)"([^>]*)>`si';

Edit: If you want to track click-thrus to internal URLs as well, then just replace http://foo.com with the URL of your redirect/tracking page, e.g. http://foo.com/out.php.

I'll walk through an example scenario just to show what I'm talking about. Let's say you have the below newsletter:

<h1>Newsletter Name</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec lobortis,
ligula <a href="http://bar.com">sed sollicitudin</a> dignissim, lacus dolor
suscipit sapien, <a href="http://foo.com">eget auctor</a> ipsum ligula
non tortor. Quisque sagittis sodales elit. Mauris dictum blandit lacus.
Mauris consequat <a href="http://last.fm">laoreet lacus</a>.</p>

For the purpose of this exercise, the search pattern will be:

// Only match links that don't begin with: http://foo.com/out.php
`<a(\s[^>]*)href="(?!http://foo.com/out\.php)([^"]*)"([^>]*)>`si

This regexp can be broken down into 3 parts:

  1. <a(\s[^>]*)href="
  2. (?!http://foo.com/out\.php)([^"]*)
  3. "([^>]*)>

On the first pass of the search, the script will examine:

<a href="http://bar.com">

This link satisfies all 3 components of the regexp, so the URL is stored in the database and is replaced with http://foo.com/out.php?id=1.

On the second pass of the search, the script will examine:

<a href="http://foo.com/out.php?id=1">

This link matches 1 and 3, but not 2. So the search will move on to the next link:

<a href="http://foo.com">

This link satisfies all 3 components of the regexp, so the URL is stored in the database and is replaced with http://foo.com/out.php?id=2.

On the 3rd pass of the search, the script will examine the first 2 (already replaced) links, skip them, and then find a match with the last link in the newsletter.
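
The passes described above could be sketched as a loop that replaces one link per iteration. This assumes the fourth argument of preg_replace() (the replacement limit) is set to 1, and it uses a plain array, $stored, as a stand-in for the database; both the variable names and the shortened newsletter text are illustrative only.

```php
<?php
$newsletter = '<p>ligula <a href="http://bar.com">sed</a>, '
            . '<a href="http://foo.com">eget</a>, '
            . '<a href="http://last.fm">laoreet</a>.</p>';

// Only match links that don't already point at the tracking page.
$pattern = '`<a(\s[^>]*)href="(?!http://foo\.com/out\.php)([^"]*)"([^>]*)>`si';

$id = 0;
$stored = array(); // stand-in for the database table: id => original URL

// Each pass handles the first link that has not been replaced yet.
while (preg_match($pattern, $newsletter, $m)) {
    $id++;
    $stored[$id] = $m[2];
    $replacement = '<a${1}href="http://foo.com/out.php?id=' . $id . '"${3}>';
    // Limit of 1: replace only the first remaining match on this pass.
    $newsletter = preg_replace($pattern, $replacement, $newsletter, 1);
}

echo $newsletter;
```

The negative lookahead is what makes the loop terminate: once a link has been rewritten it no longer matches, so every pass moves on to the next untouched link.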

Calvin
Internal or external doesn't really matter to me. I want to replace all links to track all clicks.
Darryl Hein
In that case you just need to replace http://foo.com with the exact address of the redirect/tracking page.
Calvin
That still doesn't work if you have 1 url going to www.google.com and another going to cnn.com. Each link needs to be replaced by a unique other link.
Darryl Hein
This is basically what I have done, but it doesn't work. The problem is that PHP has no way of replacing only the first occurrence of the regexp, at least that I know of. Instead, it replaces every string it finds.
Darryl Hein
I can't explain it any clearer than my last edit. Note the difference between the regexp I'm using and the one you're using. It is not the same as what you're doing. This pattern allows replaced links and unreplaced links to be differentiated.
Calvin
+1  A: 

I'm not sure I've understood the question correctly, but I wrote the following snippet. The regex matches the hyperlinks, then the code loops through the results and compares the text nodes against the hyperlink references. When a text node is found in a hyperlink reference, it extends the matches by inserting a sample trackback link with a unique key.

UPDATE: The snippet finds all hyperlinks:

  1. find links
  2. build trackback link
  3. find position of each found link (matches[3]) and set a template tag
  4. replace template tags by trackback links

Each link position is unique.

$string = '<h1>Newsletter Name</h1> <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec lobortis, ligula <a href="http://bar.com">sed sollicitudin</a> dignissim, lacus dolor suscipit sapien, <a href="http://foo.com">bar.com</a> ipsum ligula non tortor. Quisque sagittis sodales elit. Mauris dictum blandit lacus. Mauris consequat <a href="http://last.fm">laoreet lacus</a>.</p> <h1>Newsletter Name</h1> <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec lobortis, ligula <a href="http://bar.com">sed sollicitudin</a> dignissim, lacus dolor suscipit sapien, <a href="http://foo.com">bar.com</a> ipsum ligula non tortor. Quisque sagittis sodales elit. Mauris dictum blandit lacus. Mauris consequat <a href="http://last.fm">laoreet lacus</a>.</p> <h1>Newsletter Name</h1> <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec lobortis, ligula <a href="http://bar.com">sed sollicitudin</a> dignissim, lacus dolor suscipit sapien, <a href="http://foo.com">bar.com</a> ipsum ligula non tortor. Quisque sagittis sodales elit. Mauris dictum blandit lacus. Mauris consequat <a href="http://last.fm">laoreet lacus</a>.</p> ';

$regex = '<[^>]+>(.*)<\/[^>]+>';
preg_match_all("'<a\s+href=\"(.*)\"\s*>(.*)<\/[^>]+>'U",$string,$matches);


$uniqueURL = 'http://www.yourdomain.com/trackback.php?id=';

foreach($matches[2] as $k2 => $m2){
    foreach($matches[1] as $k1 => $m1){
        if(stristr($m1, $m2)){
                $uniq = $uniqueURL.md5($matches[0][$k2])."_".rand(1000,9999);
                $matches[3][$k1] = $uniq."&refLink=".$m1;
        }
    }
}


foreach($matches[3] as $key => $val) {

    $startAt = strpos($string, $matches[1][$key]);
    $endAt= $startAt + strlen($matches[1][$key]);

    $strBefore = substr($string,0, $startAt);
    $strAfter = substr($string,$endAt);

    $string = $strBefore . "@@@$key@@@" .$strAfter;

}
foreach($matches[3] as $key => $val) {
     $string = str_replace("@@@$key@@@",$matches[3][$key] ,$string);
}
print "<pre>";
echo $string;
Tom Schaefer
That works until you have 2 links that go to the same place (bar.com) within the same piece of text and yet you want unique URLs for each link. Your array will contain unique URLs, but how do you replace them within the string?
Darryl Hein
A: 

Before PHP 5.3, which lets you create an anonymous function on the spot, you have to use either create_function (which I hate) or a helper class.

/**
 * For retrieving a new string from a list.
 */
class StringRotation {
    var $i = -1;
    var $strings = array();

    function addString($string) {
     $this->strings[] = $string;
    }

    /**
     * Use sprintf to produce result string
     * Rotates forward
     * @param array $params the string params to insert
     * @return string
     * @uses StringRotation::getNext()
     */
    function parseString($params) {
     $string = $this->getNext();
     array_unshift($params, $string);
     return call_user_func_array('sprintf', $params);
    }

    function getNext() {
     $this->i++;
     $t = count($this->strings);
     // Wrap around once the pointer moves past the last string
     if ($this->i >= $t) {
      $this->i = 0;
     }
     return $this->strings[$this->i];
    }

    function resetPointer() {
     $this->i = -1;
    }
}

$reg = '`<a(\s[^>]*)href="([^"]*)"([^>]*)>`si';
$replaceLinks[0] = '<a%2$shref="http://www.yahoo.com"%4$s>';
$replaceLinks[1] = '<a%2$shref="http://www.live.com"%4$s>';

$string = 'Test <a href="http://www.google.com">Google!!</a>Test <a href="http://www.google.com">Google!!2</a>Test';

$linkReplace = new StringRotation();
foreach ($replaceLinks as $replaceLink) {
    $linkReplace->addString($replaceLink);
}

echo preg_replace_callback($reg, array($linkReplace, 'parseString'), $string);
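
On PHP 5.3 or later, the same rotation could be sketched with an anonymous function instead of the helper class. This is only an illustration: $urls and the counter $i are names made up for the example, and the counter is captured by reference so that it advances on every match.

```php
<?php
$reg = '`<a(\s[^>]*)href="([^"]*)"([^>]*)>`si';
$urls = array('http://www.yahoo.com', 'http://www.live.com');
$string = 'Test <a href="http://www.google.com">Google!!</a>Test <a href="http://www.google.com">Google!!2</a>Test';

$i = 0;
$result = preg_replace_callback($reg, function ($m) use ($urls, &$i) {
    // Pick the next URL, wrapping around when the list is exhausted.
    $url = $urls[$i % count($urls)];
    $i++;
    // Rebuild the tag, keeping whatever surrounded the href attribute.
    return '<a' . $m[1] . 'href="' . $url . '"' . $m[3] . '>';
}, $string);

echo $result;
```

The callback runs once per match, in order, so the first link gets the first URL and the second link gets the second.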
OIS