With PHP how can I mimic the auto-link behavior of StackOverflow (which BTW is awesomely cool)?

For instance, the following URL:

http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior

Is converted into this:

<a title="how to mimic stackoverflow auto link behavior" rel="nofollow" href="http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior">stackoverflow.com/questions/1925455/…</a>

I don't really care for the title attribute in this case.


And this:

http://pt.php.net/manual/en/function.base-convert.php#52450

Is converted into this:

<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php#52450">pt.php.net/manual/en/…</a>

How can I make a similar function in PHP?

PS: Check my comments on this question for some more examples and behaviors.

+3  A: 

If you have a predictable URL like SO then it should be easy to grab links with a regex and filter out the ones that match the pattern. So if your URL is http://example.com/stuff/1234 then finding http://example.com/stuff/1234/how-to-mimic would be pretty trivial with a regex.

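<?php
// Find every URL of the form http://example.com/<section>/<id>[/<slug>]
$pattern = '#http://example\.com/(\w+)/(\d+)(?:/[\w-]*)?#';

// preg_match_all() collects all occurrences; preg_match() stops at the first one
if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER))
{
  foreach ($matches as $match)
  {
    // $match[0] is the full URL, $match[1] the section, $match[2] the numeric id
    // do something...
  }
}
?>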
Darrell Brogdon
Take `http://pt.php.net/manual/en/function.base-convert.php#52450` for instance. Check the comment on my question for the output.
Alix Axel
A: 

http://snipplr.com/view/2371/regex-regular-expression-to-match-a-url/

CodeJoust
That is a simple auto-linker, it does not mimic SO behavior.
Alix Axel
+4  A: 

This will convert the sample string to what you are after. I left out title as that comes from a different source than just a standalone URL and you said that was not important.

<?php
$urlInput = "http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior";
// Capture the host plus every directory up to the last slash; the final path element is dropped
if (preg_match('@http://(?:www\.)?(\S+/)\S*(?:\s|$)@i', $urlInput, $matches)) {
    print('<a rel="nofollow" href="' . trim($matches[0]) . '">' . $matches[1] . '...</a>');
}
?>

Extend as needed to scan through your text.

If you want to match just a certain number of URL path elements, use this RE:

'@http://(?:www\.)?((?:\S+?/){1,3})\S*(?:\s|$)@i'

This extracts up to 3 path elements (the host and up to two directories). Vary the upper bound in {1,3} to set the maximum number of path elements you want.

Changed the ending \S to allow for zero matches.
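
To see what the bounded pattern captures, here is a minimal sketch. The helper name shorten_url_display and its $maxElements parameter are made up for illustration; only the regular expression comes from the answer above.

```php
<?php
// Hypothetical helper: returns the host plus up to ($maxElements - 1) directories,
// or null when the pattern does not match (e.g. a host with no trailing slash).
function shorten_url_display($url, $maxElements = 3)
{
    $pattern = '@http://(?:www\.)?((?:\S+?/){1,' . (int) $maxElements . '})\S*(?:\s|$)@i';
    if (!preg_match($pattern, $url, $matches)) {
        return null;
    }
    return $matches[1];
}

echo shorten_url_display('http://a.b/c/d/e/f/test'), "\n";
echo shorten_url_display('http://pt.php.net/manual/en/function.base-convert.php#52450'), "\n";
```

Checking the return value before reading $matches avoids the "Undefined offset" notices that appear when the URL does not match.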

Kevin Brock
+1, WOW, where did the magic come from? I was like "this isn't going to work" but surprisingly it almost did! I can't process Regex at this time but I'll try to understand it tomorrow.
Alix Axel
Also, I said almost because it fails for the following URLs: `http://a.b/c/d/e/f/test` and `http://a.b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/z/y/w/z/test`
Alix Axel
Also for URLs like `http://www.stackoverflow.com/` it fails: "Notice: Undefined offset: 0 in I:\WWW\index.php on line 35 Notice: Undefined offset: 1 in I:\WWW\index.php on line 35 ..."
Alix Axel
Was adding the bounded check while the comments were being typed. This should work now for the longer URLs.
Kevin Brock
Ok, this will now work if the URL just has the host name and trailing slash. However, it is much harder to make this work if there is no trailing slash.
Kevin Brock
Looks like you're showing `...` even when the URL is very short.
philfreo
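
One possible way to address this is to truncate only when the visible text exceeds a length limit. The sketch below is not part of any answer here: the function name link_text and the default limit of 50 characters are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: builds the visible link text for a URL.
// Short URLs are shown in full; only long ones are cut and get an ellipsis.
function link_text($url, $maxLength = 50)
{
    // Drop the scheme and a leading "www." for display purposes
    $display = preg_replace('@^http://(?:www\.)?@i', '', $url);

    if (strlen($display) <= $maxLength) {
        return $display; // short enough: no truncation, no ellipsis
    }

    // Keep the host and up to two directories, provided something follows them
    if (preg_match('@^((?:[^\s/]+/){1,3})\S@', $display, $m)) {
        return $m[1] . '...';
    }

    return $display; // nothing sensible to cut (e.g. no slash at all)
}

echo link_text('http://www.stackoverflow.com/'), "\n";
echo link_text('http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior'), "\n";
```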
+1  A: 

Based somewhat on Kevin Brock's answer, but allows configurable params (folder depth & URL length), and accepts URLs without trailing slashes:

$url = 'http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior';
$output = '';
$params = array (
    'length' => 10,
    'depth' => 2,
);
preg_match ('@http://(?:www\.)?([^/?# ]+)(/\S+)?(?=\s|$)@i', $url, $matches);
if (isset ($matches[2]))
{
    $parts = explode('/', substr($matches[2], 1));
    if (count($parts) > $params['depth'] && strlen($matches[1].$matches[2]) > $params['length'])
        $output = $matches[1].'/'.implode('/', array_slice($parts, 0, $params['depth'])).'/...';
    else
        $output = $matches[1].$matches[2];
}
else
    $output = $matches[1];

echo '<a href="'.$matches[0].'">'.$output.'</a>';

Hope this helps

K Prime
This answer seems to be the most flexible so far however it's pretty difficult to use it to replace URLs in free text since it does not use `preg_replace()`.
Alix Axel
You could convert it to a function, and use that as the callback to `preg_replace_callback`
K Prime
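
A rough sketch of that suggestion, wrapping the logic from the answer above in a callback for preg_replace_callback(). The function names shorten_link_callback and auto_link_free_text are hypothetical, and the host part of the pattern is changed from [^/?# ] to [^/?#\s] (an assumption on my part) so that a bare host followed by a newline in free text is not swallowed.

```php
<?php
// Hypothetical callback: receives one regex match and returns the replacement link.
function shorten_link_callback($matches)
{
    $params = array('length' => 10, 'depth' => 2);

    $output = $matches[1]; // host without "www."
    if (isset($matches[2])) {
        $parts = explode('/', substr($matches[2], 1));
        if (count($parts) > $params['depth'] && strlen($matches[1] . $matches[2]) > $params['length']) {
            // Too deep and too long: keep only the first 'depth' path elements
            $output .= '/' . implode('/', array_slice($parts, 0, $params['depth'])) . '/...';
        } else {
            $output .= $matches[2];
        }
    }

    return '<a href="' . $matches[0] . '">' . $output . '</a>';
}

// Hypothetical wrapper: replaces every matching URL in free text.
function auto_link_free_text($text)
{
    $pattern = '@http://(?:www\.)?([^/?#\s]+)(/\S+)?(?=\s|$)@i';
    return preg_replace_callback($pattern, 'shorten_link_callback', $text);
}

echo auto_link_free_text('See http://www.stackoverflow.com/questions/1925455/how-to-mimic for details');
```

Note that the trailing \S+ still swallows punctuation that directly follows a URL, so this is a starting point rather than a complete auto-linker.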
+11  A: 

Try this out. The pattern is from daringfireball.net

/**
 * Replace links in text with html links
 *
 * @param  string $text
 * @return string
 */
function auto_link_text($text)
{
   $pattern  = '#\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))#';
   // Anonymous function instead of create_function(), which was removed in PHP 8.0
   $callback = function ($matches) {
       $url = array_shift($matches);

       $text = parse_url($url, PHP_URL_HOST) . parse_url($url, PHP_URL_PATH);
       $text = preg_replace("/^www\./", "", $text);

       // Cut everything after the last slash and append an ellipsis
       $last = -(strlen(strrchr($text, "/"))) + 1;
       if ($last < 0) {
           $text = substr($text, 0, $last) . "&hellip;";
       }

       return sprintf('<a rel="nofollow" href="%s">%s</a>', $url, $text);
   };

   return preg_replace_callback($pattern, $callback, $text);
}

Input Text:

This is my text.  I wonder if you know about asking questions on StackOverflow:
 Check This out http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior

 Also, base_convert php function?
http://pt.php.net/manual/en/function.base-convert.php#52450

http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450

Output Text:

This is my text.  I wonder if you know about asking questions on StackOverflow:
 Check This out <a rel="nofollow" href="http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior">stackoverflow.com/questions/1925455/&hellip;</a>

 Also, base_convert php function?
<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php#52450">pt.php.net/manual/en/&hellip;</a>

<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450">pt.php.net/manual/en/&hellip;</a>
Eric Coleman
+2  A: 

This is based on the same daringfireball.net regular expression, but adds a bit more logic than Eric Coleman's example, as well as configuration for maximum URL length (SO seems to use 50), maximum path depth when the URL is truncated (SO seems to use 2), and the ellipsis character (&hellip;).

As far as I know this replicates all of the SO URL rewriting functionality, at least as far as what was discussed so far in the comments and responses here.

function auto_link_text($text) {
    $pattern  = '#\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))#';
    return preg_replace_callback($pattern, 'auto_link_text_callback', $text);
}

function auto_link_text_callback($matches) {
    $max_url_length = 50;
    $max_depth_if_over_length = 2;
    $ellipsis = '&hellip;';

    $url_full = $matches[0];
    $url_short = '';

    if (strlen($url_full) > $max_url_length) {
        $parts = parse_url($url_full);
        $url_short = $parts['scheme'] . '://' . preg_replace('/^www\./', '', $parts['host']) . '/';

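        // Initialise the list so URLs with an empty path do not trigger notices
        $url_string_components = array();
        $path = isset($parts['path']) ? trim($parts['path'], '/') : '';
        $path_components = ($path === '') ? array() : explode('/', $path);
        foreach ($path_components as $dir) {
            $url_string_components[] = $dir . '/';
        }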

        if (!empty($parts['query'])) {
            $url_string_components[] = '?' . $parts['query'];
        }

        if (!empty($parts['fragment'])) {
            $url_string_components[] = '#' . $parts['fragment'];
        }

        for ($k = 0; $k < count($url_string_components); $k++) {
            $curr_component = $url_string_components[$k];
            if ($k >= $max_depth_if_over_length || strlen($url_short) + strlen($curr_component) > $max_url_length) {
                if ($k == 0 && strlen($url_short) < $max_url_length) {
                    // Always show a portion of first directory
                    $url_short .= substr($curr_component, 0, $max_url_length - strlen($url_short));
                }
                $url_short .= $ellipsis;
                break;
            }
            $url_short .= $curr_component;
        }

    } else {
        $url_short = $url_full;
    }

    return "<a rel=\"nofollow\" href=\"$url_full\">$url_short</a>";
}

Sample Input:

This is my text.  I wonder if you know about asking questions on StackOverflow:
Check This out http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior

Also, base_convert php function?
http://pt.php.net/manual/en/function.base-convert.php#52450

http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450

http://a.b/c/d/e/f/test

and http://a.b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/z/y/w/z/test

Sample Output:

This is my text.  I wonder if you know about asking questions on StackOverflow:
Check This out <a rel="nofollow" href="http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior">http://stackoverflow.com/questions/1925455/&hellip;</a>

Also, base_convert php function?
<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php#52450">http://pt.php.net/manual/en/&hellip;</a>

<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450">http://pt.php.net/manual/en/&hellip;</a>

<a rel="nofollow" href="http://a.b/c/d/e/f/test">http://a.b/c/d/e/f/test</a>

and <a rel="nofollow" href="http://a.b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/z/y/w/z/test">http://a.b/c/d/&hellip;</a>
pix0r
+1, Indeed I was also testing the 50 length thingy and your answer is the most complete one to this question, I wish I had seen it before the bounty expired.
Alix Axel